|
Project Description:
Data mining (also known as Knowledge Discovery in Databases, KDD) is
a procedure for extracting previously unknown and potentially useful
information or patterns from huge data sets. KDD is usually a multiphase
process involving numerous steps such as data preparation, data
preprocessing, feature selection, rule induction, knowledge evaluation,
and deployment. Many novel data mining and learning algorithms have
been developed, and vigorously so, but under rather ad hoc and vague
concepts.
These algorithms are, in most cases, individual creations of different
researchers, without much of a common methodological or foundational
framework. In other words, the great majority of work in data mining
is focused on algorithm development while neglecting the study
of fundamental theoretical issues concerning data, inter-data
relationships, and the quality of the implicit information hidden
in the data or data redundancies. Thus, it is not easy to fully
understand and evaluate how the individual phases influence one another
or what impact each phase has on the whole knowledge discovery
process. For further development and breakthroughs in data mining
and learning algorithms, a deep examination of their foundations
is necessary. The central goal of the proposed research is to
develop a unified rough-set-based data mining framework to explore
various fundamental issues of data mining and learning algorithms.
It aims to present the analytical capabilities of rough set
methodology in the context of data mining methodologies, techniques,
and applications. It will provide a unified framework to help
better understand the whole KDD process.
Intellectual merit:
Rough set theory is particularly suited to reasoning about imprecise
or incomplete data and discovering relationships in the data.
The simplicity and mathematical clarity of rough set theory make
it attractive for both theoreticians and application-oriented
researchers. The main advantage of rough set theory is that it
does not require any preliminary or additional information about
the data, such as probability distributions in statistics, basic
probability assignments in Dempster-Shafer theory, or membership
values in fuzzy set theory. Rough set theory constitutes a sound basis
for KDD and can be used in different phases of the KDD process.
In particular, the formal techniques of rough set theory lead
to many novel and promising methods and algorithms for the discovery,
analysis, and characterization of functional and partial functional
dependencies among attributes; feature selection; feature extraction;
data reduction; decision rule generation; and pattern extraction
(templates, association rules), all of which are fundamental issues
of the KDD process. Rough set theory
represents an innovative approach and can lead to the development
of new learning algorithms that enable novel uses of, and breakthroughs
in, data mining techniques.
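As an illustration only, the core rough set notions mentioned above
(indiscernibility, lower and upper approximations, and the degree of
partial functional dependency between attributes) can be sketched in
a few lines of Python. The toy decision table, the attribute names,
and the function names are hypothetical and are not taken from the
proposed framework; the definitions follow the standard textbook
formulation of rough sets.

```python
from collections import defaultdict

# Hypothetical toy decision table (illustrative only): one dict per object.
TABLE = [
    {"headache": "yes", "temp": "high",   "flu": "yes"},
    {"headache": "yes", "temp": "high",   "flu": "no"},
    {"headache": "no",  "temp": "high",   "flu": "yes"},
    {"headache": "no",  "temp": "normal", "flu": "no"},
    {"headache": "yes", "temp": "normal", "flu": "no"},
]


def partition(table, attrs):
    """Group object indices into indiscernibility classes with respect to attrs."""
    classes = defaultdict(set)
    for i, row in enumerate(table):
        classes[tuple(row[a] for a in attrs)].add(i)
    return list(classes.values())


def approximations(table, attrs, target):
    """Return the (lower, upper) approximations of the object set `target`."""
    lower, upper = set(), set()
    for block in partition(table, attrs):
        if block <= target:   # class lies entirely inside the target set
            lower |= block
        if block & target:    # class overlaps the target set
            upper |= block
    return lower, upper


def dependency_degree(table, cond, dec):
    """Fraction of objects whose decision class is determined by `cond`.

    A value of 1.0 means a full functional dependency cond -> dec;
    values below 1.0 indicate a partial dependency.
    """
    positive = set()
    for dec_block in partition(table, dec):
        lower, _ = approximations(table, cond, dec_block)
        positive |= lower
    return len(positive) / len(table)


COND = ["headache", "temp"]
FLU = {i for i, row in enumerate(TABLE) if row["flu"] == "yes"}
LOWER, UPPER = approximations(TABLE, COND, FLU)
GAMMA = dependency_degree(TABLE, COND, ["flu"])
```

In this toy table, objects 0 and 1 share the same condition values
but differ in the decision, so they fall into the boundary region
(the upper approximation minus the lower one). No probabilities or
membership values are supplied; everything is derived from the table
itself, which is the property the abstract highlights.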
|