Despite the ubiquitous
use of the Internet for sharing and locating information, it
is difficult on the current Web to find schemas that satisfy given
semantics and requirements. Yet, with the rapidly
increasing demand for information integration across
distributed sources, it is both desirable and important to
(1) search for and reuse (parts of) schemas that describe the data
in data sources, (2) design schemas with good properties that
facilitate data integration in unforeseen situations, and
(3) maintain and share semantic mappings between
heterogeneous schemas. The purpose of the KROSS project is
to investigate novel corpus-based approaches and to develop
effective and efficient tools for tracing and sharing
schemas and semantics that address these challenges. A
schema is a data representation that describes the
elements of a particular domain and the relationships among them,
e.g., a relational schema. The semantics of a schema is the
correspondence between the schema and the subject matter it
describes. A great deal of effort has been devoted to the
problem of discovering semantic mappings between schemas.
Surprisingly little work, however, has considered the upfront,
and intuitively more effective, effort of sharing and reusing
schemas, together with their semantics, during schema
design and mapping creation.
The KROSS (Knowledge Repository Of Schemas and Semantics)
repository contains classified and indexed schemas and
mappings. We develop a set of effective and efficient tools
that use the repository for schema management and
integration. Specifically, we employ techniques from databases
(schema integration and mapping) and artificial intelligence
(machine learning and ontologies) to address the following
central challenges: (1) dynamically creating and maintaining
the KROSS schema and mapping repository; (2) searching for
schemas with respect to given semantics, using keywords as
well as structured queries; (3) developing a
design-by-example approach to schema design, which generates a
new schema by searching for and combining existing schemas; and
(4) discovering schema semantics using corpora of schemas.
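To make challenge (2) more concrete, the sketch below shows one simple way keyword search over a schema repository could work. It is an illustration under stated assumptions, not the KROSS search method: the `Schema` record, the tokenizer, and the `keyword_search` function are all hypothetical, and the ranking uses a plain inverse-document-frequency (IDF) score over tokens drawn from schema and element names. Structured queries are not covered.

```python
import math
import re
from dataclasses import dataclass, field


@dataclass
class Schema:
    """Hypothetical repository entry: a schema name plus qualified element names."""
    name: str
    elements: list[str] = field(default_factory=list)


def tokenize(label: str) -> set[str]:
    """Split identifiers such as 'orderDate' or 'Book.title' into lowercase tokens."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", label)
    return {t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t}


def schema_tokens(schema: Schema) -> set[str]:
    """Collect the tokens of the schema name and of every element name."""
    tokens = tokenize(schema.name)
    for element in schema.elements:
        tokens |= tokenize(element)
    return tokens


def keyword_search(repo: list[Schema], query: str) -> list[tuple[str, float]]:
    """Hypothetical sketch: rank schemas by the summed IDF of matched keywords."""
    indexed = [(schema, schema_tokens(schema)) for schema in repo]
    n = len(indexed)
    keywords = tokenize(query)
    # Document frequency: how many schemas contain each keyword.
    df = {k: sum(1 for _, toks in indexed if k in toks) for k in keywords}
    results = []
    for schema, toks in indexed:
        # Rare keywords (small df) contribute more to the score than common ones.
        score = sum(math.log(1 + n / df[k]) for k in keywords if k in toks)
        if score > 0:
            results.append((schema.name, round(score, 3)))
    # Highest score first; ties broken by name so the output is deterministic.
    results.sort(key=lambda r: (-r[1], r[0]))
    return results


repo = [
    Schema("Bookstore", ["Book.title", "Book.isbn", "Author.name", "Order.orderDate"]),
    Schema("Library", ["Book.title", "Loan.dueDate", "Member.name"]),
    Schema("Hospital", ["Patient.name", "Visit.date"]),
]
print(keyword_search(repo, "book author"))  # [('Bookstore', 2.303), ('Library', 0.916)]
```

IDF weighting is chosen here only because it rewards keywords that distinguish one schema from the rest of the corpus; a real repository would likely also exploit the classification and indexing described above.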
The KROSS project brings
transformational innovations to the study of schema design,
schema mapping, and data integration. It draws upon the
strengths of existing technologies and contributes novel
approaches. The KROSS
repository contains multiple classified and archived
representations of each concept in a given domain. It
supports the development of integration-aware and
semantics-enriched schemas through a transformational
design-by-example approach to schema design. Moreover,
the symbolic and probabilistic evidence aggregated in the repository
provides many opportunities to further automate schema mapping.
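The abstract does not specify how symbolic and probabilistic evidence is combined. As one common possibility, the sketch below takes a weighted average of a symbolic matcher (token overlap between element names) and a probabilistic estimate (how often a source element was mapped to a target element in previously stored mappings). The functions `match_elements` and `combined_score` and the `weight` and `threshold` parameters are hypothetical, and the linear combination is a swapped-in technique, not the KROSS method.

```python
import re
from collections import Counter

# Hypothetical sketch of evidence aggregation; not the KROSS implementation.


def tokens(label: str) -> set[str]:
    """Split identifiers such as 'customerName' or 'postal_code' into lowercase tokens."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", label)
    return {t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t}


def name_similarity(a: str, b: str) -> float:
    """Symbolic evidence: Jaccard overlap of the two element names' tokens."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def combined_score(s, t, pair_counts, source_counts, weight):
    """Weighted average of symbolic and probabilistic evidence for matching s to t."""
    symbolic = name_similarity(s, t)
    # Probabilistic evidence: relative frequency with which s was mapped to t
    # in previously stored mappings (0.0 if s never appeared as a source).
    seen = source_counts[s]
    probabilistic = pair_counts[(s, t)] / seen if seen else 0.0
    return weight * symbolic + (1 - weight) * probabilistic


def match_elements(source, target, past_mappings, weight=0.5, threshold=0.3):
    """Map each source element to its best-scoring target element above a threshold."""
    if not target:
        return {}
    pair_counts = Counter(past_mappings)
    source_counts = Counter(s for s, _ in past_mappings)
    matches = {}
    for s in source:
        scores = {t: combined_score(s, t, pair_counts, source_counts, weight) for t in target}
        best = max(target, key=scores.__getitem__)
        if scores[best] >= threshold:
            matches[s] = (best, round(scores[best], 3))
    return matches


past = [("phone", "telephone"), ("phone", "telephone"), ("phone", "fax"), ("zip", "postal_code")]
result = match_elements(
    ["customerName", "phone", "zip", "email"],
    ["customer_name", "telephone", "postal_code"],
    past,
)
print(result)
# {'customerName': ('customer_name', 0.5), 'phone': ('telephone', 0.333), 'zip': ('postal_code', 0.5)}
```

In this toy run, `customerName` is matched on name evidence alone, while `phone` and `zip` are matched only because earlier mappings in the corpus support them; `email` has no evidence of either kind and stays unmatched. This illustrates why evidence drawn from a corpus of mappings can increase automation where name similarity fails.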