Current Projects
Social Tagging Systems
Owing to the popularity of Web 2.0 technologies in recent
years, social tagging systems such as Del.icio.us, Flickr, Yahoo! MyWeb2.0,
YouTube, Technorati, and LiveJournal have drawn considerable attention. Research
activities are increasing rapidly in areas such as analysis of social tagging
and investigation of the applications of social tagging in document management
and information retrieval tasks. In this
project, we aim at studying the user motivations in social tagging, analyzing
the semantic and structural patterns of social tags, and developing machine
learning techniques to transform social tags into quality metadata to support
information retrieval tasks.
Health Literacy
Health literacy is
the individual’s ability to obtain, process, and understand basic health
information and services needed to make appropriate health care decisions. Health literacy has a great impact on health
care cost, quality of health care, and health outcomes. Language and culture play important roles in
health literacy. They shape the differences between the
vocabularies used by health care consumers and
health care professionals. For instance,
health care consumers may use the term nosebleed while health care professionals
may use the term epistaxis. In addition,
health care consumers with different social backgrounds may use different
vocabularies. In this project, we aim
at learning the vocabulary differences between health care consumers and
professionals from the tags they assign to health care information, and at
developing an automatic health care thesaurus to reduce the health literacy
gap.
Social Network Data Sharing and Privacy Preservation
Social network analysis and
mining have drawn significant attention in recent years owing to the proliferation
of online communities and advances in information sharing technologies.
Social network analysis and mining techniques discover the knowledge embedded
in the structure of social networks.
Such knowledge is useful in many domains such as marketing,
epidemiology, and homeland security.
Sharing social networks between organizations enables knowledge
retrieval from a larger social network integrated from multiple
sources. However, privacy concerns
usually prevent such information sharing. Recent research on privacy preservation
has focused on relational data, and very little work has been done on
social network data. The existing work on privacy
preservation of social network data relies on anonymization and perturbation. However, these techniques were developed for
data publishing and ignore the utility of the published data for social network
analysis and mining. In this proposed
project, the objective is to share non-sensitive, generalized information
that supports social network analysis and mining while preserving
privacy.
Link, Content Analysis and Information Visualization of Weblog Communities
A Weblog is a Web site where entries are
made in diary style, maintained by its sole author (a blogger), and displayed
in reverse chronological order. In this project, we develop techniques to
analyze and visualize Weblog social networks. Link analysis uses the
relationships between bloggers to construct the Weblog social network. Content
analysis associates similar blog messages to unveil implicit relationships
found in the semantics to further improve the Weblog social network analysis.
Users can use different interactive information visualization techniques to
explore various aspects of the underlying social network at different levels of
abstraction. Based on this analysis, we shall further develop other
applications such as monitoring of theme and structural evolution, active
sub-group identification, and information flow classification.
Cross-lingual Information Retrieval
Cross-lingual information retrieval refers
to the ability to process a query for information in one language, search a
collection of objects (text, images, audio files, etc.), and return the
most relevant objects, translated into the user’s
language if necessary. The corpus-based approach has been shown to overcome
the shortcomings of the dictionary-based approach by making use of the statistical
information of term usage in parallel or comparable corpora to construct an
automatic thesaurus. In this project, we investigate automatic
techniques to extract a cross-lingual concept space from a multilingual
parallel corpus. In addition, by investigating the query log files of Internet
search engines, we extract the association between the queries in multiple
languages.
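As a rough illustration of how statistical information of term usage in a parallel corpus can yield cross-lingual term associations, the sketch below scores term pairs with the Dice coefficient over sentence-aligned pairs. The choice of the Dice coefficient, the function name `cross_lingual_associations`, and the toy English-French data are assumptions made for illustration; the text above does not say which association measure the project uses.

```python
from collections import Counter
from itertools import product


def cross_lingual_associations(aligned_pairs):
    """Score (source term, target term) pairs by the Dice coefficient.

    aligned_pairs: iterable of (source_tokens, target_tokens), one entry per
    aligned sentence or document pair. This input format is an assumption.
    """
    src_df, tgt_df, co_df = Counter(), Counter(), Counter()
    for src_tokens, tgt_tokens in aligned_pairs:
        src_set, tgt_set = set(src_tokens), set(tgt_tokens)
        src_df.update(src_set)                   # pairs containing each source term
        tgt_df.update(tgt_set)                   # pairs containing each target term
        co_df.update(product(src_set, tgt_set))  # pairs containing both terms
    return {
        (s, t): 2 * c / (src_df[s] + tgt_df[t])
        for (s, t), c in co_df.items()
    }


# Hypothetical toy corpus, for illustration only.
pairs = [
    (["red", "house"], ["maison", "rouge"]),
    (["red", "car"], ["voiture", "rouge"]),
    (["blue", "house"], ["maison", "bleue"]),
]
assoc = cross_lingual_associations(pairs)
```

In this toy corpus, "red" and "rouge" always appear together, so the pair receives the highest possible score, while "red" and "maison" co-occur only once and score lower.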
Multilingual Knowledge Management
Knowledge management applications involve
generating, consuming, and maintaining tremendous amounts of information.
Efficient and effective management of a continuously increasing volume of
documents is essential so that users can obtain the knowledge they need to accomplish
their own tasks. Text categorization deals with the
automatic learning of a text categorization model from a training set of
pre-classified documents on the basis of their contents and the assignment of uncategorized documents to
appropriate categories. Most existing text
categorization techniques deal with monolingual documents (i.e., all
documents are written in one language) during the text categorization model
learning and category assignment (or prediction). However, with the
globalization of business environments and advances
in Internet technology, an organization or individual often generates/acquires
and subsequently archives documents in different languages, thus creating the
need for cross-lingual text categorization. Motivated by this
need, this research investigates cross-lingual
text categorization techniques.
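The two stages described above, learning a model from pre-classified documents and assigning uncategorized documents to categories, can be sketched with a small multinomial Naive Bayes classifier. This is only a monolingual baseline chosen for illustration; the classifier, the function names, and the toy training set are assumptions, and the sketch does not attempt the cross-lingual step that this research investigates.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Learn per-category document and term counts from (tokens, label) pairs."""
    class_counts = Counter()
    term_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labeled_docs:
        class_counts[label] += 1
        term_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, term_counts, vocab


def classify(tokens, model):
    """Return the category with the highest log-posterior for the tokens."""
    class_counts, term_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        total_terms = sum(term_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for tok in tokens:
            if tok in vocab:  # terms never seen in training are ignored
                # Laplace (add-one) smoothed term likelihood
                score += math.log(
                    (term_counts[label][tok] + 1) / (total_terms + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training set, for illustration only.
training = [
    (["stock", "market", "rises"], "finance"),
    (["bank", "raises", "rates"], "finance"),
    (["team", "wins", "match"], "sports"),
    (["player", "scores", "goal"], "sports"),
]
model = train_naive_bayes(training)
predicted = classify(["market", "rates"], model)
```

Because the model is built from term counts in a single language, a document written in another language would share almost no vocabulary with the training set, which is the gap that cross-lingual text categorization has to close.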
Social Network Visualization
Analysis of social networks is essential for
discovering knowledge about the structure of a community. Visualization of a
network using a 2D graph can greatly facilitate the inspection of the global
structure of the network. However, its usefulness becomes limited when the size
and complexity of the network increase. In this project, we investigate the use
of two interactive visualization techniques in the visualization of complex
social networks: fisheye views and fractal views. Both techniques facilitate
the exploration of complex networks by allowing a user to select one or more
focus points and dynamically adjusting the graph layout to enhance the view of
regions of interest. Combining the two techniques can effectively help an
investigator to recognize patterns previously unreadable in the normal display
due to the network complexity.
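To make the idea of a focus point concrete, the sketch below applies the graphical fisheye transform of Sarkar and Brown, g(t) = (d + 1)t / (dt + 1), to 2D node coordinates. Whether the project uses this particular transform is an assumption, as are the function names, the distortion factor `d`, and the rectangular `bounds` of the display.

```python
def fisheye_1d(x, focus, lo, hi, d=3.0):
    """Move coordinate x away from the focus, keeping it inside [lo, hi]."""
    # Distance from the focus to the display boundary on the side where x lies.
    d_max = (hi - focus) if x >= focus else (lo - focus)
    if d_max == 0:
        return x
    t = (x - focus) / d_max        # normalized distance from the focus, in [0, 1]
    g = (d + 1) * t / (d * t + 1)  # fisheye magnification; d = 0 is the identity
    return focus + g * d_max


def fisheye_layout(positions, focus, bounds, d=3.0):
    """Apply the transform independently to the x and y coordinates of each node.

    positions: dict mapping node -> (x, y)
    bounds: ((x_lo, y_lo), (x_hi, y_hi)) of the display area
    """
    (x_lo, y_lo), (x_hi, y_hi) = bounds
    fx, fy = focus
    return {
        node: (fisheye_1d(x, fx, x_lo, x_hi, d), fisheye_1d(y, fy, y_lo, y_hi, d))
        for node, (x, y) in positions.items()
    }


# Hypothetical three-node layout with the focus at the origin.
layout = fisheye_layout(
    {"a": (0.0, 0.0), "b": (0.5, 0.0), "c": (1.0, -1.0)},
    focus=(0.0, 0.0),
    bounds=((-1.0, -1.0), (1.0, 1.0)),
)
```

Nodes near the focus are spread apart and nodes near the boundary are compressed, so the region of interest gains space without any node leaving the display.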
Category Tree Integration
Classification of large volumes of documents
has been widely used by organizations and information providers to organize,
archive, and access documents. Document classification organizes a large document
collection into distinct groups of similar documents and identifies hidden
themes for each group. Different information providers use different
classification schemes for the information they provide. Individual users have to be
familiar with the structures of all of their category trees before they can
effectively search and aggregate information from multiple information
providers. At the same time, individual users have their own classification schemes to
manage the information they have collected. In this project, we propose an automatic
technique to integrate the categories from the source category trees into the
personalized master category tree of each individual user. This technique
can expand the personal category tree from a source category tree when
necessary. Users no longer need to browse multiple category
trees from different information providers, only their own category tree.
The personal master category trees also improve as the aggregated
information continues to expand.
Sentiment Analysis
The Web provides a public channel for consumers to express their opinions on consumer products. Sentiment analysis helps companies and individuals exploit these sources of information to gain market intelligence. In this project, we develop automatic techniques to extract the product features that consumers comment on and to determine the semantic orientation of these comments. With these techniques, we are able to provide summaries and benchmarks of consumer products.
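One simple way to picture the two steps, finding the product features mentioned in a comment and determining the orientation of the opinion about each, is a lexicon-based sketch like the one below. The opinion lexicon, the feature list, the word window, and the function name `feature_orientation` are all hypothetical; the project's actual techniques are not described here and may differ substantially.

```python
def feature_orientation(tokens, features, lexicon, window=2):
    """Sum the polarity of opinion words within `window` tokens of each feature.

    tokens: tokenized comment
    features: set of known product feature terms (assumed to be given)
    lexicon: dict mapping opinion word -> polarity (+1 or -1), an assumption
    """
    scores = {}
    for i, tok in enumerate(tokens):
        if tok in features:
            nearby = tokens[max(0, i - window): i + window + 1]
            polarity = sum(lexicon.get(word, 0) for word in nearby)
            scores[tok] = scores.get(tok, 0) + polarity
    return scores


# Hypothetical comment, feature list, and lexicon, for illustration only.
comment = "the battery is great but the screen is dim".split()
scores = feature_orientation(
    comment,
    features={"battery", "screen"},
    lexicon={"great": 1, "dim": -1},
)
```

A positive score suggests a favorable comment on that feature and a negative score an unfavorable one; aggregating such scores over many comments is one possible basis for a product summary.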
Event Evolution
In topic detection and tracking, news
stories are monitored and automatic techniques are used to spot new events and
track the progress of previously spotted events. However, the traditional
document clustering techniques only organize the news stories into a flat
hierarchical structure. Such an organization is not capable of presenting the
evolution and complex relationships between the news events. Users are not able
to capture how the events begin, evolve and end without a graphical evolution
network. An event evolution graph is effective in presenting the rich
underlying structure of events and allows efficient and meaningful information
browsing. As a result, users are not only able to retrieve news stories of the
same topic but also able to capture their evolution. In this project, we
represent the event evolution relationships among events in an event evolution
graph, where event evolution includes event threading and event joining. We
investigate the features and techniques to determine the event evolution
relationship.
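The structure of an event evolution graph can be sketched as a directed graph in which an edge runs from an earlier event to a later one when their content is sufficiently similar. The Jaccard similarity over term sets, the threshold value, and the function names below are assumptions for illustration; determining which features and techniques actually work is the subject of the project.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of terms."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def build_event_evolution_graph(events, threshold=0.3):
    """Return directed edges (earlier_event, later_event), sorted by event id.

    events: dict mapping event id -> (timestamp, terms). An edge is added when
    the source event precedes the target and their term sets are similar enough.
    """
    edges = []
    for src, (t_src, terms_src) in events.items():
        for dst, (t_dst, terms_dst) in events.items():
            if t_src < t_dst and jaccard(terms_src, terms_dst) >= threshold:
                edges.append((src, dst))
    return sorted(edges)


# Hypothetical events, for illustration only.
events = {
    "quake": (1, {"earthquake", "city", "damage"}),
    "rescue": (2, {"earthquake", "rescue", "city"}),
    "aid": (3, {"rescue", "aid", "city"}),
    "election": (2, {"vote", "election"}),
}
edges = build_event_evolution_graph(events)
```

In such a graph, a chain of edges corresponds to event threading, and an event with more than one incoming edge corresponds to event joining.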
Extracting Topic Hierarchy from Web sites
Modeling a web site’s content structure is useful for various
web site processing tasks, including navigation and classification.
Hierarchical models are commonly used to organize a web site’s content. A web site’s content structure can be represented by a
topic hierarchy, a directed tree rooted at a Web site’s homepage in which the vertices and edges
correspond to Web pages and hyperlinks. In this work, we investigate several
algorithms to extract a web site’s
topic hierarchy from its link structure by analyzing the semantic relationships
between web pages based on link structure, web page content and directory
structure.
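As a point of reference, the simplest way to obtain a directed tree rooted at the homepage from link structure alone is a breadth-first traversal, in which each page's parent is the first page found to link to it. The sketch below is only such a baseline, with a hypothetical function name and toy link graph; it ignores the web page content and directory structure that the algorithms investigated in this work also analyze.

```python
from collections import deque


def bfs_topic_hierarchy(links, homepage):
    """Build a tree rooted at the homepage from a hyperlink graph.

    links: dict mapping page -> list of pages it links to
    Returns a dict mapping each reachable page to its parent (None for the root).
    """
    parent = {homepage: None}
    queue = deque([homepage])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in parent:  # keep only the first (shallowest) parent
                parent[target] = page
                queue.append(target)
    return parent


# Hypothetical link graph, for illustration only.
links = {
    "/": ["/a", "/b"],
    "/a": ["/a/x", "/b", "/"],
    "/b": ["/a/x"],
}
hierarchy = bfs_topic_hierarchy(links, "/")
```

Cross links and links back to the homepage are discarded, so the result is a tree, but the parent chosen for a page depends only on link order and depth, not on whether the two pages are topically related.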
Multimedia Retrieval using Hyperlink Analysis
Hyperlink
analysis has been widely investigated to support the retrieval of Web documents
in Internet search engines. Hyperlink analysis has been shown to
significantly improve the relevance of search results,
and these techniques have been adopted in many commercial search engines, e.g.
Google. However, hyperlink analysis is mostly used to rank
Web pages only, not other multimedia objects such as
images and video. In this project, we investigate several algorithms to
support the searching of multimedia objects on the Web.
Copyright © 2009 Christopher C. Yang. All Rights
Reserved.
Last updated: March 2009