Current Projects

Social Tagging Systems

Owing to the popularity of Web 2.0 technologies in recent years, social tagging systems such as Del.icio.us, Flickr, Yahoo! MyWeb2.0, YouTube, Technorati, and LiveJournal have drawn considerable attention. Research activity is increasing rapidly in areas such as the analysis of social tagging and the application of social tagging to document management and information retrieval tasks. In this project, we aim to study user motivations in social tagging, analyze the semantic and structural patterns of social tags, and develop machine learning techniques that transform social tags into quality metadata to support information retrieval tasks.

 

Health Literacy

Health literacy is an individual's ability to obtain, process, and understand the basic health information and services needed to make appropriate health care decisions. Health literacy has a great impact on health care cost, quality of health care, and health outcomes. Language and culture play important roles in health literacy. They shape the differences between the vocabularies used by health care consumers and by health care professionals. For instance, health care consumers may use the term nosebleed while health care professionals may use the term epistaxis. In addition, health care consumers with different social backgrounds may use different vocabularies. In this project, we aim to learn the vocabulary differences between health care consumers and professionals from the tags they assign to health care information, and to develop an automatic health care thesaurus that reduces the health literacy gap.

 

Social Network Data Sharing and Privacy Preservation

Social network analysis and mining have drawn significant attention in recent years due to the proliferation of online communities and advances in information sharing technologies. Social network analysis and mining techniques discover the knowledge embedded in the structure of social networks. Such knowledge is useful in many domains such as marketing, epidemiology, and homeland security. Sharing social networks between organizations enables knowledge retrieval from a larger social network integrated from multiple sources. However, privacy concerns often prevent such information sharing. Recent research on privacy preservation focuses on relational data, and very little work has been done on social network data. Existing work on privacy preservation of social network data relies on anonymity and perturbation. However, these techniques are developed for data publishing and ignore the utility of the published data for social network analysis and mining. In this proposed project, the objective is to share non-sensitive, generalized information that supports social network analysis and mining while preserving privacy.

 

Link, Content Analysis and Information Visualization of Weblog Communities

A Weblog is a Web site where entries are made in diary style, maintained by its sole author (the blogger), and displayed in reverse chronological order. In this project, we develop techniques to analyze and visualize Weblog social networks. Link analysis uses the relationships between bloggers to construct the Weblog social network. Content analysis associates similar blog messages to unveil implicit relationships found in their semantics, which further improves the Weblog social network analysis. Users can apply different interactive information visualization techniques to explore various aspects of the underlying social network at different levels of abstraction. Based on this analysis, we shall develop further applications such as monitoring theme and structural evolution, identifying active sub-groups, and classifying information flow.

 

Cross-lingual Information Retrieval

Cross-lingual information retrieval refers to the ability to process a query for information in one language, search a collection of objects (including text, images, and audio files), and return the most relevant objects, translated into the user's language if necessary. The corpus-based approach has been shown to overcome the shortcomings of the dictionary-based approach by using statistical information on term usage in parallel or comparable corpora to construct an automatic thesaurus. In this project, we investigate automatic techniques to extract a cross-lingual concept space from a multilingual parallel corpus. In addition, by investigating the query logs of Internet search engines, we extract associations between queries in multiple languages.

 

Multilingual Knowledge Management

Knowledge management applications involve generating, consuming, and maintaining a tremendous amount of information. Efficient and effective management of a continuously increasing volume of documents is essential so that users can obtain the knowledge they need to accomplish their tasks. Text categorization deals with automatically learning a categorization model from a training set of pre-classified documents on the basis of their contents, and with assigning uncategorized documents to appropriate categories. Most existing text categorization techniques deal with monolingual documents (i.e., all documents are written in one language) during both model learning and category assignment (or prediction). However, with the globalization of business environments and advances in Internet technology, an organization or individual often generates or acquires, and subsequently archives, documents in different languages, thus creating the need for cross-lingual text categorization. Motivated by this need, this research investigates cross-lingual text categorization techniques.

 

Social Network Visualization

Analysis of social networks is essential for discovering knowledge about the structure of a community. Visualization of a network using a 2D graph can greatly facilitate the inspection of the global structure of the network. However, its usefulness becomes limited when the size and complexity of the network increase. In this project, we investigate the use of two interactive visualization techniques in the visualization of complex social networks: fisheye views and fractal views. Both techniques facilitate the exploration of complex networks by allowing a user to select one or more focus points and dynamically adjusting the graph layout to enhance the view of regions of interest. Combining the two techniques can effectively help an investigator to recognize patterns previously unreadable in the normal display due to the network complexity. 
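As an illustration of the fisheye idea, the sketch below applies the classic graphical fisheye distortion function G(x) = (d + 1)x / (dx + 1) to a single node coordinate. The project description does not say which fisheye formulation is used, so this formula, the function names, and the default distortion factor `d` are assumptions chosen for illustration only.

```python
def fisheye(x_norm: float, d: float) -> float:
    """Distortion G(x) = (d + 1) x / (d x + 1) for x in [0, 1].

    d = 0 leaves positions unchanged; larger d magnifies the area
    near the focus more strongly. (Assumed formulation, not
    necessarily the one used in the project.)
    """
    return (d + 1) * x_norm / (d * x_norm + 1)


def distort(p: float, focus: float, lo: float, hi: float, d: float = 3.0) -> float:
    """Map one coordinate p of a node, given a focus point and the
    display bounds [lo, hi] along the same axis."""
    # Use the boundary on the same side of the focus as the point.
    boundary = hi if p >= focus else lo
    d_max = boundary - focus
    if d_max == 0:
        return p
    # Normalized distance from the focus, always in [0, 1].
    x_norm = (p - focus) / d_max
    return focus + fisheye(x_norm, d) * d_max


if __name__ == "__main__":
    # Nodes on a 0..100 axis with the focus at 50: points near the
    # focus are pushed outward, which enlarges the focus region.
    for p in (0, 25, 50, 60, 75, 100):
        print(p, round(distort(p, focus=50, lo=0, hi=100), 2))
```

Applying `distort` separately to the x and y coordinates of every node gives a simple focus-plus-context layout. Handling several focus points, or combining this with a fractal view, would need additional logic that is not shown here.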

 

Category Tree Integration

Classification of large volumes of documents is widely used by organizations and information providers to organize, archive, and access documents. Document classification organizes a large document collection into distinct groups of similar documents and identifies hidden themes for each group. Different information providers classify the information they provide in different ways. Individual users have to be familiar with the structures of all of these category trees before they can effectively search and aggregate information from multiple information providers. At the same time, individual users have their own classification for managing the information they have collected. In this project, we propose an automatic technique to integrate the categories of the source category trees into the personalized master category tree of an individual user. The technique can expand the personal category tree from a source category tree when necessary. Users will no longer need to browse multiple category trees from different information providers, only their own category tree. The personal master category tree will also improve as the aggregated information continues to expand.

 

Sentiment Analysis

The Web provides a public channel for consumers to express their opinions on consumer products. Sentiment analysis helps companies and individuals exploit these sources of information to gain market intelligence. In this project, we develop automatic techniques to extract the product features that consumers comment on and to determine the semantic orientation of these comments. With these techniques, we are able to provide summaries and benchmarking of consumer products.

 

Event Evolution

In topic detection and tracking, news stories are monitored and automatic techniques are used to spot new events and track the progress of previously spotted events. However, traditional document clustering techniques only organize the news stories into a flat hierarchy. Such an organization cannot present the evolution of, and the complex relationships between, news events. Without a graphical evolution network, users cannot see how events begin, evolve, and end. An event evolution graph is effective in presenting the rich underlying structure of events and allows efficient and meaningful information browsing. As a result, users are able not only to retrieve news stories on the same topic but also to follow their evolution. In this project, we represent the evolution relationships among events in an event evolution graph, where event evolution includes event threading and event joining. We investigate the features and techniques for determining event evolution relationships.

 

Extracting Topic Hierarchy from Web sites

Modeling a Web site's content structure is useful for various Web site processing tasks, including navigation and classification. Hierarchical models are commonly used to organize a Web site's content. A Web site's content structure can be represented by a topic hierarchy: a directed tree rooted at the Web site's homepage, in which the vertices and edges correspond to Web pages and hyperlinks. In this work, we investigate several algorithms to extract a Web site's topic hierarchy by analyzing the semantic relationships between Web pages based on link structure, Web page content, and directory structure.
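To make the notion of a topic hierarchy concrete, the following sketch builds a directed tree rooted at the homepage with a breadth-first traversal of the link graph, so each page's parent is the first page found to link to it. This is only a simple link-structure baseline, not one of the algorithms investigated in the project (which also draw on page content and directory structure). The function name and the example page names are hypothetical.

```python
from collections import deque


def bfs_topic_tree(links: dict, homepage: str) -> dict:
    """Return a parent map describing a tree rooted at `homepage`.

    `links` maps each page to the list of pages it links to.
    Each reachable page gets as parent the first page, in
    breadth-first order, that links to it. Pages that cannot be
    reached from the homepage are left out of the tree.
    """
    parent = {homepage: None}
    queue = deque([homepage])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in parent:  # keep only the first (shortest) path
                parent[target] = page
                queue.append(target)
    return parent


if __name__ == "__main__":
    # Hypothetical site: back-links and cross-links are ignored by the tree.
    site = {
        "index": ["research", "teaching"],
        "research": ["index", "projects", "teaching"],
        "teaching": ["index", "courses"],
        "projects": ["index"],
    }
    for child, par in bfs_topic_tree(site, "index").items():
        print(child, "<-", par)
```

Breadth-first search assigns every page to its shallowest linking page, which often matches navigation menus but can misplace pages reached through shortcut links. That limitation is one reason content and directory information are useful in addition to link structure.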

 

Multimedia Retrieval using Hyperlink Analysis

Hyperlink analysis has been widely investigated to support the retrieval of Web documents in Internet search engines. It has been shown to significantly improve the relevance of search results, and its techniques have been adopted in many commercial search engines, e.g. Google. However, hyperlink analysis is mostly used to rank Web pages and not other multimedia objects, such as images and video. In this project, we investigate several algorithms to support the search of multimedia objects on the Web.
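For readers unfamiliar with hyperlink analysis, the sketch below implements PageRank, one well-known link-based ranking method, using plain power iteration. The project description does not name the algorithms it investigates, so PageRank is used here purely as an illustration. The function name, the damping factor of 0.85, and the example graph are assumptions.

```python
def pagerank(links: dict, damping: float = 0.85, iterations: int = 50) -> dict:
    """Compute PageRank scores by power iteration.

    `links` maps each node to the list of nodes it links to.
    Nodes without out-links spread their score evenly over all
    nodes, so the scores always sum to 1.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the same base share from random jumps.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: distribute its score uniformly.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical graph: "c" is linked from both "a" and "b".
    graph = {"a": ["c"], "b": ["c"], "c": ["a"]}
    for node, score in sorted(pagerank(graph).items()):
        print(node, round(score, 3))
```

Nothing in the function is specific to Web pages: nodes could in principle stand for images or videos, provided a meaningful link graph over those objects is available. How to construct such a graph is not addressed by this sketch.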

 

 

 



Copyright © 2009 Christopher C. Yang. All Rights Reserved.

Last updated: March 2009