ECIR 2009 Tutorials:
Tutorials Venue: Manufacture des Tabacs, University of Toulouse 1
Common schedule for tutorial attendees
|On site registration|8:30|
|Coffee break|10:15 to 10:45|
|Lunch time|12:00 to 14:00|
|Coffee break|15:15 to 15:45|
|Closure Apéritif|17:00 to 18:00|
Current Developments in Information Retrieval Evaluation
(Manufacture des Tabacs, University of Toulouse 1, room MD309)
Organization: Thomas Mandl
The tutorial introduces and summarizes recent research on the validity of evaluation experiments in information retrieval.
Information Extraction and Linking in a Retrieval Context
(Manufacture des Tabacs, University of Toulouse 1, room MD302)
- Marie-Francine Moens
- Djoerd Hiemstra
We are witnessing growing interest in, and growing capabilities of, automatic content recognition (often referred to as information extraction) in various media sources. Content recognition identifies entities (e.g., persons, locations and products) and their semantic attributes (e.g., opinions expressed towards persons or products, relations between entities). These extraction techniques are most advanced for text sources, but they are also being researched for other media, for instance for recognizing persons and objects in images or video. The extracted information enriches documents and queries and adds semantic meaning to them (the latter, e.g., in a relevance feedback setting). In addition, content recognition techniques trigger automated linking of information across documents and even across media. This situation creates a number of opportunities and challenges for retrieval and ranking models.
For instance, instead of returning full documents, information extraction provides the means to return very focused results in the form of entities such as persons and locations. Another challenge is to integrate content recognition and content retrieval as much as possible, for instance by using the probabilistic output from the information extraction tools in the retrieval phase. These approaches are important steps towards semantic search, i.e., retrieval approaches that truly use the semantics of the data.
We propose a half-day tutorial that gives an overview of current information extraction techniques for text, including, among others, entity recognition and entity relation recognition. Examples of content recognition in other media are also given.
The tutorial goes deeper into current approaches to automated linking, including probabilistic methods that maximize the likelihood of aligning recognized content. As a result, documents can be modeled as mixtures of content, incorporating certain dependencies, and document collections can be represented as a web of information.
An important part of the tutorial focuses on retrieval models and ranking functions that use the results of information extraction. We explain the use of probabilistic models for entity retrieval, more specifically relevance language models, graph models and probabilistic random walk models, as well as extensions of these models that handle noisy entity recognition or noisy concept recognition.
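As an illustration of how probabilistic extraction output might feed a ranking function, the following is a minimal sketch and not the tutorial's own method. It assumes a document-centric scoring scheme in which each entity receives the sum, over documents, of the document's normalized relevance to the query multiplied by the entity's share of the extraction confidence in that document. The function name `rank_entities`, the input formats and the example data are all hypothetical.

```python
from collections import defaultdict


def rank_entities(doc_scores, mentions):
    """Rank entities for a query from noisy extraction output.

    doc_scores: dict mapping doc_id to a non-negative relevance score
                for the query (assumed to come from any retrieval model).
    mentions:   list of (doc_id, entity, confidence) tuples, where
                confidence in [0, 1] is the extractor's probability
                that the mention is correct (hypothetical format).
    """
    total = sum(doc_scores.values())
    if total == 0:
        return []

    # Accumulate extraction confidence per entity within each document.
    per_doc = defaultdict(lambda: defaultdict(float))
    for doc_id, entity, confidence in mentions:
        per_doc[doc_id][entity] += confidence

    scores = defaultdict(float)
    for doc_id, entities in per_doc.items():
        # Normalized relevance of the document to the query.
        p_doc = doc_scores.get(doc_id, 0.0) / total
        mass = sum(entities.values())
        if mass == 0:
            continue
        for entity, confidence in entities.items():
            # Entity's share of the confidence mass in this document.
            scores[entity] += p_doc * confidence / mass

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    doc_scores = {"d1": 3.0, "d2": 1.0}
    mentions = [
        ("d1", "Alice", 0.9),
        ("d1", "Bob", 0.3),
        ("d2", "Bob", 0.8),
    ]
    for entity, score in rank_entities(doc_scores, mentions):
        print(entity, round(score, 4))
```

Weighting each mention by its confidence, rather than treating every recognized mention as certain, is one simple way to keep low-confidence extractions from dominating the ranking.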
The tutorial's main goal is to give the participants a clear and detailed overview of content modeling approaches and tools, and of the integration of their results into ranking functions. A small set of integrated and interactive exercises will sharpen the audience's understanding.
By attending the tutorial, attendees will:
- Acquire an understanding of current information extraction, topic modeling and entity linking techniques;
- Acquire an understanding of ranking models in information retrieval;
- Be able to integrate the (probabilistic) content models into the ranking models;
- Be able to choose a model for retrieval that is well-suited for a particular task and to integrate the necessary content models.
The tutorial includes several motivating examples and applications, among which are expert search using output from named entity tagging, connecting names to faces in videos for person search using output from named entity tagging and face detection, video search using output from concept detectors, and spoken document retrieval using speech lattices and posterior probabilities of recognized words. The examples will be combined in a larger case study: retrieval of news broadcast video.
Mining Query Logs
(Manufacture des Tabacs, University of Toulouse 1, room MD308)
- Salvatore Orlando
- Fabrizio Silvestri
Since they started to operate, Web Search Engines (WSEs) have stored information about their users in query logs. This information often serves many purposes. The primary focus of this tutorial is to introduce the discipline of query log mining. We will show its foundations by giving a unified view of the literature on query log analysis, and we will present in detail the basic algorithms and techniques that can be used to extract useful knowledge from this (potentially) infinite source of information. Finally, we will discuss how the extracted knowledge can be exploited to improve different quality features of a WSE system, mainly its effectiveness and efficiency.