Elzbieta Dura on "Information retrieval and extraction from domain specific text collections: the Fusion Proceedings and biomedical research abstracts"
Participants at SASW 2009 can attend, at no additional fee, a tutorial called "Information retrieval and extraction from domain specific text collections: the Fusion Proceedings and biomedical research abstracts" given by Elzbieta Dura of Lexware Labs and the University of Skövde. The tutorial will be given on Tuesday, October 13th. Only a limited number of people can attend the tutorial, so please register your interest by sending an e-mail to swift2009@lists.his.se.
Two themes are interleaved in this tutorial: a practical and a theoretical one. In the practical part the audience will acquire proficiency in using Culler, a powerful information retrieval system. The service is available online at http://bergelmir.iki.his.se/culler, and it includes access to past Fusion Conference proceedings (the Fusion Corpus), as well as to a number of biomedical text collections, such as the Gene Corpus, the Cancer Corpus and the Diabetes Corpus. The service is based on 'corpus technology': natural language processing in which language knowledge and statistics are fused to align text excerpts of relevant content. With sophisticated queries it is possible to extract specific and complex information from various text sources. To fully exploit the potential of the service, a new user needs an introduction to its specific character. The corpus tour will take us through the available functions with the help of illustrative examples. We shall learn how to formulate queries, set parameters for extraction, interpret the results obtained, access broader context, get assistance from the hidden dictionary, etc. The two main goals of these practical exercises are information retrieval and term extraction. The latter will show how the system can be used to arrive at a more stringent terminology.
The theoretical part will elucidate the NLP background of information retrieval. The audience will get acquainted with some basics of corpus technology, which nowadays is a necessary ingredient in the creation of glossaries, dictionaries, taxonomies and ontologies. Some insights into natural language are crucial for understanding the problems with soft data in fusion applications. Others are important for grasping the possibilities that open up in NLP-enhanced information retrieval from large repositories of specialized texts.
This tutorial is directed towards researchers and practitioners in various areas of information fusion. In particular, it is intended for researchers and students who want to retrieve information, who want to use proper standard terminology, or who need to cope with natural language data in a fusion application. No prior knowledge of natural language processing or information retrieval is expected.