ECDL 2003 Tutorials


Tutorial 1 : Usability Evaluation of Digital Libraries

Presenters:
Suzette Keith, Interaction Design Centre, Middlesex University.
Bob Fields, Interaction Design Centre, Middlesex University.

Abstract: Digital libraries are notoriously difficult to design well in terms of their eventual usability. In this tutorial we will present an overview of usability issues and techniques affecting digital library interface design. We will develop scenarios that make use of information seeking models to describe the users, their goals and activities. The scenarios describe the user interaction and provide the context for examining the effect of the design on the user. Claims made about the design will be examined with reference to broad usability principles, cognitive models and models of information seeking. Through a graduated series of worked examples participants will get hands-on experience of how to develop scenarios and apply claims analysis. Trade-offs between positive and negative claims and the effect on the design activity will be examined.

Target audience: This is an introductory tutorial. Participants are not expected to have prior experience of usability techniques for digital libraries, or of human-computer interaction more generally. You are, however, expected to have some experience of working with digital libraries, or of delivering digital library services.

Duration: Full-day, 09.00-17.30, Sunday August 17.


Tutorial 2 : Indexing and Searching Audio/Video Documents in Multimedia Digital Libraries

Presenters:
Giuseppe Amato, ISTI-CNR, Pisa, Italy.
Claudio Gennaro, ISTI-CNR, Pisa, Italy.
Pasquale Savino, ISTI-CNR, Pisa, Italy.

Abstract: The aim is to provide a theoretical and experimental background on the techniques and methodologies for the organization, creation, and management of an Audio/Video Digital Library (A/V DL). The frontier of DLs lies in managing multimedia documents, not only purely textual information. In particular, because of the large amount of A/V material available in digital form and the importance of this material for many aspects of everyday life (economic, environmental, health, cultural, social, etc.), the management of A/V DLs is becoming crucially important.
The course will illustrate the techniques and methodologies used to design, build and maintain an A/V DL. Extensive examples of existing systems and approaches will be given. In particular, as a running example we will refer to the ECHO system, which provides a DL service for historical films. It allows A/V material to be indexed and retrieved using speech transcripts, features automatically extracted from the video, and metadata manually associated by the user. Metadata are described using an A/V metadata model based on the IFLA-FRBR standard.

Target audience: Librarians, Archivists, Computer Scientists in the audio/video processing field.

Duration: Half-day, 09.00-12.30, Sunday August 17.


Tutorial 3 : Music Information Retrieval using Audio and Symbolic Representations

Presenter: George Tzanetakis, Computer Science Department, Carnegie Mellon University.

Abstract: The capacity to store and the ability to distribute large collections of multimedia information is increasing every day. A large percentage of this data, as well as of current internet traffic, consists of music files either in compressed audio format or in symbolic representation. As the recording industry is gradually moving towards digital music distribution, there is an increasing need for tools that can help analyze and search large digital libraries of music. Music Information Retrieval (MIR) is an emerging research area dealing with the problems of analyzing, indexing and searching large collections of music. Although music information retrieval has similarities with text, image and video information retrieval, it has unique characteristics that pose new challenges to research in digital libraries. In the last few years the emerging field of MIR has been gaining momentum and a large variety of problems, algorithms, tools and ideas have been proposed. As with every evolving field, these developments are scattered across various publication forums, and there is little work that provides a comprehensive overview of the field, especially for researchers who are not directly involved but are interested in learning about it.
Most existing work in MIR falls into two main categories based on the underlying representation used: 1) symbolic MIR, where the underlying representation is some form of musical score and the techniques used are more closely related to text IR, and 2) audio MIR, where the underlying representation is an audio file and the techniques used are more closely related to multimedia IR. This tutorial will cover both representations. The main emphasis will be on defining various problems in MIR and describing the fundamental ideas and concepts behind solving them, rather than providing unnecessary technical details.

Target audience: The intended audience is people involved with digital libraries from academia and industry who are interested in the emerging area of MIR. Familiarity with basic concepts in text and multimedia information retrieval as well as some math will be useful but not necessary.

Duration: Half-day, 14.00-17.30, Sunday August 17.


Tutorial 4 : Thesauri and Ontologies in Digital Libraries 1: Structure and use in knowledge-based assistance to users

Presenter: Dagobert Soergel, College of Information Studies, Univ. of Maryland.

Abstract: This introductory tutorial is intended for anyone concerned with subject access to digital libraries. It provides a bridge by presenting methods of subject access as treated in an information studies program for those coming to digital libraries from other fields. It will elucidate through examples the conceptual and vocabulary problems users face when searching digital libraries. It will then show how a well-structured thesaurus / ontology can be used as the knowledge base for an interface that can assist users with search topic clarification (for example through browsing well-structured hierarchies and guided facet analysis) and with finding good search terms (through query term mapping and query term expansion using synonyms and hierarchic inclusion). It will touch on cross-database and cross-language searching as natural extensions of these functions. The tutorial will cover the thesaurus structure needed to support these functions: concept-term relationships for vocabulary control and synonym expansion, and conceptual structure (semantic analysis, facets, and hierarchy) for topic clarification and hierarchic query term expansion. It will introduce a few sample thesauri and some thesaurus-supported digital libraries and Web sites to illustrate these principles.
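To make the idea of query term mapping and expansion concrete, the following is a minimal sketch and is not taken from the tutorial material. It assumes a toy thesaurus in which each concept has a preferred term, a list of synonyms, and a list of narrower concepts; the names THESAURUS, ENTRY_VOCABULARY and expand_query_term, and all the terms, are hypothetical.

```python
# Toy thesaurus (hypothetical data): preferred term -> synonyms and narrower concepts.
THESAURUS = {
    "vehicle": {"synonyms": ["conveyance"], "narrower": ["car", "bicycle"]},
    "car": {"synonyms": ["automobile", "motorcar"], "narrower": []},
    "bicycle": {"synonyms": ["bike"], "narrower": []},
}

# Entry vocabulary: maps every known term (preferred or synonym) to its preferred term.
ENTRY_VOCABULARY = {}
for preferred, concept in THESAURUS.items():
    ENTRY_VOCABULARY[preferred] = preferred
    for synonym in concept["synonyms"]:
        ENTRY_VOCABULARY[synonym] = preferred


def expand_query_term(term):
    """Map a user term to its concept, then add synonyms and all narrower concepts."""
    start = ENTRY_VOCABULARY.get(term.lower())
    if start is None:
        return [term]  # unknown term: pass it through unchanged

    expanded, seen, stack = [], set(), [start]
    while stack:
        preferred = stack.pop()
        if preferred in seen:
            continue
        seen.add(preferred)
        concept = THESAURUS[preferred]
        expanded.append(preferred)            # query term mapping
        expanded.extend(concept["synonyms"])  # synonym expansion
        # Hierarchic inclusion: visit narrower concepts in their listed order.
        stack.extend(reversed(concept["narrower"]))
    return expanded


print(expand_query_term("automobile"))
print(expand_query_term("vehicle"))
```

In this sketch a search for "vehicle" would also retrieve items indexed under "car", "automobile" or "bike", while a search for "automobile" is mapped to the concept "car" and its synonyms only.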

Target audience: This introductory tutorial is intended for anyone concerned with subject access to digital libraries.

Duration: Half-day, 09.00-12.30, Sunday August 17.


Tutorial 5 : Thesauri and Ontologies in Digital Libraries 2: Design, evaluation, and development

Presenter: Dagobert Soergel, College of Information Studies, Univ. of Maryland.

Abstract: This tutorial is intended for people who have a basic familiarity with the function and structure of thesauri and ontologies. It will introduce criteria for the design and evaluation of thesauri and ontologies and then deal with methods and tools for their development: locating sources; collecting concepts, terms, and relationships to reuse existing knowledge; developing and refining thesaurus/ontology structure; software and database structure for the development and maintenance of thesauri and ontologies; collaborative development of thesauri and ontologies; developing crosswalks / mappings between thesauri/ontologies. In summing up, the tutorial will address the question of the amount of resources needed to develop and maintain a thesaurus or ontology.

Target audience: This tutorial is intended for people who have a basic familiarity with the function and structure of thesauri and ontologies.

Duration: Half-day, 14.00-17.30, Sunday August 17.


Tutorial 6 : Introduction to Geo-Referencing in Digital Libraries

Presenters:
Linda Hill, Alexandria Digital Library Project, Department of Geography, University of California, Santa Barbara.
Michael Freeston, Project Coordinator, Alexandria Digital Earth Prototype Project, Department of Computer Science, University of California, Santa Barbara.

Abstract: Georeferencing is relating information (e.g., documents, datasets, maps, images, biographical information) to geographic locations through placenames (i.e., toponyms) and place codes (e.g., postal codes) or through geospatial referencing (e.g., longitude and latitude coordinates). The digital library perspective toward georeferencing is a blend of the focus of Geographic Information Systems (GIS) on geospatial coordinates, data layers, and mapping; of map librarianship; and of the traditional library focus on textual representation of location using placenames, administrative unit hierarchies, and other textual forms of spatial reference.
This tutorial covers the broad scope of georeferencing, including an overview of types of georeferenced objects and their characteristics; fundamental concepts of geospatial referencing; georeferencing structures of metadata standards (MARC, FGDC, Dublin Core, and more); gazetteers and their role in translating between textual and geospatial location referencing; supporting database architectures; and geospatial matching in information retrieval. In the process, the major information management standards for geospatial description, retrieval, interoperability, and information exchange will be identified.
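As an illustration of how a gazetteer can translate a placename into a geospatial footprint that is then used for spatial matching, here is a minimal sketch that is not taken from the tutorial material. It assumes footprints are simple longitude/latitude bounding boxes and that matching means box overlap; the names BBox, GAZETTEER, ITEMS and search_by_placename are hypothetical, and the coordinates are rough, illustrative values rather than authoritative gazetteer data.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """Bounding box in decimal degrees (assumed not to cross the antimeridian)."""
    west: float
    south: float
    east: float
    north: float


# Hypothetical gazetteer: placename -> approximate footprint.
GAZETTEER = {
    "santa barbara": BBox(-119.9, 34.3, -119.6, 34.5),
    "trondheim": BBox(10.2, 63.3, 10.6, 63.5),
}

# Hypothetical collection items, each described by a title and a footprint.
ITEMS = [
    ("Goleta aerial photograph", BBox(-119.95, 34.40, -119.80, 34.45)),
    ("Trondheim harbour map", BBox(10.35, 63.42, 10.45, 63.45)),
    ("Pacific survey dataset", BBox(-130.0, 20.0, -125.0, 25.0)),
]


def overlaps(a, b):
    """Return True if the two boxes share at least one point."""
    return (a.west <= b.east and b.west <= a.east
            and a.south <= b.north and b.south <= a.north)


def search_by_placename(placename, items):
    """Look the placename up in the gazetteer, then match items by footprint overlap."""
    footprint = GAZETTEER.get(placename.lower())
    if footprint is None:
        return []  # placename not in the gazetteer
    return [title for title, box in items if overlaps(footprint, box)]


print(search_by_placename("Santa Barbara", ITEMS))
```

Box overlap is only one possible matching rule; containment or distance-based ranking would be alternative choices under the same data model.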

Target audience: This is an introductory tutorial that is relevant to those interested in the application of map-based (geospatial) indexing for objects in digital library collections, cataloging and metadata design, knowledge organization systems, information retrieval, information visualization, and in subject fields where georeferencing is key to information analysis, including the social sciences, humanities, and environmental sciences.

Duration: Half-day, 09.00-12.30, Sunday August 17.


Tutorial 7 : How to build a Geospatial Digital Library

Presenters:
James Frew, Assistant Professor, Donald Bren School of Environmental Science and Management, University of California, Santa Barbara.
Gregory A. Janée, University of California, Santa Barbara, Department of Computer Science, Alexandria Digital Library.
Rudolf W. Nottrott, University of California, Santa Barbara, Department of Computer Science, Alexandria Digital Library.
Catherine Masi, Davidson Library, University of California, Santa Barbara.

Abstract: This tutorial will be of interest to individuals or institutions with geospatial digital content which they would like to publish for structured search and retrieval over the Web. The tutorial is based on software developed by the Alexandria Digital Library Project (ADL), which facilitates the creation and management of distributed digital library collections. ADL collections can operate stand-alone for use by individual users, or optionally and seamlessly switch into a distributed mode for web-based information sharing and publication.
Geospatial collections are typically heterogeneous in content and can span items as diverse as maps, historical photographs, field data, remotely sensed images or archeological data. The ADL software allows structured search and retrieval on such heterogeneous data collections, combining the simplicity of Dublin Core with the specificity of a full Boolean query language. The aim of the tutorial is to familiarize participants with the overall technology and with the specific procedures and software involved in setting up a stand-alone or distributed ADL node. As a case study, we will focus on a collection of USGS Digital Raster Graphics (DRG) maps. However, the technology we present is much more general: it can be applied to collections of any georeferenced library objects and, further, to collections of any objects to which a structured discovery technique can be applied. Based on Open Source components and open protocol standards (including Java, Tomcat, XML, JDBC, SQL), the ADL software is freely available and can be installed on all common software and hardware platforms.

Target audience: This tutorial targets individuals or institutions interested in publishing existing collection content for structured search and retrieval either on their local system or Intranet, or over the Internet to a global user community by participating in federated networks of heterogeneous content providers.

Duration: Half-day, 14.00-17.30, Sunday August 17.


Tutorial 8 : Multilingual Information Access

This tutorial has been cancelled. The registration fees will be reimbursed.

Presenter: Fredric Gey, University of California, Berkeley

Abstract: The growth of the Internet and the World Wide Web has made available vast written and spoken resources on a global scale from almost all countries in the world. The languages represented on the web are a reflection of this diversity of resources and, to the serious searcher, documents in languages other than English may provide unique news, cultural insight and altogether different perspectives on our electronic world. Moreover, most of the world’s peoples speak a native tongue other than English. This fact will increasingly be felt on the Internet. According to the Global Internet Statistics (as of January 2001), the majority of internet users speak a non-English language (52% versus 47%) as their native tongue (see http://www.glreach.com/globstat for details). During the past decade rapid progress has been made in developing techniques for Multilingual Information Access. Use of electronic bilingual dictionaries and machine translation software has been augmented by lexicons assembled from aligned bilingual parallel corpora of translated documents, techniques for query expansion, phrase recognition and translation disambiguation. On the other hand, most of these resources have been developed and applied to the major European (English, French, German, Italian and Spanish) and Asian (Chinese, Japanese, Korean) languages.
This half-day tutorial will cover aspects of Multilingual Information Access such as cross language search and retrieval, machine translation and statistical machine translation, multilingual search of the WWW and electronic digital library catalogs, evaluation strategies, evaluation campaigns and test collections for cross-language search effectiveness in the United States (TREC), Japan (NTCIR) and Europe (CLEF).

Target audience: The intended audience is professionals in information retrieval or digital library research whose positions may expose them to multilingual digital content. The audience will be introduced in some detail to the basic principles of multilingual search and automatic translation, and to the evaluation of such search. Examples will be from European languages, including languages with non-Roman alphabets such as Russian and Greek, Asian languages where discernment of word boundaries (no white space between words) is a significant challenge, and other languages such as those from the Indian subcontinent.

Duration: Half-day, 09.00-12.30, Sunday August 17.


Tutorial 9 : The CIDOC Conceptual Reference Model - New Standard for Knowledge Sharing

Presenters:
Martin Doerr, Information Systems Lab, Institute of Computer Science, Foundation for Research and Technology - Hellas (FORTH), Vassilika Vouton.
Stephen Stead, Vice Chair, CIDOC/ICOM.

Abstract: This tutorial will introduce the audience to the CIDOC Conceptual Reference Model, a core ontology and proposed ISO standard (ISO/CD 21127) for the semantic integration of cultural information with library, archive and other information. The CIDOC CRM concentrates on the definition of relationships, rather than classes, in order to capture the underlying semantics of multiple data and metadata structures. This has led to a compact model of 80 classes and 130 relationships that is easy to comprehend and suitable as a basis for mediation of cultural and other information. It thereby provides the semantic 'glue' needed to transform today's disparate, localised information sources into a coherent and valuable global resource. It comprises the concepts characteristic of most museum, archive and library documentation.
The tutorial aims to provide the knowledge needed to understand the potential of applying the CRM: where it can be useful and what the major technical issues of an application are. It will present information integration by employing a core ontology of relationships, in contrast to the prescription of a common data format, as an approach applicable to other domains. Using a real example, it will demonstrate how typical cases of heterogeneity can be resolved by intellectually mapping source data structures to the ontology. Participants with some background in information modelling should be able to use the CIDOC CRM in their applications after this course and some further reading.

Target audience: Ontology experts, digital library designers, data warehouse designers, system integrators and portal designers who work in the wider area of cultural and library information, as well as IT staff of libraries, museums and archives, and vendors of cultural and other information systems. Basic knowledge of object-oriented data models is required.

Duration: Half-day, 14.00-17.30, Sunday August 17.