What is an ontology and why is it important?

Preliminary orientation for those new to the field of biomedical ontology

Barry Smith

Background

Increasingly, the NIH is mandating that data emanating from NIH-funded projects should be

made available in forms that make these data more broadly usable. To achieve this end a

variety of standards are being developed for the representation of the biomedical data

resulting from research. One kind of standard takes the form of what are called “common data

elements” (CDEs), which are lists of terms devised for use by researchers working in a given

domain, whose meanings are specified by means of natural language definitions. A second

kind of standard is an ontology, which provides controlled representations of the phenomena

in a given domain by means of terms (nouns and noun-phrases) together with logically

structured definitions and relations. When the terms in question are used to annotate (tag,

describe) data created by multiple heterogeneous research groups, this brings important

advantages in retrievability and in integration of data. As is shown by the case of the Gene

Ontology, this allows new kinds of information-based scientific and clinical research.
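The retrieval and integration benefit described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the term identifiers ("EX:0001", "EX:0002"), the two research groups, and their record layouts are hypothetical and are not taken from any real ontology or dataset. The point is only that records with different local vocabularies become jointly searchable once they carry the same ontology term identifiers.

```python
from collections import defaultdict

# Hypothetical ontology terms (identifiers and labels are illustrative only).
TERMS = {
    "EX:0001": "orofacial pain",
    "EX:0002": "headache",
}

# Records from two hypothetical research groups. Each group uses its own
# field names and free-text wording, but both annotate with shared term IDs.
group_a = [
    {"patient": "A-17", "complaint": "jaw ache", "annotations": ["EX:0001"]},
    {"patient": "A-22", "complaint": "migraine", "annotations": ["EX:0002"]},
]
group_b = [
    {"subject_id": 301, "dx_text": "facial pain, chronic", "annotations": ["EX:0001"]},
]


def build_index(*datasets):
    """Map each ontology term ID to every record annotated with it."""
    index = defaultdict(list)
    for dataset in datasets:
        for record in dataset:
            for term_id in record["annotations"]:
                index[term_id].append(record)
    return index


index = build_index(group_a, group_b)
hits = index["EX:0001"]
print(f"{TERMS['EX:0001']}: {len(hits)} records across both groups")
```

A search on the shared identifier finds both the "jaw ache" and the "facial pain, chronic" records, although neither group's free text matches the other's.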

Ontologies vs. Common Data Elements (CDEs)

Ontologies bring a number of advantages as contrasted with an approach based on CDEs.

They are more easily extendible and modifiable in light of scientific advance. They are also

more easily factorable, which means that new ontology work can take advantage of

ontologies created earlier for other purposes. Ontology technology has been more thoroughly

tested – above all in molecular biology domains, and in model organism research – and is

able to draw on a variety of sophisticated software tools.

In contrast to existing CDE-based approaches, new ontologies are being deliberately built in

such a way as to work well with existing ontologies (for example, within the OBO Foundry, an

initiative to create a complete set of ontologies covering the basic biological sciences and

extending from there to clinical medicine).¹ Thus once an ontology has been developed for a domain

such as orofacial pain, the same ontology can be reconfigured to serve other pain

domains, and lessons learned from its use in one domain can be easily communicated to

those using the reconfigured ontologies in the other domains. Ontologies are structured in

such a way as to allow enhanced retrieval of and automated reasoning over information not

only by insiders (who tend to be the ones familiar with CDEs) but also by outsiders (including

those working in other disciplines).

Ontologies developed within the framework of the OBO Foundry are distinguished from CDEs

further in that they are determined not by how the information about a given domain is

organized, but rather by the biology of the domain. The strategy rests on the idea that,

because the biology is common to all the various data artifacts produced by different groups,

the biology can serve to ground a common representation – the ontology – which can

integrate these various data artifacts.

Difference between ontology and taxonomy

Ontologies provide the terms used to annotate (tag) data. The resultant annotations make the data

searchable not only through the use of ontology terms, but also through use of logically

related terms; thus the ontology can be used to retrieve data associated with terms referring to

parts of specific anatomical entities, to anatomical entities immediately connected to specific

anatomical entities, or to biological processes in which specific anatomical entities participate.

Considered in graph-theoretic terms, the terms in the ontology are nodes, connected together

by means of edges representing relations such as subtype, part_of, connected_to,

and so on. A taxonomy, conceived in this light, is one very simple kind of ontology, in that only

the one relation of subtype is recognized. A taxonomy is, in other words, just the first step

towards an ontology – which adds logical relations, definitions, and a structure that is

designed to allow easy integration with other ontologies and thus with other taxonomies.
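The graph view can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the data model of any particular ontology tool: the anatomical terms and the edge list are hypothetical, and only the relation names subtype, part_of and connected_to come from the text. Restricting traversal to subtype edges gives the taxonomy view; also following part_of edges gives the richer retrieval that an ontology supports.

```python
from collections import defaultdict

# Edges are (child, relation, parent) triples. The terms are illustrative;
# the relation names follow those mentioned in the text.
EDGES = [
    ("molar", "subtype", "tooth"),
    ("incisor", "subtype", "tooth"),
    ("tooth", "part_of", "jaw"),
    ("jaw", "part_of", "head"),
    ("temporomandibular joint", "connected_to", "jaw"),
]


def related(term, relations, edges=EDGES):
    """Return every term that reaches `term` via edges of the given relations."""
    incoming = defaultdict(set)
    for child, relation, parent in edges:
        if relation in relations:
            incoming[parent].add(child)
    found, stack = set(), [term]
    while stack:
        for child in incoming[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


# Taxonomy view: only the subtype relation is recognized.
taxonomy_view = related("jaw", {"subtype"})
# Ontology view: subtype and part_of edges are both followed.
ontology_view = related("jaw", {"subtype", "part_of"})

print(sorted(taxonomy_view))
print(sorted(ontology_view))
```

With subtype alone, a query on "jaw" finds nothing, because no term in this toy graph is a subtype of jaw. Once part_of is followed as well, the same query also retrieves data annotated with "tooth", "molar" and "incisor".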

Moreover, the additional relations provided by an ontology, as contrasted with a simple

taxonomy, provide the necessary linkages across data sets, the basis for additional analytic

approaches, and enriched theoretical modeling. See for instance the relations has_part, etc.

indicated in the right-hand column here (from the Protein Ontology):

How will an ontology yield a description of disease that works clinically?

One illustration of how ontology development can help in clinical research and in diagnosis

and treatment is provided by the Infectious Disease Ontology (IDO), a joint initiative of the

University at Buffalo and Duke University Medical Center, together with infectious disease

researchers throughout the world. The Infectious Disease Ontology is designed to allow

geneticists, scientists, clinicians and public health agencies to more easily share and compare

many different types of data concerning pathogens, patients and disease processes. Diseases

being studied by the IDO Consortium include malaria and other vector-borne diseases,

tuberculosis, infective endocarditis, MRSA, influenza and dengue fever.

¹ Smith B, et al. The OBO Foundry: coordinated evolution of ontologies to support biomedical data

integration. Nature Biotechnology 2007; 25(11): 1251-1255.

Supplementary Reading

http://ontology.buffalo.edu/biomedical.htm

http://www.cs.man.ac.uk/~stevensr/menupages/background.php

http://bioontology.org/wiki/index.php/Introduction_to_Biomedical_Ontologies

http://www.genomicglossaries.com/content/ontologies.asp