What is an ontology and why is it important?
Preliminary orientation for those new to the field of biomedical ontology
Increasingly, the NIH is requiring that data emanating from NIH-funded projects be made
available in forms that make these data more broadly usable. To achieve this end, a variety of
standards are being developed for representing the biomedical data that result from research.
One kind of standard takes the form of what are called "common data elements" (CDEs):
lists of terms devised for use by researchers working in a given domain, with the meaning of
each term specified by a natural-language definition. A second kind of standard is an ontology,
which provides controlled representations of the phenomena in a given domain by means of
terms (nouns and noun phrases) together with logically structured definitions and relations.
When these terms are used to annotate (tag, describe) data created by multiple heterogeneous
research groups, the result is important advantages in the retrievability and integration of
those data. As the case of the Gene Ontology shows, such annotation makes possible new kinds
of information-based scientific and clinical research.
Ontologies vs. Common Data Elements (CDEs)
Ontologies bring a number of advantages over an approach based on CDEs. They are more
easily extended and modified in light of scientific advances. They are also more easily
factored into modules, which means that new work can build on ontologies created earlier for
other purposes. Ontology technology has also been more thoroughly tested (above all in
molecular biology and in model organism research), and it can draw on a variety of
sophisticated software tools.
In contrast to existing CDE-based approaches, new ontologies are being deliberately built so as
to work well with existing ontologies (for example, within the OBO Foundry, an initiative to
create a complete set of ontologies covering the basic biological sciences and extending from
there to clinical medicine).1 Thus once an ontology has been developed for a domain such as
orofacial pain, it can be reconfigured to serve other pain domains, and lessons learned from its
use in one domain can easily be communicated to those using the reconfigured ontologies in the
other domains. Ontologies are structured so as to support enhanced retrieval of, and automated
reasoning over, information not only by insiders (who tend to be the ones familiar with CDEs)
but also by outsiders (including those working in other disciplines).
Ontologies developed within the framework of the OBO Foundry are further distinguished from
CDEs in that their structure is determined not by how information about a given domain
happens to be organized but by the biology of the domain itself. The strategy rests on the idea
that, because the biology is common to all the data artifacts produced by different groups, the
biology can ground a common representation (the ontology) that integrates these different data
artifacts.
Difference between ontology and taxonomy
Ontologies provide a means of annotating (tagging) data. The resulting annotations make the
data searchable not only through the ontology terms themselves but also through logically
related terms. Thus a query about a given anatomical entity can retrieve data annotated with
terms referring to its parts, to anatomical entities immediately connected to it, or to biological
processes in which it participates.
Considered in graph-theoretic terms, the terms in the ontology are nodes, connected by edges
representing relations such as subtype, part_of, connected_to, and so on. A taxonomy, seen in
this light, is a very simple kind of ontology, one in which only the subtype relation is
recognized. In other words, a taxonomy is just the first step towards an ontology, which adds
further logical relations, definitions, and a structure designed to allow easy integration with
other ontologies (and thus with other taxonomies).
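The graph view above can be sketched in a few lines of Python. The terms, edges, and dataset annotations below are invented for illustration (they are not taken from any published ontology), and the functions `subtypes_and_parts`, `immediately_connected`, and `retrieve` are hypothetical helpers. The sketch treats subtype and part_of as transitive, so chains of them are followed, but follows connected_to only one step, matching "immediately connected" in the text. Restricting the query to the subtype relation mimics a plain taxonomy.

```python
from collections import defaultdict

# A toy ontology as a labelled graph: terms are nodes and each edge carries a
# relation name. Read ("molar", "subtype", "tooth") as "molar is a subtype of
# tooth". Terms and edges are illustrative only.
EDGES = [
    # (source term, relation, target term)
    ("molar", "subtype", "tooth"),
    ("incisor", "subtype", "tooth"),
    ("tooth root", "part_of", "tooth"),
    ("dental pulp", "part_of", "tooth"),
    ("periodontal ligament", "connected_to", "tooth root"),
]

# Hypothetical annotations: each dataset is tagged with one ontology term.
ANNOTATIONS = {
    "dataset-1": "molar",
    "dataset-2": "tooth root",
    "dataset-3": "periodontal ligament",
    "dataset-4": "dental pulp",
}


def subtypes_and_parts(term, relations):
    """Return `term` plus every term that reaches it through a chain of the
    given relations (only subtype and part_of are treated as transitive)."""
    incoming = defaultdict(list)
    for source, rel, target in EDGES:
        if rel in relations:
            incoming[target].append(source)
    found, stack = {term}, [term]
    while stack:
        for source in incoming[stack.pop()]:
            if source not in found:
                found.add(source)
                stack.append(source)
    return found


def immediately_connected(term):
    """Terms sharing a single connected_to edge with `term` (no chaining)."""
    return {s if t == term else t
            for s, rel, t in EDGES
            if rel == "connected_to" and term in (s, t)}


def retrieve(term, relations):
    """Datasets annotated with `term` or with a term logically related to it."""
    terms = subtypes_and_parts(term, relations & {"subtype", "part_of"})
    if "connected_to" in relations:
        terms |= immediately_connected(term)
    return sorted(ds for ds, t in ANNOTATIONS.items() if t in terms)


# A taxonomy recognizes only the subtype relation ...
print(retrieve("tooth", {"subtype"}))             # ['dataset-1']
# ... while an ontology can also follow part_of and connected_to edges.
print(retrieve("tooth", {"subtype", "part_of"}))  # ['dataset-1', 'dataset-2', 'dataset-4']
print(retrieve("tooth root", {"subtype", "part_of", "connected_to"}))  # ['dataset-2', 'dataset-3']
```

The same annotations answer different questions depending on which relations the query may traverse; with the subtype relation alone, data tagged with parts of a tooth are invisible to a query about teeth.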
Moreover, the additional relations that an ontology provides beyond those of a simple taxonomy
supply the linkages needed across data sets and provide a basis for further analytic approaches
and for richer theoretical modeling. See, for instance, has_part and the other relations indicated
in the right-hand column of the accompanying figure (from the Protein Ontology).
How will an ontology yield a description of disease that works clinically?
One illustration of how ontology development can help in clinical research, diagnosis, and
treatment is provided by the Infectious Disease Ontology (IDO), a joint initiative of the
University at Buffalo and Duke University Medical Center, together with infectious disease
researchers throughout the world. IDO is designed to allow geneticists and other scientists,
clinicians, and public health agencies to share and compare more easily many different types of
data concerning pathogens, patients, and disease processes. Diseases being studied by the IDO
Consortium include malaria and other vector-borne diseases, tuberculosis, infective
endocarditis, MRSA infection, influenza, and dengue fever.
1 Smith B, et al. The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nature Biotechnology 2007; 25(11): 1251-1255.