What is an ontology and why is it important?
Preliminary orientation for those new to the field of biomedical ontology
Barry Smith
Background
Increasingly, the NIH
is mandating that data emanating from NIH-funded projects should be
made available in forms
that make these data more broadly usable. To achieve this end a
variety of standards are
being developed for the representation of the biomedical data
resulting from research. One
kind of standard takes the form of what are called “common data
elements” (CDEs), which are
lists of terms devised for use by researchers working in a given
domain, whose meanings are
specified by means of natural language definitions. A second
kind of standard is an
ontology, which provides controlled representations of the phenomena
in a given domain by
means of terms (nouns and noun-phrases) together with logically
structured definitions and
relations. When the terms of an ontology are used to annotate (tag,
describe) data created by multiple heterogeneous research groups, the data
become easier to retrieve and to integrate. As the case of the Gene
Ontology shows, such annotation makes possible new kinds of
information-based scientific and clinical research.
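The benefit of shared annotation can be illustrated with a minimal sketch. It assumes two hypothetical research groups whose records use different local field names but are tagged with the same ontology term identifiers. The identifiers (EX:0001, EX:0002), the field names and the records are invented for illustration and do not come from any real ontology or data set.

```python
from collections import defaultdict

# Hypothetical term identifiers and labels, for illustration only.
ONTOLOGY_TERMS = {
    "EX:0001": "orofacial pain",
    "EX:0002": "headache",
}

# Records from two hypothetical groups. Each group uses its own field
# names and free-text wording, but both annotate with the same term IDs.
group_a = [
    {"subject": "A-17", "dx_text": "jaw pain", "annotation": "EX:0001"},
    {"subject": "A-18", "dx_text": "migraine", "annotation": "EX:0002"},
]
group_b = [
    {"patient_id": 301, "complaint": "TMJ ache", "term": "EX:0001"},
]


def build_index(*sources):
    """Map each ontology term ID to the records annotated with it."""
    index = defaultdict(list)
    for group_name, records, term_field in sources:
        for record in records:
            index[record[term_field]].append((group_name, record))
    return index


index = build_index(
    ("group_a", group_a, "annotation"),
    ("group_b", group_b, "term"),
)

# One query on the shared term retrieves records from both groups,
# even though their local wording ("jaw pain", "TMJ ache") differs.
hits = index["EX:0001"]
for group_name, record in hits:
    print(ONTOLOGY_TERMS["EX:0001"], group_name, record)
```

The point of the sketch is only that the shared identifier, rather than each group's local vocabulary, is what makes the pooled query possible.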
Ontologies vs. Common Data Elements (CDEs)
Ontologies bring a number of advantages over an approach based on CDEs.
They are more easily extended and modified in light of scientific advance.
They are also more easily factorable, which means that new work can reuse
ontologies created earlier for other purposes. Ontology technology has
been more thoroughly tested, above all in molecular biology domains and in
model organism research, and is able to draw on a variety of sophisticated
software tools.
In contrast to existing CDE-based approaches, new ontologies are being
deliberately built in such a way as to work well with existing ontologies
(for example, within the OBO Foundry, an initiative to create a complete
set of ontologies covering the basic biological sciences and extending
from there to clinical medicine; see note 1). Thus, once an ontology has
been developed for a domain such as orofacial pain, the same ontology can
be reconfigured to serve other pain domains, and lessons learned from its
use in one domain can be easily communicated to those using the
reconfigured ontologies in the other domains. Ontologies are structured in
such a way as to allow enhanced retrieval of, and automated reasoning
over, information not only by insiders (who tend to be the ones familiar
with CDEs) but also by outsiders (including those working in other
disciplines).
Ontologies developed
within the framework of the OBO Foundry are distinguished from CDEs
further in that they are
determined not by how the information about a given domain is
organized, but rather by the
biology of the domain. The strategy rests on the idea that,
because the biology is
common to all the various data artifacts produced by different groups,
the biology can serve to
ground a common representation – the ontology – which can
integrate these different data artifacts.
Difference between ontology and taxonomy
Ontologies provide a way of annotating (tagging) data. The resulting
annotations make the data searchable not only through the ontology terms
themselves but also through logically related terms; thus the ontology can
be used to retrieve data associated with terms referring to parts of a
specific anatomical entity, to anatomical entities immediately connected
to it, or to biological processes in which it participates.
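The three kinds of retrieval just described can be sketched as follows. This is a minimal illustration under stated assumptions: the relation triples, the relation name participates_in, the data set names and the helper functions are all invented for the example and are not taken from any actual ontology or tool. Only direct relations are followed, with no transitive reasoning.

```python
# Hypothetical fragment of an anatomy ontology as (subject, relation, object).
RELATIONS = [
    ("left ventricle", "part_of", "heart"),
    ("mitral valve", "part_of", "heart"),
    ("aorta", "connected_to", "heart"),
    ("heart", "participates_in", "blood circulation"),
]

# Hypothetical data sets, each annotated with a single ontology term.
ANNOTATIONS = {
    "dataset-1": "left ventricle",
    "dataset-2": "mitral valve",
    "dataset-3": "aorta",
    "dataset-4": "blood circulation",
    "dataset-5": "femur",
}


def subjects_of(relation, obj):
    """Terms that stand in `relation` to `obj` (for example, parts of obj)."""
    return {s for s, r, o in RELATIONS if r == relation and o == obj}


def objects_of(subject, relation):
    """Terms to which `subject` stands in `relation`."""
    return {o for s, r, o in RELATIONS if s == subject and r == relation}


def retrieve(terms):
    """Data sets annotated with any of the given terms, in sorted order."""
    return sorted(d for d, t in ANNOTATIONS.items() if t in terms)


# Data about parts of the heart, about entities connected to it,
# and about processes in which it participates.
parts_of_heart = retrieve(subjects_of("part_of", "heart"))
connected_to_heart = retrieve(subjects_of("connected_to", "heart"))
heart_processes = retrieve(objects_of("heart", "participates_in"))

print(parts_of_heart)
print(connected_to_heart)
print(heart_processes)
```

None of the retrieved data sets is annotated with the term "heart" itself; they are found only because the ontology records how their terms relate to it.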
Considered in
graph-theoretic terms, the terms in the ontology are nodes, connected together
by means of edges
representing relations such as subtype, part_of, connected_to,
and so on. A taxonomy, conceived in this light, is one very simple kind
of ontology, in that only
the one relation of subtype is recognized. A taxonomy is, in other words, just the first step
towards an ontology – which
adds logical relations, definitions, and a structure that is
designed to allow easy
integration with other ontologies and thus with other taxonomies.
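The graph-theoretic picture can be made concrete with a small sketch. It assumes a toy ontology stored as a list of labeled edges; the terms, the edges and the function names are illustrative assumptions, not part of any published ontology. Restricting the graph to its subtype edges yields the taxonomy embedded in it.

```python
# Hypothetical ontology fragment: nodes are terms, edges carry a relation.
EDGES = [
    ("incisor", "subtype", "tooth"),
    ("molar", "subtype", "tooth"),
    ("tooth", "subtype", "anatomical structure"),
    ("tooth", "part_of", "mouth"),
    ("mouth", "connected_to", "pharynx"),
]


def taxonomy(edges):
    """Keep only the subtype edges: the taxonomy inside the ontology."""
    return [(child, rel, parent) for child, rel, parent in edges
            if rel == "subtype"]


def supertypes(term, edges):
    """All terms reachable from `term` by following subtype edges upward."""
    parents = {}
    for child, rel, parent in edges:
        if rel == "subtype":
            parents.setdefault(child, set()).add(parent)
    found, stack = set(), [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


taxonomy_edges = taxonomy(EDGES)
dropped_relations = {rel for _, rel, _ in EDGES} - {"subtype"}
molar_supertypes = supertypes("molar", EDGES)

print(taxonomy_edges)
print(dropped_relations)
print(molar_supertypes)
```

In this sketch the taxonomy can say that a molar is a kind of tooth, but the facts that a tooth is part of the mouth and that the mouth is connected to the pharynx are available only in the full ontology.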
Moreover, the additional relations provided by an ontology, as contrasted
with a simple taxonomy, provide the necessary linkages across data sets
and form the basis for additional analytic approaches and for enriched
theoretical modeling. See for instance the relations has_part, etc.
indicated in the right-hand column here (from the Protein Ontology):
How will an ontology yield a description of disease that works clinically?
One illustration of
how ontology development can help in clinical research and in diagnosis
and treatment is
provided by the Infectious Disease Ontology (IDO), a joint initiative of the
University at Buffalo
and Duke University Medical Center, together with infectious disease
researchers throughout the
world. The Infectious Disease Ontology is designed to allow
geneticists, scientists,
clinicians and public health agencies to more easily share and compare
many different types of
data concerning pathogens, patients and disease processes. Diseases
being studied by the IDO
Consortium include malaria and other vector-borne diseases,
tuberculosis, infective
endocarditis, MRSA, influenza and dengue fever.
1 Smith B et al. The OBO Foundry: Coordinated evolution of ontologies to
support biomedical data integration. Nature Biotechnology 2007; 25 (11):
1251-1255.
Supplementary Reading
http://ontology.buffalo.edu/biomedical.htm
http://www.cs.man.ac.uk/~stevensr/menupages/background.php
http://bioontology.org/wiki/index.php/Introduction_to_Biomedical_Ontologies
http://www.genomicglossaries.com/content/ontologies.asp