Everyone
who works on systems biology seems to have his or her
own definition. Here is a good basic definition from
Wikipedia: “Systems biology is an academic field
that seeks to integrate different levels of
information to understand how biological systems
function. By studying the relationships and
interactions between various parts of a biological
system...it is hoped that eventually an understandable
model of the whole system can be developed.”
Systems
biology is fueled by large-scale, global data sets
such as those from arrays. The traditional methods of
molecular biology gag on these large data sets because
the field’s culture demands that every study tell a
neat, mechanistic story.
A
typical array experiment reveals 5,000 to 10,000 active
genes. With traditional methods, these are filtered
down to a list of a thousand or so genes whose
expression changes (although no one ever explains why
the only interesting genes are the ones that change;
as any fan of Sherlock Holmes knows, the dog that
doesn’t bark is sometimes the important clue!). The
list is further pruned to a small number of known
genes from known pathways. Inevitably, most of these
pathways have been previously implicated — this
gives the work its credibility — but one or two have
not, which confers novelty and sizzle. The thousands
of genes that don’t fit the story are simply
ignored.
Systems
biology tries to use more of the data by thinking
globally and moving beyond the known pathways. This
requires sophisticated mathematical and computational
methods for analyzing the data to find interesting
patterns that are not closely linked to known biology.
There’s a tendency to focus on the math and computer
aspects, since this is the new stuff, and to conclude
that systems biology is focused on theory. That
conclusion is wrong.
Data Consumption
Systems
biology is squarely an experimental field that eats,
drinks, and breathes data. To do systems biology, you
need an experimental system that is amenable to
large-scale experimentation. Ideally, you want to
perturb your system in numerous ways (e.g., treat your
system with several drugs at different dosages) and
generate data using multiple complementary methods,
for example, expression arrays to measure gene
expression, ChIP-chip to get data on transcription
factor binding, mass-spec proteomics to assess protein
abundance, and single-cell microscopy to track protein
localization. Time-course data are especially valuable,
as they let you watch the system as it responds to
each perturbation. You end up with large amounts of
diverse data that grow further as you add data from
external public or proprietary databases.
This
is not for the data-phobic or those trying to get by
on an R01 budget.
Some
of the computational challenges are obvious. You need
good laboratory informatics to manage the experimental
procedures and collect the data. For familiar data
types, such as arrays, you need the usual software
tools to analyze those data types individually. But
you may also face new data types, such as
protein-protein interaction data, which are widely used
in systems biology.
The
major new challenges arise from the need to integrate
so much large-scale data. Typical large-scale data
sets suffer from high error rates. You should not
accept any single data point at face value. To draw
valid inferences from such data, you have to jointly
analyze many data points from different sources.
Combining data in this way is a central theme of
systems biology.
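To make the idea of joint analysis concrete, here is a minimal sketch of
one simple way to pool noisy evidence: a naive Bayes update on log-odds.
It is an illustration only, not the method used by any particular
package. The function name combine_evidence, the prior, and the error
rates are all hypothetical, and the sketch assumes the data sources err
independently of one another, which real experiments often violate.

```python
import math


def combine_evidence(prior, observations):
    """Return the posterior probability that a hypothesis is true.

    prior: assumed probability of the hypothesis before seeing any data.
    observations: list of (observed, sensitivity, false_positive_rate),
        one tuple per data source. All rates are illustrative assumptions.
        Sources are assumed to be independent.
    """
    # Work in log-odds so that each source simply adds its contribution.
    log_odds = math.log(prior / (1.0 - prior))
    for observed, sensitivity, false_positive_rate in observations:
        if observed:
            # A positive call multiplies the odds by the likelihood ratio.
            log_odds += math.log(sensitivity / false_positive_rate)
        else:
            # A negative call multiplies the odds by the inverse ratio.
            log_odds += math.log(
                (1.0 - sensitivity) / (1.0 - false_positive_rate)
            )
    # Convert log-odds back to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    # Hypothetical source: detects half of the true cases and fires
    # falsely 5 percent of the time.
    noisy_source = (True, 0.5, 0.05)
    print(combine_evidence(0.01, [noisy_source]))
    print(combine_evidence(0.01, [noisy_source] * 3))
```

With these made-up numbers, a single positive call leaves the hypothesis
unlikely, while three agreeing sources make it probable. That is the
sense in which no single data point should be taken at face value.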
In
this series of online articles, I will review
products, both commercial and academic, that play a
role in systems biology and the other key fields
supporting translational medicine. The first review,
accompanying the Web version of this column, describes
the software used at my home institution, the
Institute for Systems Biology (ISB). This includes
academic packages such as Cytoscape, Gaggle, SBEAMS,
and GDxBase, and commercial software from Ingenuity.
Do send along, for our consideration, the names of
products you’d like us to review.
Systems
biology is the next stop on biology’s long
postgenomic journey. It remains to be seen whether
it’ll be a good place to hang for a while or just a
rest stop on the side of the highway. Either way,
there’s lots to see and do. So join me online!