Dec 1, 2005
By: Lukas K. Buehler
Pharmaceutical Discovery

In this month's column I would like to address the complexity of biological
systems. The way we treat complexity affects whether we can make
progress in drug discovery. Current strategies in rational drug design
don't address biological systems as complex systems. When they do address
complexity, it is by way of simplification, for reasons that have
little to do with the system itself; usually it is done to save
computational time. Simplification seems innocuous but has far-reaching
consequences, in my view, because it may cause some to believe that a
biological system is predictable in principle, as if it were just another
reversible chemical reaction occurring at or around thermodynamic
equilibrium. A hallmark of life, however, is its dependence on homeostatic
processes that operate far away from chemical equilibrium or global energy
minima.
I'd like to discuss three examples that should help me make my point.
First, complex systems can be understood through analysis, as is the case
for cyclooxygenase enzyme isoforms 1 and 2, which are both involved in
regulating platelet aggregation, the leading cause of side effects from
non-steroidal anti-inflammatory pain medication. Second, while complex
systems are usually stable and operate according to some program (e.g.,
protein folding), they do not produce optimal solutions and are thus not
in every respect predictable regarding their future behavior,
particularly in the presence of an added component like a drug. Third, as
we will see from a careful analysis of 'evolution-based' engineering
(genetic algorithms) of molecular structures with a desired biological
function, success depends on hindsight (1) (i.e., a carefully chosen
fitness function). All three accounts are helpful in discussing the role
of analysis in understanding complexity and the limits of prediction in
biology.
Let's start with the role of analysis in biology. The now well-recognized
cardiovascular risks of Cox-2 inhibitors are easy to explain
now that most of the details of how they affect the regulatory system of
platelet aggregation (the first step of blood clotting) have been
described through analysis. Knowing the details, it is easy to forget that
the development and approval of Cox-2 inhibitors (a spectacular example of
rational drug design) relied on more limited information just a few
years ago. Even if the cardiovascular risks could have been predicted as a
possible scenario, they were potential problems of an uncertain nature. It's
a matter of risk assessment.
So what has changed and what do we know? One of the important factors
in developing Cox-2 specific drugs was the understanding that Cox-1 is the
'regular' enzyme used for normal physiological activity, while Cox-2 is
induced and used only during inflammation. The implied message that
regular physiology will not be affected by Cox-2 inhibitors is now known
to be wrong and misleading. Defying our clear-cut categories of induced
vs. constitutive isoforms of cyclooxygenase enzymes, endothelial cells of
blood vessels express Cox-2 as part of their regular physiology.
Endothelial cells synthesize and secrete prostacyclin (prostaglandin I,
series 2; PGI2), which prevents platelets from aggregating locally.
Platelets in turn depend on Cox-1 to make thromboxane A2 (TXA2), a
paracrine signal that stimulates platelets to aggregate with each other. In a
healthy blood vessel, both signaling molecules are continuously present
and the levels of PGI2 and TXA2 are balanced to prevent platelets from
aggregating. Such a balance means that the system can be disturbed by
either increasing the concentration of one signal or decreasing the
concentration of the other. When a vessel wall leaks or is injured, PGI2 levels
drop and those of TXA2 increase, shifting the balance towards clotting.
Clotting occurs only at injured sites because PGI2 is still secreted
from undamaged endothelial cells.
Unfortunately, Cox-2 inhibitors
reduce the levels of clot-preventing prostaglandin signals but not those
of clot-inducing thromboxanes. Consequently, Cox-2 inhibitors shift the
balance in favor of platelet aggregation. All in all, this increases the
probability of spontaneous clotting in the coronary arteries of the heart.
Small changes in either signal are of little consequence because the
system behaves like a buffer and is resistant to small fluctuations.
This is likely the reason why cardiovascular problems are rare at low
dosages of pain medication and become apparent only when increasing
inhibitor levels, using a higher-affinity ligand, or using it over extended
periods of time. Aspirin, in contrast, lowers the levels of both TXA2 and
PGI2 because it does not discriminate between the two Cox isoforms. Since
aspirin affects both signals, the balance favoring healthy blood vessels
remains unchanged, although the system may become more sensitive to
random fluctuations in either signal. In times of injury, however, the
aspirin-induced reduction of thromboxane A2 levels delays blood clotting
and favors bleeding, a well-known side effect in people with stomach
ulcers.
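
The balance argument can be caricatured in a few lines of Python. This is a toy sketch, not a pharmacological model: the function name `favors_clotting`, the inhibition fractions, and the `tolerance` buffer width are all hypothetical, chosen only to reproduce the qualitative behavior described in the text.

```python
def favors_clotting(cox1_inhibition, cox2_inhibition, tolerance=0.2):
    """Toy model: True if the TXA2/PGI2 balance tips toward platelet aggregation.

    Inhibition values are fractions between 0 (no drug) and 1 (full block).
    All numbers, including the tolerance, are illustrative assumptions.
    """
    txa2 = 1.0 - cox1_inhibition   # platelets need Cox-1 to make TXA2
    pgi2 = 1.0 - cox2_inhibition   # endothelial cells need Cox-2 to make PGI2
    # The system buffers small imbalances; only a shift beyond the tolerance counts.
    return (txa2 - pgi2) > tolerance


print(favors_clotting(0.0, 0.1))  # low-dose Cox-2-selective inhibitor
print(favors_clotting(0.0, 0.6))  # high-dose Cox-2-selective inhibitor
print(favors_clotting(0.8, 0.8))  # aspirin-like, non-selective inhibitor
```

Only the second call returns `True`: a small selective inhibition stays inside the buffer, and a non-selective inhibitor lowers both signals equally, so the difference stays at zero.
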
A dietary connection to this process
is well worth mentioning. Foods high in omega-6 fatty acids (e.g., red
meats) result in PGI2/TXA2 synthesis as described above, but foods with
omega-3 fatty acids (fish) result in higher levels of series 3 eicosanoids,
PGI3 and TXA3. The platelet disaggregation ability of PGI3 is stronger
than that of PGI2, while the series 3 thromboxanes are less potent at
stimulating platelet aggregation. So, even if both eicosanoids are present
at the same concentrations and ratios, the series 3 molecules result in a
situation that favors platelet disaggregation. Taken together, a diet rich in
omega-3 fatty acids reduces spontaneous platelet aggregation, a possible
explanation for the heart-protective effect of this class of lipids.
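
The point that equal concentrations need not mean equal effect can be made concrete with potency weights. The sketch below is illustrative only: the `POTENCY` table and the function `net_aggregation_signal` are hypothetical, and the weights are invented. Only their ordering (PGI3 stronger than PGI2, TXA3 weaker than TXA2) comes from the text.

```python
# Relative potencies are hypothetical; only their ordering follows the text.
POTENCY = {
    "series2": {"prostacyclin": 1.0, "thromboxane": 1.0},  # PGI2, TXA2
    "series3": {"prostacyclin": 1.5, "thromboxane": 0.5},  # PGI3, TXA3
}


def net_aggregation_signal(series, prostacyclin_conc, thromboxane_conc):
    """Potency-weighted thromboxane signal minus prostacyclin signal.

    A positive value favors aggregation, a negative value favors disaggregation.
    """
    weights = POTENCY[series]
    return (weights["thromboxane"] * thromboxane_conc
            - weights["prostacyclin"] * prostacyclin_conc)


# Same concentrations, different series
print(net_aggregation_signal("series2", 1.0, 1.0))
print(net_aggregation_signal("series3", 1.0, 1.0))
```

With identical concentrations, the series 2 signal is balanced at zero while the series 3 signal is negative, which in this toy picture corresponds to the shift toward disaggregation.
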
Now let me switch to rational drug
design, which has the goal of prediction rather than analysis. The current
use of genetic algorithms serves as a fine example for discussing the
intrinsic limits of predictive algorithms. In my last column, I stated
with some confidence that progress in drug design must overcome the
barrier of complexity in biological organisms (2). By way of their
construction, complex systems depend upon a degree of randomness and are
susceptible to chance events that can best be characterized by
fluctuations around a normal range within which the system is stable.
Because of this range of values within which a system fluctuates, complex
systems are not deterministic in the sense in which we think about mechanical
systems (3). A well-analyzed complex system is protein folding, which is
an entirely analogous process (in reductionist terms) to a ligand binding
to its receptor.
Protein folding refers to the process
where the formation of a functional structure is programmed by the amino
acid sequence of the protein. During evolution, what is inherited is not a
structure, but a set of instructions for the folding process during
protein synthesis. Protein structures are ultimately programmed by the
genome (DNA) of an organism. The execution of the program depends on a
plethora of additional protein and RNA enzymes whose structures are
themselves programmed in the genome someplace else (note: we call these
programs genes).
Protein folding is an excellent
example of the limits of predicting the outcome of a process in complex
systems. It is referred to as the 'protein folding problem' because, after
half a century of the brightest minds investigating how a genetic sequence
determines a protein structure, it has become abundantly clear that
predicting structures (global energy minima) from novel sequences can be
achieved with high accuracy only with prior knowledge of existing similar
solutions (i.e., high-resolution structures of related proteins). The reason
is that the number of theoretical structures for any given sequence is enormous.
We refer to the set of all possible solutions as sequence or search space.
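
To give a feel for "enormous", here is a back-of-the-envelope calculation in the spirit of Levinthal's well-known argument. Every number in it is an assumption made for illustration (chain length, conformations per residue, sampling rate); none comes from this column.

```python
RESIDUES = 100                 # assumed chain length
STATES_PER_RESIDUE = 3         # assumed number of conformations per residue
SAMPLES_PER_SECOND = 10**13    # assumed rate at which conformations are tried
SECONDS_PER_YEAR = 365 * 24 * 3600

# Size of the conformational search space under these assumptions
conformations = STATES_PER_RESIDUE ** RESIDUES

# Time needed to visit every conformation once
years_to_enumerate = conformations / SAMPLES_PER_SECOND / SECONDS_PER_YEAR

print(f"{conformations:.2e} conformations")
print(f"{years_to_enumerate:.2e} years to try them all")
```

Even with these modest assumptions, exhaustive enumeration would take vastly longer than the age of the universe, which is why prior knowledge of related structures matters so much.
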
This means, among other things, that
structural genomics will succeed only through the experimental
determination of a protein structure for each novel family of
sequences. The protein folding problem, however, is not just theoretical
or based on a lack of information on our part. Take for instance
chaperones, proteins that assist the folding of other proteins in real
cells. The existence of chaperones (heat shock proteins) indicates that
protein folding in vivo does not always proceed to the desired
result and that 'finding a global energy minimum' is tough not only for
modelers but for proteins themselves (misfolding). Protein misfolding is
not simply based on an error in the program (which can be the case with
mutations), but is due to the fickleness or random fluctuations within a
dynamic system: a macromolecule compacting in solution. There is still a
strong need for protein biochemists to help us sort out the details
experimentally.
From a reductionist point of view
(e.g., sequence determines structure), complexity seems to be nothing
mysterious. After all, even biological systems follow the rules of
physics. When all the details are known, we should be able to explain a
function — cellular, physiological, even behavioral. It works well in
narrowly defined and controlled experiments where the goal is to eliminate
complexity for the sake of clarity. Of course, by reducing the complexity
we lose, by default, information about the system itself.
So, what does all this have to do
with genetic algorithms? A problem with rational drug design, as with protein
folding, is that we don't know which structures have biological activity
or represent the global energy minimum just by looking at them, so we want
to find them from a set of possible solutions. This set, in the extreme,
may include all possible structures made from a select combination of
building blocks (atoms and bond types), a situation referred to as
sequence or search space. Search space refers to all possible chemical
structures, and somewhere in 'space' are one or several best solutions
(i.e., novel molecular structures that function as ligands for a target
protein). One way to find the 'best' solution is to compute the entire
search space and test each and every structure for the desired function. But
search space is enormous, which makes exhaustive computation an impractical
brute-force approach. Genetic algorithms provide a way of reducing search
space in an iterative manner, where the results of semi-random changes (mutation
operators) to a starter structure are tested against a desired solution
(fitness function). This process quickly results in increasingly better
fits to the expected solution.
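
A minimal genetic algorithm makes the role of the fitness function visible. The sketch below is a toy under stated assumptions, not a drug-design tool: the four-letter `ALPHABET` of "building blocks", the `TARGET` string, and all parameter values are hypothetical. Note that `fitness` can only score candidates because `TARGET` is written down in advance.

```python
import random

ALPHABET = "CNOH"       # hypothetical building blocks
TARGET = "CCONCCOH"     # the 'desired solution' the fitness function already knows


def fitness(candidate):
    """Count positions that match the known target (higher is better)."""
    return sum(a == b for a, b in zip(candidate, TARGET))


def mutate(candidate, rng, rate=0.1):
    """Mutation operator: swap each position for a random block at the given rate."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else ch
                   for ch in candidate)


def crossover(mother, father, rng):
    """Join two 'parent' structures at a random cut point to make a 'child'."""
    cut = rng.randrange(1, len(mother))
    return mother[:cut] + father[cut:]


def evolve(seed, generations=50, population_size=30):
    """Run the search and return the fittest structure found."""
    rng = random.Random(seed)
    population = ["".join(rng.choice(ALPHABET) for _ in TARGET)
                  for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 3]   # keep the fittest third
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
                    for _ in range(population_size - len(parents))]
        population = parents + children
    return max(population, key=fitness)


best = evolve(seed=1)
print(best, fitness(best), "of", len(TARGET))
```

The search examines only a few hundred candidates instead of all 4^8 = 65,536 possible strings. Because the fittest third is carried over unchanged, the best score never decreases from one generation to the next.
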
Genetic algorithms adopt the language
of evolution. They are ingenious solutions to computationally demanding
problems and inadvertently demonstrate not only the important role chance
plays in biological systems but the intrinsic non-deterministic nature of
complex systems as well. The key to understanding how genetic algorithms
work is their 'fitness function'. The quality of a fitness function always
improves with post hoc assessment, commonly known as hindsight.
Thus, genetic algorithms don't solve the problem of predictability.
Evolutionary methods like genetic algorithms, as well as neural networks, do
a good job of quickly identifying lead compounds because they are all iterative
methods, thus reducing the need to make large combinatorial libraries of
random compounds. Their success obviously depends on the availability of
input information (a 'starter' structure) and the fitness function. While
modeled after biological evolution, their fitness functions mimic
artificial selection, akin to animal and plant breeding; they should be
called breeding functions. Still, these programs capture an essential
element of evolution. They implement mutation operators and random joining
of 'parent' structures to generate 'children'. The chance element is
reflected in the fact that different runs produce different solutions and
that solutions usually are suboptimal.
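
Why do different runs end in different, often suboptimal, solutions? The sketch below uses plain hill climbing from a random start as a deliberately simpler stand-in for a genetic algorithm. The `LANDSCAPE` values are a made-up one-dimensional fitness landscape with several peaks.

```python
import random

# Hypothetical fitness landscape: index = candidate, value = fitness.
LANDSCAPE = [1, 3, 2, 5, 4, 9, 6, 2, 7, 3]


def hill_climb(start):
    """Step to the better neighbor until none improves; return the final index."""
    position = start
    while True:
        neighbors = [i for i in (position - 1, position + 1)
                     if 0 <= i < len(LANDSCAPE)]
        best = max(neighbors, key=lambda i: LANDSCAPE[i])
        if LANDSCAPE[best] <= LANDSCAPE[position]:
            return position   # no neighbor is better: a local peak
        position = best


# The chance element: each run starts from a random candidate.
for run in range(5):
    start = random.randrange(len(LANDSCAPE))
    end = hill_climb(start)
    print(f"run {run}: start {start} -> peak {end} (fitness {LANDSCAPE[end]})")
```

Only starts close to index 5 reach the global peak (fitness 9). Other starts stop at lower peaks, so the outcome depends on where chance happened to place the first candidate.
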
The fact that genetic algorithms find
suboptimal solutions causes justified complaints among designers whose
goal is finding optimal solutions. Yet the outcomes of genetic algorithms
mimic the modus operandi of evolution, which seeks no particular solution
and whose fitness function is not a best solution (it is 'blind') (4), but
simply the number of viable offspring, much of which depends on chance.
The idea of calling a design strategy evolutionary is ironic but
understandable in light of the fact that the most successful drug
discovery is achieved when working with natural compounds and modifying
their structures to get higher-affinity ligands. Nature is neither optimal
nor goal-oriented, but employs a very robust trial-and-error method. If
we can learn anything from nature, it is that trial and error is the
only strategy to find novel solutions in complex organisms.
The dependence on post hoc
assessment in predictive biology has been demonstrated using genetic
algorithms for virtual screening for drug side effects by searching
several targets for one ligand (5). This study is based on a comparative
analysis of ADP and GDP binding sites on ADP and GDP receptors. The
conclusions derived from it are two-fold. First, the study supports the
general strategy of using docking energy as a fitness function to predict
specificity of binding. Second, this prediction is only reliable
for proteins for which a high-resolution structure in the
presence of a bound ligand already exists. Interestingly, crystal
structures without bound ligands are not reliable, indicating the
importance of an induced-fit mechanism (protein dynamics) for binding.
Obviously, rational drug design still
depends on known solutions to find novel structures that do the same
thing. But why? The problem is that we still have no idea what makes a
molecule a drug or a biologically active one. We have to test our
predictions experimentally in order to find out. This is what I mean by
unpredictability. Ideally, all trial-and-error techniques and random
searches through sequence space would not be necessary if, for instance,
we knew the rules of what makes a molecule a biological one, or a good
drug (6). A further demonstration of the importance of post hoc
analysis in drug design comes from a related study that compares
combinatorial libraries with natural product libraries (7). In this study,
the authors found a 20% hit rate from searching biased libraries
(medicinal herbs), while the success rate dropped to 10% when basing a
search on molecules found in synthetic libraries. The strategy of relying
on biased libraries is like saying if you want an apple, go pick your
fruit off an apple tree.
Real progress (faster, cheaper) in
drug discovery can only come from implementing biological function
algorithms, not those based on structural data alone. But biological
function means that the properties of complex systems are usually those
that cannot be explained by understanding the building blocks themselves. We
call functions of complex systems emergent properties, and the system
itself is considered irreducibly complex. Far from invoking a God to
rationalize their existence, we nevertheless should accept the limitations
complexity imposes on our scientific methodology. The reason for
irreducible complexity in biology has to do with the dynamics of
homeostatic systems, where process is critical, not just composition.
Living structures are continuously self-regenerating and adjusting their
composition and interaction networks. A body plus a drug is not the same
as the body without the drug, but a new body with a new homeostatic network. We
need a theory of homeostasis to make substantial progress in biology.
Lukas K. Buehler is the
founder of SciScript Inc., in San Diego, California of San Diego’s
Extension Bioscience Program. He can be reached at
lbuehler@ucsd.edu.
References
1. L.K. Buehler, Pharma DD 3(5), 20–21 (2003).
2. L.K. Buehler, Pharmaceutical Discovery 2(5), 26–28 (2005).
3. E. Mayr, What Makes Biology Unique?: Considerations on the Autonomy of a Scientific Discipline (New York, Cambridge University Press, 2004).
4. R. Dawkins, The Blind Watchmaker: Why the Evidence of Evolution Reveals a Universe Without Design (New York, Norton, 1996).
5. W.M. Rockey and A.H. Elcock, Proteins 48(4), 664–671 (2002).
6. D. Douguet, H. Munier-Lehmann et al., J. Med. Chem. 48(7), 2457–2468 (2005).
7. J.M. Rollinger, S. Haupt et al., J. Chem. Inf. Comput. Sci. 44(2), 480–488 (2004).