PharmaDD Top News: Business, Technology, Strategic Briefings - Tracking leading techniques and approaches in therapeutic drug discovery and development

 


Pharmaceutical Discovery, Dec 1, 2005 

Can Genetic Algorithms Solve the Problems with Predictive Biology?
Lukas K. Buehler

In this month's column I would like to address the complexity of biological systems. The way we treat complexity determines whether we can make progress in drug discovery. Current strategies in rational drug design don't address biological systems as complex systems. When they do address complexity, it is by way of simplification, usually to save computational time rather than for reasons that have anything to do with the system itself. Simplification seems innocuous but has far-reaching consequences, in my view, because it may cause some to believe that a biological system is predictable in principle, as if it were just another reversible chemical reaction occurring at or around thermodynamic equilibrium. A hallmark of life, however, is its dependence on homeostatic processes that operate far from chemical equilibrium or global energy minima.

I'd like to discuss three examples that should help me make my point. First, complex systems can be understood through analysis, as is the case for cyclooxygenase isoforms 1 and 2, both of which are involved in regulating platelet aggregation, the leading source of side effects from non-steroidal anti-inflammatory pain medication. Second, while complex systems usually are stable and operate according to some program (e.g., protein folding), they do not produce optimal solutions, and their future behavior is therefore not predictable in every respect, particularly in the presence of an added component like a drug. Third, as we will see from a careful analysis of 'evolution-based' engineering (genetic algorithms) of molecular structures with desired biological function, success depends on hindsight, i.e., on a carefully chosen fitness function (1). All three accounts are helpful in discussing the role of analysis in understanding complexity and the limits of prediction in biology.

Let's start with the role of analysis in biology. The now well-recognized cardiovascular risks of Cox-2 inhibitors are easy to explain because most of the details of how these drugs affect the regulatory system of platelet aggregation (the first step of blood clotting) have been described through analysis. Knowing the details, it is easy to forget that the development of Cox-2 inhibitors (a spectacular example of rational drug design) and their approval relied on more limited information just a few years ago. Even if the cardiovascular risks could have been predicted as a possible scenario, they were potential problems of uncertain nature. It's a matter of risk assessment.

So what has changed and what do we know? One of the important factors in developing Cox-2 specific drugs was the understanding that Cox-1 is the 'regular' enzyme used for normal physiological activity, while Cox-2 is induced and used only during inflammation. The implied message that regular physiology will not be affected by Cox-2 inhibitors is now known to be wrong and misleading. Defying our clear-cut categories of induced vs. constitutive isoforms of cyclooxygenase enzymes, endothelial cells of blood vessels express Cox-2 as part of their regular physiology. Endothelial cells synthesize and secrete prostacyclin (prostaglandin I, series 2; PGI2), which prevents platelets from aggregating locally. Platelets in turn depend on Cox-1 to make thromboxane A2 (TXA2), a paracrine signal that stimulates platelets to aggregate with each other. In a healthy blood vessel, both signaling molecules are continuously present, and the levels of PGI2 and TXA2 are balanced to prevent platelets from aggregating. Such a balance means that the system can be disturbed either by increasing the concentration of one signal or by decreasing the concentration of the other. When a vessel wall leaks or is injured, PGI2 levels drop and those of TXA2 increase, shifting the balance towards clotting. Clotting occurs only at injured sites because PGI2 is still secreted from undamaged endothelial cells.

Unfortunately, Cox-2 inhibitors reduce the levels of clot-preventing prostaglandin signals but not those of clot-inducing thromboxanes. Consequently, Cox-2 inhibitors shift the balance in favor of platelet aggregation, which increases the probability of spontaneous clotting in the coronary arteries. Small changes in either signal are of little consequence because the system behaves like a buffer and resists small fluctuations. This is likely the reason why cardiovascular problems are rare at low dosages of pain medication and become apparent only when inhibitor levels are increased, a higher-affinity ligand is used, or the drug is taken over extended periods of time. Aspirin, in contrast, lowers the levels of both TXA2 and PGI2 because it does not discriminate between the two Cox isoforms. Since aspirin affects both signals, the balance favoring healthy blood vessels remains unchanged, although the system may become more sensitive to random fluctuations in either signal. In times of injury, however, the aspirin-induced reduction of thromboxane A2 levels delays blood clotting and favors bleeding, a well-known side effect in people with stomach ulcers.
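The balance argument can be made concrete with a deliberately crude sketch. It assumes, purely for illustration, that PGI2 output depends only on Cox-2 and TXA2 output only on Cox-1, that inhibition scales each level linearly, and that levels are in arbitrary units starting at 1.0. The function names and the 50% inhibition figure are hypothetical; none of these numbers come from measurements, and the buffering and kinetics described above are ignored.

```python
def apply_inhibition(pgi2, txa2, cox1_inhibition, cox2_inhibition):
    """Return (PGI2, TXA2) levels after fractional enzyme inhibition.

    Assumption: PGI2 is made only via Cox-2 (endothelium) and
    TXA2 only via Cox-1 (platelets); scaling is linear.
    """
    return pgi2 * (1.0 - cox2_inhibition), txa2 * (1.0 - cox1_inhibition)


def balance(pgi2, txa2):
    """Ratio of anti-aggregation signal to pro-aggregation signal."""
    return pgi2 / txa2


baseline = (1.0, 1.0)  # arbitrary units, balanced by construction

# Hypothetical Cox-2 selective inhibitor: only PGI2 drops.
cox2_selective = apply_inhibition(*baseline, cox1_inhibition=0.0, cox2_inhibition=0.5)

# Hypothetical non-selective inhibitor (aspirin-like): both signals drop.
non_selective = apply_inhibition(*baseline, cox1_inhibition=0.5, cox2_inhibition=0.5)

print("baseline ratio:      ", balance(*baseline))
print("Cox-2 selective:     ", balance(*cox2_selective))
print("non-selective:       ", balance(*non_selective))
```

In this toy picture the selective inhibitor halves the ratio while the non-selective one leaves it unchanged, even though the absolute levels of both signals have fallen.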

A dietary connection to this process is well worth mentioning. Foods high in omega-6 fatty acids (e.g., red meats) result in PGI2/TXA2 synthesis as described above, but foods with omega-3 fatty acids (fish) result in higher levels of the series 3 eicosanoids, PGI3 and TXA3. PGI3 is stronger than PGI2 at disaggregating platelets, while the series 3 thromboxanes are less potent at stimulating platelet aggregation. So, even if both series of eicosanoids are present at the same concentrations and ratios, the series 3 molecules favor platelet disaggregation. Taken together, a diet rich in omega-3 fatty acids reduces spontaneous platelet aggregation, a possible explanation for the heart-protective effect of this class of lipids.

Now let me switch to rational drug design, which has the goal of prediction rather than analysis. The current use of genetic algorithms serves as a fine example for discussing the intrinsic limits of predictive algorithms. In my last column, I stated with some confidence that progress in drug design must overcome the barrier of complexity in biological organisms (2). By their construction, complex systems depend upon a degree of randomness and are susceptible to chance events, which are best characterized as fluctuations around a normal range within which the system is stable. Because of this range of values within which a system fluctuates, complex systems are not deterministic in the sense in which we think about mechanical systems (3). A well-analyzed complex system is protein folding, which is (in reductionist terms) an entirely analogous process to a ligand binding to its receptor.

Protein folding refers to the process where the formation of a functional structure is programmed by the amino acid sequence of the protein. During evolution, what is inherited is not a structure, but a set of instructions for the folding process during protein synthesis. Protein structures are ultimately programmed by the genome (DNA) of an organism. The execution of the program depends on a plethora of additional protein and RNA enzymes whose structures are themselves programmed in the genome someplace else (note: we call these programs genes).

Protein folding is an excellent example of the limits of predicting the outcome of a process in complex systems. It is referred to as the 'protein folding problem' because, after half a century of the brightest minds investigating how a genetic sequence determines a protein structure, it has become abundantly clear that predicting structures (global energy minima) from novel sequences can be achieved with high accuracy only with prior knowledge of existing similar solutions, i.e., high-resolution structures of related proteins. The reason is that the number of theoretical structures for any given sequence is enormous. We refer to the set of all possible solutions as sequence or search space.
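To give a feel for "enormous", here is a back-of-the-envelope count in the spirit of Levinthal's well-known estimate. The figures are assumptions chosen only for illustration: three backbone states per residue, a 100-residue chain, and a hypothetical sampling rate of 10^13 conformations per second.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def conformation_count(n_residues, states_per_residue=3):
    """Number of chain conformations if every residue independently
    adopts one of `states_per_residue` states (an illustrative assumption)."""
    return states_per_residue ** n_residues


def years_to_enumerate(n_conformations, rate_per_second=1e13):
    """Time to visit every conformation once at a hypothetical sampling rate."""
    return n_conformations / rate_per_second / SECONDS_PER_YEAR


n = conformation_count(100)
print(f"conformations: about 10^{len(str(n)) - 1}")
print(f"years to enumerate: {years_to_enumerate(n):.1e}")
```

Under these assumptions the count is on the order of 10^47, which is why exhaustive enumeration is not an option and prior structural knowledge matters so much.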

This means, among other things, that structural genomics will succeed only if a protein structure is determined experimentally for each novel family of sequences. The protein folding problem, however, is not just theoretical or based on a lack of information on our part. Take for instance chaperones, proteins that assist the folding of other proteins in real cells. The existence of chaperones (heat shock proteins) indicates that protein folding in vivo does not always proceed to the desired result and that 'finding a global energy minimum' is tough not only for modelers but for the proteins themselves (misfolding). Protein misfolding is not simply based on an error in the program (which can be the case with mutations), but is due to the fickleness, or random fluctuations, of a dynamic system: a macromolecule compacting in solution. There still is a strong need for protein biochemists to help us sort out the details experimentally.

From a reductionist point of view (e.g., sequence determines structure), complexity seems to be nothing mysterious. After all, even biological systems follow the rules of physics. When all the details are known, we should be able to explain a function — cellular, physiological, even behavioral. It works well in narrowly defined and controlled experiments where the goal is to eliminate complexity for the sake of clarity. Of course, by reducing the complexity we lose, by default, information about the system itself.

So, what does all this have to do with genetic algorithms? A problem with rational drug design, as with protein folding, is that we cannot tell which structures have biological activity or represent the global energy minimum just by looking at them, so we have to find them within a set of possible solutions. This set, in the extreme, may include all possible structures made from a select combination of building blocks (atoms and bond types), a situation referred to as sequence or search space. Search space refers to all possible chemical structures, and somewhere in that 'space' are one or several best solutions (i.e., novel molecular structures that function as ligands for a target protein). One way to find the 'best' solution is to compute the entire search space and test each and every structure for the desired function. Because search space is enormous, such exhaustive computation is an impractical brute-force approach. Genetic algorithms provide a way of reducing search space in an iterative manner, where the results of semi-random changes (mutation operators) to a starter structure are tested against a desired solution (fitness function). This process quickly results in increasingly better fits to the expected solution.
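The iterative loop just described can be sketched in a few lines. This is a toy, not any published drug-design method: 'structures' are short strings over a hypothetical alphabet of building blocks, and the fitness function simply counts positions that match a known target string. That last choice is the point of the exercise, since the target has to be supplied in advance. All names and parameter values here are assumptions made for illustration.

```python
import random

ALPHABET = "CNOSH"    # hypothetical building blocks
TARGET = "CCNOCCSH"   # hypothetical known solution, supplied in hindsight


def fitness(candidate, target=TARGET):
    """Count positions where the candidate matches the desired solution."""
    return sum(a == b for a, b in zip(candidate, target))


def mutate(candidate, rng, rate=0.1):
    """Mutation operator: replace each position at random with probability `rate`."""
    return "".join(
        rng.choice(ALPHABET) if rng.random() < rate else ch for ch in candidate
    )


def crossover(parent_a, parent_b, rng):
    """Join two parents at a random cut point to make one child."""
    cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:]


def evolve(starter, generations=50, pop_size=30, seed=0):
    """Return the best candidate found and the best fitness per generation."""
    rng = random.Random(seed)
    population = [mutate(starter, rng, rate=0.5) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]   # selection: keep the fitter half
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness), history


best, history = evolve("CCCCCCCC")
print(best, fitness(best), "of", len(TARGET))
```

Because the fitter half of each generation is carried over unchanged, the best score never decreases from one generation to the next. Changing `seed` changes the path taken, and the search is not guaranteed to reach the target within the allotted generations.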

Genetic algorithms adopt the language of evolution. They are ingenious solutions to computationally demanding problems and inadvertently demonstrate not only the important role chance plays in biological systems but also the intrinsically non-deterministic nature of complex systems. The key to understanding how genetic algorithms work is their 'fitness function'. The quality of a fitness function always improves with post hoc assessment, commonly known as hindsight. Genetic algorithms therefore don't solve the problem of predictability. Evolutionary methods like genetic algorithms, as well as neural networks, do a good job of quickly identifying lead compounds because they are all iterative methods, which reduces the need to make large combinatorial libraries of random compounds. Their success obviously depends on the availability of input information (a 'starter' structure) and on the fitness function. While modeled after biological evolution, their fitness functions mimic artificial selection, akin to animal and plant breeding; they should be called breeding functions. Still, these programs capture an essential element of evolution. They implement mutation operators and random joining of 'parent' structures to generate 'children'. The chance element is reflected in the fact that different runs produce different solutions and that solutions usually are suboptimal.

The fact that genetic algorithms find suboptimal solutions causes justified complaints among designers whose goal is finding optimal solutions. Yet the outcomes of genetic algorithms mimic the modus operandi of evolution, which seeks no particular solution. In evolution, the fitness function is not a best solution (it is 'blind') (4), but simply the number of viable offspring, much of which depends on chance. The idea of calling a design strategy evolutionary is ironic but understandable, given that the most successful drug discovery is achieved by working with natural compounds and modifying their structures to get higher-affinity ligands. Nature is neither optimal nor goal-oriented, but emulates a very robust trial-and-error method. If we can learn anything from nature, it is that trial and error is the only strategy for finding novel solutions in complex organisms.

The dependence on post hoc assessment in predictive biology has been demonstrated using genetic algorithms in virtual screening for drug side effects, by searching several targets for one ligand (5). This study is based on a comparative analysis of ADP and GDP binding sites on ADP and GDP receptors. The conclusions derived from it are two-fold. First, the study supports the general strategy of using docking energy as a fitness function to predict specificity of binding. Second, this prediction is reliable only for proteins for which a high-resolution structure in the presence of a bound ligand already exists. Interestingly, crystal structures without bound ligands are not reliable, indicating the importance of the induced-fit mechanism (protein dynamics) for binding.

Obviously, rational drug design still depends on known solutions to find novel structures that do the same thing. But why? The problem is, we still have no idea what makes a molecule a drug or a biologically active one. We have to test our prediction experimentally in order to find out. This is what I mean by unpredictability. Ideally, all trial and error techniques and random searches through sequence space would not be necessary if, for instance, we knew the rules of what makes a molecule a biological one, or a good drug (6). A further demonstration of the importance of post hoc analysis in drug design comes from a related study that compares combinatorial libraries with natural product libraries (7). In this study, the authors found a 20% hit rate from searching biased libraries (medicinal herbs), while the success rate dropped to 10% when basing a search on molecules found in synthetic libraries. The strategy of relying on biased libraries is like saying if you want an apple, go pick your fruit off an apple tree.
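The practical weight of the two hit rates quoted above is easy to work out. The sketch below takes the 20% and 10% figures from the text and asks how many compounds one would expect to screen, on average, to obtain a given number of hits. The target of five hits and the function name are arbitrary choices for illustration, and hits are assumed to be independent.

```python
def expected_screened(hits_wanted, hit_rate):
    """Average number of compounds to screen for `hits_wanted` hits,
    assuming each compound is an independent hit with probability `hit_rate`."""
    if not 0 < hit_rate <= 1:
        raise ValueError("hit_rate must be in (0, 1]")
    return hits_wanted / hit_rate


biased_library = expected_screened(5, 0.20)     # natural-product (biased) library
synthetic_library = expected_screened(5, 0.10)  # synthetic library

print(f"biased library:    {biased_library:.0f} compounds")
print(f"synthetic library: {synthetic_library:.0f} compounds")
```

Halving the hit rate doubles the expected screening effort, which is the quantitative form of the apple-tree remark.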

Real progress (faster, cheaper) in drug discovery can only come from implementing algorithms based on biological function, not those based on structural data alone. But biological function means that the properties of complex systems usually are those that cannot be explained by understanding the building blocks themselves. We call functions of complex systems emergent properties, and the system itself is considered irreducibly complex. Without invoking a God to rationalize their existence, we nevertheless should accept the limitations complexity imposes on our scientific methodology. The reason for irreducible complexity in biology has to do with the dynamics of homeostatic systems, where process is critical, not just composition. Living structures are continuously self-regenerating and adjusting their composition and interaction networks. A body plus a drug is not the same as the body without the drug, but a new body with a new homeostatic network. We need a theory of homeostasis to make substantial progress in biology.

Lukas K. Buehler is the founder of SciScript Inc., in San Diego, California of San Diego’s Extension Bioscience Program. He can be reached at

References

1. L.K. Buehler, Pharma DD 3(5), 20–21 (2003).

2. L.K. Buehler, Pharmaceutical Discovery 2(5), 26–28 (2005).

3. E. Mayr, What Makes Biology Unique?: Considerations on the Autonomy of a Scientific Discipline (New York, Cambridge University Press, 2004).

4. R. Dawkins, The Blind Watchmaker: Why the Evidence of Evolution Reveals a Universe Without Design (New York, Norton, 1996).

5. W.M. Rockey and A.H. Elcock, Proteins 48(4), 664–671 (2002).

6. D. Douguet, H. Munier-Lehmann et al., J. Med. Chem. 48(7), 2457–2468 (2005).

7. J.M. Rollinger, S. Haupt et al., J. Chem. Inf. Comput. Sci. 44(2), 480–488 (2004).