By Greg Gibson
Date: Jan 30, 2009
Greg Gibson explains that disease arises because humans, like all other species on the planet, are an unfinished symphony. Perhaps we are even more unfinished than most, thoroughly out of equilibrium with the modern world, and even a little bit uncomfortable in our own skin. In short, we possess an adolescent genome.
The Adolescent Genome
genetic imperfection: Disease is a normal and inevitable part of life that arises from the way that organisms are put together.
unselfish genes: The way that different flavors (alleles) of thousands of genes work together establishes how an organism looks or behaves, or how healthy it is.
how genes work and why they come in different flavors: We all differ from one another at millions of places in the genome.
three reasons why genes might make us sick: Rare alleles that have a large impact, common alleles that have a moderate one, or hundreds of alleles with very small effects can all contribute.
a unified theory of complex disease: The combination of rapid human evolution and recent cultural change has pushed us out of a genetic comfort zone, predisposing many more people to disease.
the human genome project: Public and private efforts have jointly produced a complete sequence of the human genome that lays the foundation for a century of medical research to come.
genomewide association: The scalpel that will be used to isolate most of the major disease susceptibility alleles for complex disease of the next few decades.
Of all the paradoxes in the world, surely one of the most absurd is that the very same genome that gives us life inevitably also takes it away. Even when they aren’t killing us, our genes are generally making existence more difficult than seems absolutely necessary. Very few people escape this world having avoided a bout with cancer or diabetes or asthma or depression, and those who do often end up too senile to remember much of the journey anyway. What good reason could there possibly be for so much suffering and disease?
Maybe there is no good reason, other than that genetic disease is an unavoidable byproduct of the way organisms are assembled; disease arises because humans, like all other species on the planet, are an unfinished symphony. Perhaps we are even more unfinished than most, thoroughly out of equilibrium with the modern world, and even a little bit uncomfortable in our own skin. In short, we possess an adolescent genome.
This notion may seem counterintuitive, because we are so conditioned to think in terms of perfection. A simplistic way to think about biology is to imagine that every species is perfectly suited to whatever ecological niche it occupies. Its genome has evolved to ensure that each individual is made to be as close as possible to the optimum shape and set of functions that a perfect member of the species would have. Adaptation to a dragonfly is having exquisitely refined lace wings, to an orchid it is pitching the lips of its pouting petals at just the most attractive angle, and to a human it is whatever it takes to live a long and comfortable life. Maybe no one individual is ever truly cast as the ideal that defines the species, but all approximate the optimum.
If an individual doesn’t quite define perfection, it is either because optimality actually comes in a variety of shapes and sizes, or because forces are conspiring against the person. Debating whether humanity is more closely realized in the form of Colin Powell or Tiger Woods, Jennifer Lopez or Hillary Clinton, we would no doubt agree to disagree on what attributes are desirable in a person. We would, however, likely find common ground when it comes to health, concluding that some not-so-optimal types of genes floating around make us hypersensitive to pollen, push us to eat too much, or make us prone to mental illness. So the question is, why are such bad influences tolerated in the gene pool?
As the book unfolds, we will look closely at six different types of disease, each given its own chapter. It is first necessary to lay the foundation, so I have three goals in this opening chapter: first, to disabuse you of any sense that there is such a thing as a “disease gene”; second, to lay out the general theory of complex disease that I enunciate as the book unfolds; and third, to explain how contemporary geneticists go about finding the genes that influence susceptibility to illness.
Telling someone that she has the gene for Parkinson’s or the gene for restless leg syndrome is a bit like telling her that her house has termites or sits on a toxic dump. It implies that her misfortune is that she has something that most people don’t have, and further that all would be well if only she could get rid of the termites or toxins.
Genes are not like that, though. They are not things that some people have, and others do not. Approximately 23,000 genes are in the human genome, and all of us have pretty much the same number, give or take a few dozen. What we actually have are different flavors of genes. The technical term for a gene flavor is allele, pronounced ah-lee-el: Whenever you read the word “allele,” think of chocolate and vanilla ice cream. Alleles are different versions of the same gene, just with different spelling and slightly different function.
In fact, in many cases, when a gene is associated with a disease it is because the gene is in some way broken or missing. Just getting rid of the gene would not help. A better house analogy than termites and toxins might be damp foundations, or cheap window frames. The house is basically the same as everyone else’s, but problems arise because it just wasn’t built as well as it should have been. Generally in such cases, many other things also are likely to go wrong and in this sense, too, the analogy with complex disease is improved.
Similarly, it seems that almost daily we read proclamations that scientists have discovered the gene for stroke or the gene for homosexuality. In almost every case, what they really mean is that the scientists have discovered a particular variant of a gene that slightly increases the likelihood that some people will suffer strokes or prefer their own sex. Sometimes the headlines replace “the” with “a,” which is definitely better but still conveys the impression that the purpose of such genes is to cause the disease or trait. Actually, the genes universally promote what we colloquially refer to as normality. They come in different alleles, and under some conditions particular alleles promote disease, or conditions we choose to label abnormal.
Contemporary genetic research is focused on finding these alleles and is as much about basic understanding of what they do as it is about finding cures for specific diseases. This is because there is little prospect of finding new cures for cancer until we understand why tumor cells grow out of control in the first place, and the next drugs for treatment of depression won’t arrive until we appreciate what is wrong deep inside the brains of the chronically sad. This makes sense if you consider that most of us would prefer that our automobile mechanic understand how the engine works, rather than just try the same old fixes he’s always used in the past.
The advantage that a mechanic has is that humans made cars, so we know not just what every part does but also what its purpose is and how it interacts with all the other parts. Biomedical researchers now have a pretty complete parts list and a fair idea of where each part goes, but there is still much to be learned before we know what all the parts do and how they fit together to make a healthy person.
Much genetic research involves pulling apart and putting back together model organisms that we can manipulate, like mice and rats and zebrafish, and even flies and nematodes. Increasingly the tools are at hand to do it with humans directly—at least, the pulling apart bit. Also, for just about every gene, somewhere in the world there is a person with an allele that does not work, and many thousands of these are responsible for rare syndromes. They are teaching us a lot about how things function, but for the most part don’t explain the common diseases that afflict us all.
To this end, a parallel mode of genetic analysis is much less familiar to most people and yet influences all of our lives on a daily basis to a much greater extent than the genetics that we learn in school. Variously referred to as quantitative genetics, or by phrases such as complex disease, multifactorial trait, or polygenic disorder, it is the study of how common variants in many genes interact with one another and with the environment to produce the biological variation that surrounds us. Genes are fundamentally interactive entities, working together, adjusting to the environment around them, molding organisms but not determining their destiny. For anything the least bit complicated, it truly takes a genome.
Most of the differences between species are of this type, as are the attributes that make us unique, from body shape and facial features to metabolism and even aspects of temperament. So too are the diseases that touch every one of us directly or indirectly as they afflict friends and family: cancer, diabetes, cardiovascular disease, asthma, and depression. The language associated with quantitative genetics switches from the imagery of control, determination, and causation, terms popularly associated with genetics, to the less strident tones of susceptibility, influence, and contribution. This book is predominantly about the genetics of complexity.
Perhaps another analogy might make the distinction clearer. All of us are probably painfully aware of the impact that one individual can have on a business. If the CEO, or CFO, or CSO, or Director of IT, or head housekeeper for that matter, stops working or starts making bad decisions, the company can deteriorate rapidly. Yet it is the more subtle failings or distractions of multiple employees that most often disturb the health of the company even in good times. Two co-workers are going through a divorce, a supervisor is having an affair, the junior VP for marketing is caring for her ailing mother, and one of the bookkeepers has repetitive strain injury. Nothing is particularly unusual about any of these circumstances, and each of them is almost to be expected in even a moderate-sized group of people. For the most part organizations can and do deal with them, but mix them together in certain combinations and pretty soon potential dissipates, opportunities are lost, maybe employees start leaving, and things can fall apart. Such is also the fate of our genomes: Genes are ultimately individuals that have to work together, but they’re not perfect, and sometimes the pieces just don’t mesh.
Far from being selfish robots, genes are in fact little molecular existentialists. Contemporary molecular biology is about relationships and networks. It is the context within which a gene is used that defines what it does and what it is. Sure, certain genes are essential for the development of the eye or the heart, but these same genes do other things in different contexts. Think not of genes as dictators, but rather as a parliament of constituents—a parliament that on the whole does a pretty decent job, but sometimes messes up, with dire consequences for the health of the organism.
How Genes Work and Why They Come in Different Flavors
Even if you haven’t asked yourself why it is that genes make us sick, perhaps you have wondered why it is that your sister has legs up to her ears and piercing blue eyes that haven’t been seen in the family since Great-Aunt Bessie, while you seem to have inherited a horrible mix of dad’s stockiness and mom’s frumpiness. And what’s up with your brother’s moroseness: Where did that come from?
It is not much of an explanation, but the straight answer is that genetics is a lot more complex than the idea that there’s a gene for every trait. Most traits, or attributes, are regulated by many genes, not just one. Furthermore, while it is a nice abstraction to suppose that genes come in normal, or good, versions and mutant, or bad, ones, the reality is that there are always multiple different flavors of normal. The gradation from the most common allele to various types of normal alleles to abnormality is continuous. Just having certain alleles is insufficient to predict whether a person will get a disease.
Crucially, too, the environment has a pervasive effect on the way our genes function. “Environment” means much more than the temperature outside or the nutritional content of the food we eat. It also includes influences as diverse as a mother’s health during pregnancy and the pressure that peers and society put on us to behave in certain ways. As we shall see, in many cases environmental interventions are likely to have a much greater impact on public health than pharmaceutical ones. Unfortunately, most of us find it easier to pop a pill than to buck a social trend, so drugs are likely to have an ever-increasing role in disease control.
Without going into any mechanistic details, it is helpful to recognize that genes function on two levels, the biochemical and the biological. The biochemical is hidden to most observers, and therefore typically excluded from general conversation. The biological is what we actually see.
Each of the 23,000 or so genes scattered along our chromosomes encodes the information to perform a specific biochemical function. Less than a third of these genes function in every cell in your body to provide the basic building blocks and to generate energy—they are the bricks and mortar, if you like. Another third of our genes makes every one of the hundreds of different cell types in your body different. Neurons need proteins that process electrical signals, muscle cells are full of actin and myosin that make them stretch and contract, and white blood cells carry around the components of your immune system. These are the doors and windows and furniture and appliances. The final third of our genes is responsible for regulating which genes are used when and where and in what amount. Turning on hair keratin in your pancreas wouldn’t be good, and light receptors have no place in your heart, so development and physiology are highly regulated processes. These genes are the architects, foremen, and designers.
We hear and read about genes for cancer and for autism, or are given to believe that there is an aggression gene or a blonde hair gene. The reality is that these attributes are many steps removed from the molecular functions that the genes perform. If a gene contributes to cancer, it is because it normally performs a role in making sure that the right number of cells are produced at the right time and place. The reason there may be a genetic contribution to spirituality is not because some genes function to ensure that we have a belief in God, but rather because there are genes that affect how the neurons are wired together and the strength of signaling across synapses.
Fly geneticists like to name genes after the way flies look when the gene is mutated. Antennapedia flies have legs on their heads, technical knock out ones fall over when you bump their heads, and shaven baby embryos don’t have any hairs. It is an amusing but unfortunate habit, because it reinforces the notion that there are genes for traits. Time after time it turns out that the same gene does completely different things in different contexts. A favorite example of mine is staufen, which is required both for sperm development and for memory. It is not that male flies think with their penises, but rather that both of these attributes turn out to depend on a biochemical process called intracellular RNA localization, which staufen is involved in. Almost without exception the biological functions of genes are not written in the DNA, but rather emerge from the network of biochemical interactions within cells, and in turn the manner in which cells work together to build tissues and organs.
It follows that the reason we are all a little different from one another is because these interactions occur between ever so slightly different copies of the genes. Each gene comes in multiple different flavors—I mean, alleles—that have cropped up during the evolution of the species. These different alleles have their origin in the process of mutation, which is basically what happens to genes when you leave them out in the sun or exposed to poisons.
Mutations are ultimately the source of all things good, but for the most part are harmful, tending to break genes. Every one of us has a few mutations that neither of our parents had, simply because mistakes are made every time the genome is copied. (But don’t get too upset about this: The error rate is only about one in a billion letters in the DNA. Most of us would be thrilled to make a mistake only once in every hundred times we do something.) Mutations are also so plentiful that we all carry several of them that would kill us if we got the same one from both parents.
Mutations are so plentiful in fact that there is no way that natural selection can possibly purge them all. Obviously alleles that would tend to kill a person will not generally last long in the gene pool, and similarly ones that would tend to make us sick should not fare well either. But all new mutations are extremely rare when they appear, and nature has bigger fish to fry. It is more concerned with common alleles that affect the fitness of a large percentage of the population, so the fate of new mutations is largely governed by chance. Consequently, some mutations manage to drift around for a while and can even become reasonably common before they start having a noticeable effect on public health. The process is called mutation-selection-drift balance, which is a fancy way of saying that a lot of bad things happen to genomes, and evolution deals with them, but it is so busy that some of the bad things hang around for a while.
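To get a feel for how chance governs the fate of a new mutation, here is a minimal sketch of a textbook Wright-Fisher simulation. It is an illustration only, not a method taken from this chapter: the function name, the population size, and the way selection is applied are all assumptions made for the example.

```python
import random

def wright_fisher(pop_size, selection=0.0, generations=200, seed=None):
    """Follow one new mutation through a population of `pop_size` diploid people.

    Returns the mutant allele frequency in each generation, starting from a
    single copy. `selection` is a hypothetical fitness effect per copy:
    negative is harmful, positive is helpful, and it must not be below -1.
    """
    rng = random.Random(seed)
    copies = 2 * pop_size   # every person carries two copies of the gene
    count = 1               # a brand-new mutation exists as a single copy
    history = [count / copies]
    for _ in range(generations):
        freq = count / copies
        # Selection nudges the expected frequency up or down a little.
        weighted = freq * (1 + selection)
        expected = weighted / (weighted + (1 - freq))
        # Drift: each copy in the next generation is drawn at random.
        count = sum(1 for _ in range(copies) if rng.random() < expected)
        history.append(count / copies)
        if count == 0 or count == copies:
            break           # the allele is lost or fixed; nothing can change now
    return history

# A mildly harmful mutation (assumed 1 percent cost) in 100 people.
trajectory = wright_fisher(pop_size=100, selection=-0.01, seed=42)
print(len(trajectory) - 1, "generations simulated; final frequency", trajectory[-1])
```

Running it with different seeds shows the point of the paragraph above: most new mutations vanish within a few generations whether they are good or bad, while an occasional one wanders up to an appreciable frequency before selection catches up with it.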
Some mutations are also good for you. Maybe they offer protection from diabetes; maybe they make a person more fertile. These tend to be favored by natural selection, but before they become the standard allele, they necessarily share real estate in the genome with the original allele. Typically it takes thousands of generations for one allele to replace another, so in the meantime you have variation. Sometimes the new allele will be better under some conditions, while the ancestral one is better under others. Maybe they have different effects in men and women, or in rural and urban settings. In such cases, geneticists speak of balanced polymorphisms, the classic case being sickle cell anemia, which is bad under some circumstances but protects a person from malaria in others.
You will also see it argued that many of the bad effects are actually offset by some absolute good that they do. Perhaps at a different stage of life they are sufficiently beneficial that natural selection overlooks their contribution to disease. Or perhaps at some earlier phase of human evolution they were the right gene in the right place at the right time. It is easy to get carried away with devising clever stories along these lines. Some, particularly in the domain of psychology, are even tempted to postulate that promoting disease is in itself advantageous to the selfish genes, but it really stretches credulity to suppose that there is some benefit to having genes that make us suicidal. We won’t go down that road. Rarely is it necessary anyway.
It turns out that as species go, humans are actually among the least variable, at least at the level of their DNA. Nevertheless, the average person has a few million differences between the copy of his genome received from his mother and the copy received from his father. Somewhere among all those differences are the genetic variants that are responsible for all genetic diseases, but no more than a couple dozen have a big enough impact on any particular disease for us to have any hope of finding them. Finding a few dozen out of a few million is a genuine needle-in-a-haystack problem.
Three Reasons Why Genes Might Make Us Sick
The upshot of all this is that there are basically three ways that genetic variation can influence disease susceptibility. These are called the rare alleles, common variant, and small effect models. I will briefly describe each and then in the next section present a unified framework that iterates throughout the remaining chapters.
The simplest model is that a disease can be traced to one badly disrupted gene. This is pretty much the case for cystic fibrosis, and for thousands of other rare conditions. Around 1 in 100 of us carries a mutated version of the CFTR gene without any ill effects, but if two carriers marry, their children have a 1 in 4 chance of getting both bad copies and consequently having cystic fibrosis. The incidence in the general population is only about 1 in 10,000, most of which is due to a few mutations that have been around for centuries, but actually hundreds of other mutations can be found in the gene as well. Whether the disease is so severe that it claims the life of an infant, or mild enough that a person can live to adulthood and maybe receive a life-saving lung transplant, is in part a function of which mutations the person carries, in part of the rest of that person’s genome, and in part of upbringing.
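The 1 in 4 figure for the children of two carriers follows from simply listing the equally likely combinations of one allele from each parent. A short sketch, in which the labels "A" (working allele) and "a" (broken allele) and the function name are hypothetical choices for the example:

```python
from fractions import Fraction
from itertools import product

def offspring_distribution(parent1, parent2):
    """List the equally likely offspring genotypes for a single gene.

    Each parent is a two-letter string such as "Aa", where "A" stands for a
    working allele and "a" for a broken one (labels are hypothetical).
    """
    counts = {}
    # Each child receives one allele from each parent, chosen at random.
    for allele1, allele2 in product(parent1, parent2):
        genotype = "".join(sorted(allele1 + allele2))  # "aA" and "Aa" are the same thing
        counts[genotype] = counts.get(genotype, 0) + 1
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}

carriers = offspring_distribution("Aa", "Aa")
print(carriers["aa"])  # chance that a child inherits both broken copies
```

The same function shows why a carrier who has children with a non-carrier ("Aa" with "AA") cannot have an affected child: no combination yields "aa".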
Single genes can also cause diseases in other ways. Muscular dystrophy is often due to a gene, dystrophin, that is so big that it picks up mutations often enough that most new cases arise in the individual who has the disease. Another small set of genes has an odd feature that makes them mutate at an unusually high rate, leading to the paralysis or ataxia observed in Fragile X syndrome and Huntington’s disease. For the most part, though, single gene diseases are rare.
Large-effect mutations also do not generally explain common diseases, those seen in five percent to ten percent or more of people. Really the only way they could is if there were hundreds of genes that cause a syndrome that we choose to think of as a single disease. Schizophrenia might be in this class, as might the wide spectrum of cardiovascular conditions that lead to heart attacks and stroke. It is possible that these rare mutations interact with one another, so that a person needs two or three of them in any condition to be predisposed to the disease. Unfortunately geneticists have not yet devised a systematic way to discover such mutations.
Currently the most popular model is called the common disease-common variant, or CD-CV, hypothesis. It is the idea that if there are diseases found in ten percent of the population, then there ought to be alleles at about the same frequency that are found in these people, but not in “normal” people. This sounds reasonable enough, so millions and millions of dollars are being spent in pursuit of these alleles, each of which contributes about five percent to ten percent of the risk of illness. So far, Crohn’s disease, an inflammatory bowel syndrome, is the poster child success story, except that it is not actually a common disease. However, ten or so genes have been discovered that contribute to Crohn’s, each with correspondingly common risk alleles. Diabetes and prostate cancer also show signs of following the CD-CV model, but the jury is out on whether this will really be a common explanation for disease.
The third possibility is that hundreds if not thousands of different genes—each with rare or common alleles that have small, barely detectable effects—contribute to each common disease. To some extent this is the default model when all other models fail, but it is beginning to look like it is going to be the predominant explanation. The trouble is that this model doesn’t really explain why diseases are discrete. Height, degree of extraversion, memory performance, and probably most human attributes are thought to be influenced by hundreds of genes, but they show a continuous gradation from short to tall, shy to outrageous, and forgetful to prolific. So why should there be people with disease and people without disease, if hundreds of genes are involved?
A somewhat technical explanation for this is that there is a threshold of liability—in other words, a tipping point from health to sickness whenever you have a little more of something than is normally tolerated. Most people are pretty similar genetically, having average levels of whatever it is. They have some genes that increase the attribute and some that decrease it, but generally not an excess of either. However, inevitably a few outliers will have considerably more of the increasing or decreasing alleles, enough to send them beyond the threshold into the valley of illness.
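The threshold idea can be sketched in a few lines. The simulation below is a toy under stated assumptions, not a model fitted to any real disease: every gene is assumed to contribute equally, risk alleles are assumed to be equally common at every gene, and the threshold of 114 is an arbitrary value placed about two standard deviations above the average.

```python
import random

def simulate_liability(num_genes=100, risk_freq=0.5, people=2000, seed=0):
    """Liability score per person: how many risk alleles they carry.

    Each of `num_genes` genes is present in two copies, and each copy is a
    risk allele with probability `risk_freq` (all values are assumptions).
    """
    rng = random.Random(seed)
    return [
        sum(1 for _ in range(2 * num_genes) if rng.random() < risk_freq)
        for _ in range(people)
    ]

def prevalence(liabilities, threshold):
    """Fraction of people whose liability exceeds the threshold."""
    affected = sum(1 for score in liabilities if score > threshold)
    return affected / len(liabilities)

scores = simulate_liability()
# Liability is continuous-looking, but disease is all-or-nothing:
# only the outliers beyond the threshold count as affected.
print("average liability:", sum(scores) / len(scores))
print("fraction affected:", prevalence(scores, threshold=114))
```

Sliding the threshold down a few points, or nudging `risk_freq` up slightly, raises the fraction affected sharply, which is one way a small shift in a whole population can turn into many more cases of a discrete disease.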
A Unified Theory of Complex Disease
An added quirk is that there likely are mechanisms that ensure that as few individuals as possible exceed the threshold, even when they have more than their fair share of the risky alleles. This phenomenon is known as canalization. It says that not only do species evolve so that most individuals resemble one another, but they have also evolved buffering that ensures that everyone is “normal” despite the slings and arrows of outrageous fortune that life throws at them.
Next time you trap a mouse, count the number of whiskers: Almost certainly there will be 17 or 18 on each side of the snout. Actually, my dogs also have this number of whiskers, but that may just be coincidence. This number of whiskers is very stable, unless the mouse happens to have a Tabby mutation, in which case on average it will only have a dozen or so whiskers. The catch is that the “or so” can be as few as 7 and as many as 20. Observations such as this are often seen when developmental circumstances are perturbed. Not only does the average appearance change, but it also becomes much more variable.
It seems then that normal buffering mechanisms fall apart when the genetic system is pushed too far away from the optimum. Translated into the realm of disease, the idea is that the modern environment that humans have constructed has taken us out of the buffering zone, and left us more susceptible to perturbations that result in disease. It is, however, much easier to describe what canalization is than the mechanisms that produce it. This is partly because we don’t really understand the mechanisms, and partly because they are usually addressed in mathematical and statistical equations.
The essence of these equations is that stability arises through the deeply interconnected web of interactions among genes. If I were to give you 100 pieces of string and ask you to make a carrying bag, the simplest thing you could do would be to tie them all together at both ends, resulting in a sling. This would be fine for carrying around tennis balls, but somewhat disappointing if you tried to use it to carry your loose change. A slight improvement would be to divide the strings into two groups, and lay two slings perpendicular to one another. If you had time, you could weave the strings into a cross-hatching cloth, and by adding reinforcing strings at different angles you could make this web even stronger. Such a cloth would be able to hold heavy objects that distort it and to absorb breaks in a few of the strings.
Genetic networks are similarly structured as interacting linkages that together form a tighter, more coherent whole than would be produced simply by adding together bits and pieces. But the whole inevitably has holes, particularly when stressed, and these holes lead to disease.
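The whisker observation, where a perturbed system becomes more variable as well as different on average, can be mimicked with a toy calculation. The cubic response curve below is purely a stand-in I have assumed for the example because it is flat near zero and steep far from it; it is not a description of any real genetic network, and the numbers are arbitrary.

```python
import random
import statistics

def buffered_trait(genetic_input):
    """Hypothetical buffered response: nearly flat close to 0 (the optimum), steep far away."""
    return genetic_input ** 3

def trait_values(shift, people=2000, seed=0):
    """Trait values for people whose inputs vary by the same amount around `shift`."""
    rng = random.Random(seed)
    return [buffered_trait(shift + rng.gauss(0, 0.5)) for _ in range(people)]

centered = trait_values(shift=0.0)   # population sitting in the buffered zone
displaced = trait_values(shift=3.0)  # identical variation, pushed away from the optimum

print("spread when centered: ", statistics.pstdev(centered))
print("spread when displaced:", statistics.pstdev(displaced))
```

The underlying variation is the same in both populations. Only its visibility changes: near the optimum the response curve soaks it up, while away from the optimum the same differences between individuals show through as a much wider spread of outcomes.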
Now think about some recipe you used to love to make as a child. Let’s say your favorite ham and cheese omelet, or if you were unusually adept in the kitchen, a soufflé. When you were a child, you probably stuck pretty close to the recipe, knowing that so long as you balanced the amount of ham and cheese you added, the omelet would turn out nicely. Then you went away to college and went through a phase of not eating breakfast or stopping for a McBiscuit on the way to work, and now you’ve forgotten the exact recipe. You think you have it right, but every other time you make one, the kids get a pained look on their faces and spit it out. There’s probably something wrong with the number of eggs you are using or the amount of milk. Or maybe it is because you are using an electric stove instead of gas, or the eggs where you live now are a different size than those where you grew up. It’s frustrating, but you just can’t recapture the magic of the old combination.
In this metaphor for the origins of complex disease, the recipe stands for the genetic program for healthy development, growing up and changing the recipe stands for genetic evolution, and switching cooktops stands for environmental change. The key is that tens of millions of years of genetic evolution devised canalized systems for regulating the amount of glucose in our blood; the balance of immune response to bacteria, viruses, and parasites; and the way the chemicals signal in the brain. These systems were well able to absorb normal fluctuations, without exposing too many individuals to disease. But humans are an incredibly young and rapidly evolved species, and we have completely changed our environment in the past century. This pushes us—as well as many of our domesticated companion animals that get similar diseases—out of the buffered zone, exposing genetic variation that may never have had an effect in the past.
So while it is convenient to assume that humans are close to some optimal design, we have not actually been around for long enough to allow the genome to make fine adjustments that ensure that most people are buffered from disease. Humans are without a doubt a long way from any such equilibrium. We shared a common ancestor with chimpanzees just five million years ago, and with Homo erectus cavemen just a million years ago. As a species, Homo sapiens has been in existence for just 140,000 years, somewhere around just 10,000 generations. The flies sitting on the fruit salad at your barbecue have likely been around as a species for 100 times as many generations.
Perhaps it wouldn’t matter so much, except that we’re also a really, really different species in so many ways. We’re just beginning to explore our novel world. From the Arctic to the Antilles, and from Newfoundland to New York, humans are re-creating their niche, putting pressure on the gene pool to deal with all kinds of extremes. We live longer than our close ancestors, consume strange diets, walk upright with a funny pelvis, have babies with big heads, share our homes with a menagerie of animals, and cope with really complex social settings. If you feel stressed at times, imagine things from the perspective of the genes that helped us get here.
The point is that recent human evolution has required substantial changes in our genetic makeup, disrupting genetic relationships that had evolved over millions of years. These changes have left us exposed. Like an adolescent still growing up and trying to come to terms with a constantly changing world, we’re just a little uncomfortable with who we are. Presumably we’ll get to a more comfortable genetic place, but not for a few more hundred thousand generations.
The Human Genome Project
Let’s turn now to the issue of how geneticists study the origins of disease, beginning with something called the Human Genome Project. This is an effort to identify and describe the function of every one of the genes in the human genome, particularly those related to disease. Early on, there were some naïve expectations that just by sequencing a genome, the genes would be obvious and within a few years we would have cures for all the major maladies that afflict citizens of the developed world.
It hasn’t turned out that way, for good reasons, but the technical accomplishments have exceeded expectations, and it is doubtful that anyone foresaw the direction that genome science would take. The first announcement of a draft human genome sequence was greeted by President Bill Clinton as a step toward a closer understanding of God’s design. Less spiritual observers saw it as a step toward diagnostics and interventions for hundreds of diseases. Cynics saw it as yet another example of scientists’ hubris in throwing hundreds of millions of dollars at a problem without solving anything. My sense is that, like man’s walking on the moon, it is an achievement that serves as an identifiable landmark in the emergence of a new domain of human endeavor, but will eventually be seen as just another small step along the human journey of self-perception.
There were actually two genomes sequenced—one by an international consortium that was financed by public money, and the other by a commercial enterprise known as Celera Genomics. It is legitimate to ask why hundreds of millions of taxpayers’ dollars were spent on a project that turned out to be doable by private initiative. There are many answers to this question. One is that Celera might never have started without the incentive provided by the public effort (and similarly, the public effort would not have finished so quickly without being pushed by Celera). Another is that there were legitimate reasons to believe that the strategy adopted by Celera would not work, whereas the public approach was guaranteed both to work and to provide useful information as it progressed.
The two projects took what we might refer to as MapQuest and Google Earth strategies toward sequencing the human genome. Suppose that you are asked to come up with a brand new atlas of the United States, complete with street names and house numbers. Most of us would probably start by employing someone in each state and charging them with the task of mapping out the major cities and highways. Lake Tahoe and Fresno would be placed on the atlas as the cartographers radiated out from Los Angeles and San Francisco, eventually linking up with Reno and Las Vegas. The approach would be painstaking and slow, but for most intents and purposes, guaranteed to be accurate, and the drafts could be used even before the final version was available. This is what the public effort did: Each chromosome was assigned to a major sequencing center, and the consortium put the pieces together over a period of five years.
By contrast, the maverick visionary behind Celera, J. Craig Venter, decided to do the equivalent of renting a satellite to take hundreds of millions of photographs. Every piece of land would be present on at least ten of the photographs, and a massive supercomputer was programmed to find the bits of similarity at the edges, assembling the complete atlas simultaneously based on overlaps between adjacent photographs. The process was fast and relatively cheap, but you might imagine that all those Main Streets and repetitive cornfields in the Midwest would confuse the alignment of photographs, making for some odd distance estimates, and that bits of New York might turn up in the middle of Philadelphia by accident. However, the Celera people were clever enough to devise ways around these problems, and their atlas of the human genome turned out to be just fine. It also turned out to be Craig Venter’s own genome!
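For readers who like to see the idea in miniature, here is a toy sketch of assembling overlapping "photographs" into one atlas. It is only an illustration, not Celera's actual software: the function names, the tiny made-up sequence, and the greedy merge-the-best-overlap rule are all assumptions chosen for simplicity.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`
    (0 if no overlap of at least `min_len` letters exists)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of fragments with the largest overlap.

    Toy greedy strategy (an assumption for illustration); real assemblers
    are far more elaborate, precisely because repeats can mislead this rule.
    """
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left that overlaps; leave the pieces separate
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads)
                 if idx not in (best_i, best_j)] + [merged]
    return reads


# Three overlapping fragments of a made-up 15-letter "genome".
fragments = ["ATGGCGTA", "CGTACGTT", "CGTTAGC"]
print(greedy_assemble(fragments))
```

With long stretches of repeated letters (the Main Streets and cornfields of the analogy), several fragments would overlap equally well in more than one place, which is exactly how a simple rule like this one can put a piece in the wrong spot.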
Bear in mind that an atlas is just a set of guides: It doesn’t tell you where steel is manufactured, where cotton is grown, or who lives at 286 Magnolia Lane. For that we need classical genetics, bioinformatics, and molecular biology. The biggest emphasis right now, though, is on comprehending the variation at each of the positions in the atlas. This is the quest to find the tens of millions of places in the genome where we all differ, and to work out which few thousand of these differences are associated with disease and behavioral, physiological, and physical variation.
We don’t need to get into the gory details of how the genetic code can possibly hold the secret to life; suffice it to say that it consists of four letters, A, T, G, and C, strung together in long molecules of DNA. Each human gene is made of something like 10,000 of these letters, and there are around 1,000 genes per chromosome. The sequence of these letters specifies the nature and function of each gene, and sequences can vary among individuals in three basic ways. Single nucleotide polymorphisms (SNPs) are positions in the genome where two or more different letters might be found if you compare two people. Indels are insertions and deletions, usually of just one or a few letters. Thus, if we compare the sequence AATGCGCA with AGTGCGCCA, we see an A/G SNP at the second position and an insertion of an extra C two bases from the end of the second sequence. The third class is copy number variation (CNV), which involves much larger insertions and deletions of thousands of bases at a time.
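The comparison of those two short sequences can be sketched in a few lines of code. This is a rough illustration only: it leans on Python's general-purpose text-diffing module, difflib, rather than a real sequence aligner, and the function name and the labels it returns are assumptions made up for the example.

```python
from difflib import SequenceMatcher


def classify_differences(first, second):
    """List the differences between two short DNA strings.

    Returns tuples of (kind, 1-based position, letters in first, letters in second).
    SNP and deletion positions refer to `first`; insertion positions refer to
    `second`. difflib is a text-diff tool, used here only as a stand-in for
    a proper alignment program.
    """
    matcher = SequenceMatcher(None, first, second, autojunk=False)
    variants = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            # Same-length substitution: report each letter as its own SNP.
            for k in range(i2 - i1):
                variants.append(("SNP", i1 + k + 1, first[i1 + k], second[j1 + k]))
        elif tag == "insert":
            variants.append(("insertion", j1 + 1, "", second[j1:j2]))
        elif tag == "delete":
            variants.append(("deletion", i1 + 1, first[i1:i2], ""))
        else:
            # Unequal-length replacement: leave it unclassified.
            variants.append(("complex", i1 + 1, first[i1:i2], second[j1:j2]))
    return variants


for variant in classify_differences("AATGCGCA", "AGTGCGCCA"):
    print(variant)
```

Copy number variants are far too large to find this way; they involve thousands of letters and call for different methods altogether.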
Ultimately, change in the sequence of letters translates into susceptibility to disease or blue eyes or a cheery disposition. Remarkable as it may seem, a person who has an A instead of a G at position 102,221,163 on chromosome 11 may be born with a mild heart defect. This insight leads to the idea that if we could sequence a person’s genome, we could maybe work out what diseases that person may be likely to get. Venter has written a book that does just this for himself, but it must be stated that our ability to interpret sequence differences is primitive, and like a Shakespearian play or a T. S. Eliot poem, the genetic words will always be subject to interpretation.
So what has the sequencing of the human genome achieved, and when should we expect to see some impact on medical care? The achievement is that we now have the solid foundation upon which twenty-first-century biomedicine will be erected. To see this, think about the analogy of the genome sequence as a road atlas once more. Prior to its completion, molecular biologists were simply working with the obvious features in the genomic landscape, or laboring painstakingly to find where they were headed. Now they know exactly where the residential areas are, where to find businesses or manufacturing sectors, and what type of agriculture is carried out where. They have the street names and addresses for most families and can readily find the government regulators.
To identify what is going wrong when a genetic disease occurs, though, they need to be able to peer inside the houses and offices. Sometimes the cause of a problem is obvious, as if the roof were missing. More generally, though, subtle problems get in the way. The most interesting cases don’t involve a single mutation, but rather the accumulated effects of many regular, everyday variants in the genome. These variants are behind diabetes, asthma, and depression, and they take advanced statistical and computational procedures to find.
Until the middle of 2006, the search for new genes that influence disease was pretty much restricted to studies of extended families. Typically, geneticists would identify pedigrees in which a particular type of cancer or heart disease was unusually common and look for parts of the genome that affected individuals have in common. This approach, called linkage mapping, has been the main method for finding the genes behind single-gene disorders, but it has had limited success for more complex diseases.
Your parents have between them four copies of every gene. You have two of these, and your children each have a 50-50 chance of receiving each one. Suppose now that you, your father, and two of your three kids have a heart murmur, and both of these kids received the same allele from you, which is also the one you got from your dad, while their sibling received the other allele. So, four out of seven members of the family (your parents, you, your spouse, and your three kids) have the murmur, each of whom has the same allele of the gene. Something fishy is going on, and you would likely conclude that the level of correspondence between having the allele and having the disease is unlikely to be due just to coincidence. You would suspect that the allele actually causes the murmur.
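To put a rough number on that suspicion, here is a back-of-the-envelope sketch. It assumes the gene has nothing at all to do with the murmur, so that each child's chance of receiving a given allele from you is exactly one half and the children are independent of one another. The function names are made up for the illustration.

```python
from fractions import Fraction


def chance_of_pattern(n_children):
    """Probability that every child's inherited allele happens to line up
    with having (or not having) the murmur, purely by coincidence.

    Assumes the gene is unrelated to the murmur, so each child is a
    fair 50-50 coin toss.
    """
    return Fraction(1, 2) ** n_children


def chance_across_families(children_per_family):
    """Probability that the same coincidence occurs in every one of several
    unrelated families (assumed independent of one another)."""
    probability = Fraction(1)
    for n_children in children_per_family:
        probability *= chance_of_pattern(n_children)
    return probability


print(chance_of_pattern(3))               # one family with three kids
print(chance_across_families([3] * 10))   # ten such families
```

One chance in eight is suspicious but hardly conclusive for a single family; it is only when the same pattern keeps turning up in family after family that coincidence becomes very hard to believe.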
However, since there are thousands of genes and millions of families with murmurs, that level of coincidence is bound to occur occasionally. But if geneticists find a similar correspondence in dozens of even bigger pedigrees, their confidence that the particular allele of the gene actually causes or at least contributes to the murmur increases. With enough data, the correlation between the gene and the disease does not have to be 100 percent. As a result, it is also possible to detect linkage between regions of the genome and complex diseases where each gene only has a small influence on the disease. On this basis we know, for example, that a dozen or so places in the genome influence type 2 diabetes. For reasons we needn’t concern ourselves with here, those places typically stretch over perhaps a tenth of a chromosome, or hundreds of genes. So they do not pinpoint the problem.
To get around this, the field has now turned to a revolutionary approach called genomewide association mapping, or GWA. Instead of looking in families, geneticists now look at unrelated individuals drawn from an entire population. Two companies, Affymetrix and Illumina, have manufactured little gene chips with up to a million common genetic differences printed on them. These markers stand as proxies for the tens of millions of places in the genome that are different among people. For less than $1,000 a pop, geneticists can now effectively measure what a person’s genetic constitution is, almost as if they were determining the sequence of the person’s entire genome.
For a few million dollars geneticists can go out and compare the genomes of 10,000 people who have a disease with the genomes of 10,000 people who do not have it. If the frequency of the A at position 102,221,163 on chromosome 11 is 29 percent in people with a heart defect, but only 19 percent in people without the defect, then after appropriate crunching of the numbers we can infer that this site is contributing to the problem. This is a gross oversimplification; all sorts of possible alternative explanations can be made for such a difference. But if another group replicates the result in an independent sample (often from another country), then confidence that the gene is involved in the disease shoots up still higher.
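The simplest version of that number crunching can be sketched as follows. The sketch makes several assumptions that go beyond the text: the percentages are read as allele frequencies, each person contributes two alleles that are counted as independent, the comparison is an ordinary chi-square test on a two-by-two table, and the cutoff for a chip with a million markers is a plain Bonferroni correction. Real studies also correct for ancestry, genotyping error, and much else.

```python
import math


def allele_association(freq_cases, freq_controls, n_cases, n_controls):
    """Chi-square test (1 degree of freedom) on allele counts.

    Returns (chi-square statistic, p-value, odds ratio). Each person is
    assumed to carry two independent copies of the site.
    """
    a = freq_cases * 2 * n_cases          # A alleles among people with the disease
    b = 2 * n_cases - a                   # other alleles among people with the disease
    c = freq_controls * 2 * n_controls    # A alleles among people without it
    d = 2 * n_controls - c                # other alleles among people without it
    total = a + b + c + d
    chi2 = total * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (a * d) / (b * c)
    return chi2, p_value, odds_ratio


chi2, p_value, odds_ratio = allele_association(0.29, 0.19, 10_000, 10_000)

# Assumed cutoff: 0.05 shared out across a million markers tested at once.
threshold = 0.05 / 1_000_000

print(f"chi-square = {chi2:.1f}, odds ratio = {odds_ratio:.2f}")
print(f"p-value = {p_value:.3g}, passes cutoff: {p_value < threshold}")
```

A difference of ten percentage points in a sample this large is overwhelming by this measure, which is why the serious worry is not chance but the alternative explanations, and why replication in an independent sample matters so much.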
It turns out that this approach is a sufficiently fine genetic scalpel that it actually leads us to one or a few genes involved in the disease. Genomewide association scans for disease will be to human genetics what the microscope was to nineteenth-century biology, and what they are telling us is rightly the subject of the remainder of this book.