The genome is not enough

Genome sequencing is now a routine part of biological research — but understanding how life works requires more than countless strings of letters.

*Knowing the underlying genetic information at the heart of biology is a necessary, but not sufficient, step towards deciphering the mysteries of living things.* Steve McFarland/Flickr (CC BY-NC 2.0)

This is an editorial for Issue 6 by Lateral editor-in-chief Jack Scanlan. Don't worry: he loves genomics, and will be commencing a PhD in insect genetics in a couple of months.

It has become cliché to say that genomics has revolutionised biology, but it is hardly an exaggeration. What scientists in the 60s and 70s thought was almost mystically unattainable — the DNA sequence of every gene that a particular organism possesses — is now routinely accessible through methods that are sharply falling in cost and used in laboratories the world over. Thousands of species have had their genomes uncovered through comprehensive sequencing, with the number swiftly rising every day. Analyses in medicine and ecology previously not possible are now easy to perform. We’re drowning in genomes.

This should be a wonderful thing — and it is. But a change in focus from small-scale research to large genome-based projects is biasing what many of us now see as successful or worthwhile science. And as much as genomics adds to biology, relying on it misses vitally important avenues to find out about how life works. So most of the time, sequencing a genome is not enough.

A lot of the initial success of genome projects has relied on many decades of fundamental research in molecular biology, biochemistry, and classical genetics, the discipline some scientists (wrongly) think the shinier, high-tech field of genomics will inevitably replace. Genome sequencing produces large quantities of raw data that must be interpreted in a biological context, and most of this interpretation relies on forming connections between these new data and the research community’s background knowledge of what certain genes are likely to do.

Once you find a gene in a genome, you can predict what protein it produces. The specific sequence of that protein can yield clues as to what it does, but only if it is similar to another protein with a known function. How was that function discovered? Maybe it was through classical genetics, wherein a mutant animal was found and the gene causing its defect pinpointed with a time-intensive breeding experiment. Or maybe it was through molecular biology, wherein the gene was transplanted into bacteria and its encoded protein produced in such large quantities that it could be manipulated in the lab directly. Whatever the technique, it was likely a long, tough process, and it was all focused on one gene.

All that time looking at one gene, while genomics is pumping out thousands of them. Ouch.

The process of assigning functions to genes in the context of the genome is called, rather straightforwardly, functional genomics. Because the number of genes in any particular genome can run to the tens of thousands, it is impractical for human scientists to comb through the data and put a function to every single one of them. Instead, automated software programs use generalised information from our cumulative knowledge of biochemistry and molecular biology to predict protein functions based on similarity to computer models. It can work well, even if sometimes the predictions are a little too vague to be useful.
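
For the curious, the core idea of similarity-based prediction can be sketched in a few lines of Python. This is only a toy illustration, not how real annotation software works: the two "known" proteins, their sequences and their labels are invented for the example, and the scoring method (counting shared three-letter fragments) is a simple stand-in for the far more sophisticated statistical models that real tools use. The function names and the 0.5 cut-off are likewise assumptions made for the sketch.

```python
# Toy sketch of similarity-based function prediction.
# All sequences, labels and the threshold below are made up for illustration.

KNOWN_PROTEINS = {
    "hypothetical kinase": "MKTAYIAKQRQISFVKSHFSRQ",
    "hypothetical transporter": "MLLAVLYCLLWSFQTSAGHFPRA",
}


def kmers(seq, k=3):
    """Return the set of overlapping k-letter fragments in a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=3):
    """Jaccard similarity of the two sequences' fragment sets (0.0 to 1.0)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def predict_function(query, known_proteins, threshold=0.5):
    """Borrow the label of the most similar known protein, if it is similar enough."""
    best_label, best_score = "unknown function", 0.0
    for label, seq in known_proteins.items():
        score = similarity(query, seq)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        # Nothing in the reference set is close enough to justify a guess.
        return "unknown function", best_score
    return best_label, best_score


if __name__ == "__main__":
    # A query that differs from the "kinase" by a single letter.
    print(predict_function("MKTAYIAKQRQISFVKSHFSRA", KNOWN_PROTEINS))
    # A query that resembles nothing in the reference set.
    print(predict_function("GGGGGGGGGG", KNOWN_PROTEINS))
```

Notice that the program can only ever hand back a label that somebody has already attached to a known protein. A query that resembles nothing in the reference set simply comes back as "unknown function", which is exactly why the slow, one-gene-at-a-time work still matters.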

But relying on computer software to characterise data has some significant problems. Even the most sophisticated program will still miss certain things if it’s not told to look for them, but the converse is typically true as well — many pieces of genomics software produce a lot of noise along with the all-important functional signal researchers are interested in. This is a problem common to a lot of so-called 'Big Data' projects in biology, many of which rely on genomic information.

A great example of noise partially ruining perfectly good research came in 2012 with the widely reported publication of the ENCODE (Encyclopedia of DNA Elements) project, an attempt to assign functions to as much of the human genome as possible, even the functionless junk DNA that fills much of the space between genes. Using data from other groundbreaking '-omics' techniques without proper interpretation, ENCODE researchers wrongly claimed that up to 80% of our genome has a function, a figure at odds with established principles in evolutionary biology and biochemistry.

More recently, a genome sequence for the near-indestructible tardigrade was published, brimming with genes stolen from bacteria. The researchers claimed this explained why the creature is so hard to kill, and like ENCODE, the story was reported widely in the media. But their bubble was burst when another group of scientists demonstrated that these genes were most likely from bacterial contamination and were not truly in the tardigrade's genome at all. In this case, the genome alone was enough to cause fanfare, even when it hadn't been properly put together, let alone verified through other techniques.

To be truly valuable to science, genomics data require valid interpretations, based on knowledge gained from less flashy methods. These methods take time and patience, and they aren't sexy, so they are rarely mentioned in the media. Once a genome has been uncovered, these are the tools scientists use to piece together how it works. They're not an optional step, and we need to pay them more attention; they deserve their due.

None of this is to say that genomics is in any way useless. Of course not. Genomics has breathed new life into studying and understanding taxonomy, biodiversity, microbial communities and historical anthropology, all without requiring much background knowledge of the functions of the genes it spits out.

But the most powerful application of genomics may be its use in conjunction with old-school techniques and their modern cousins. Where past generations of scientists spent their whole lives studying single genes somewhat blindly, we can now choose our targets with genomic foresight. Areas like cancer biology and medical genetics are immeasurably improved with genomic information in the mix: many future cancer treatments may rely on sequencing individual tumour genomes, while patient genomes can allow doctors to personalise drug therapies to avoid side effects and complications.

Personally, I use genomics all the time in my research, which focuses on a group of genes in insects. In fact, without genome sequencing, I probably wouldn't even know that these genes exist. But it's a starting point, a launchpad for further work. Work that I should probably get back to, now that I mention it...

By the way, there's a lot more about data in this issue of Lateral: genomics isn't the only culprit when it comes to generating huge quantities of the stuff.