When scientists first sequenced the human genome nearly a decade ago, the achievement was hailed as one that would transform biology and the way scientists tackle new problems. Although genome sequencing has yet to fulfill its promise of revolutionizing the diagnosis and treatment of most, if not all, human diseases, it has helped scientists better understand people's health and medical needs based on their individual genetic blueprints. It has also spurred the growth of bioinformatics and systems biology, and led to the creation of vast datasets that have come to be known as "Big Data."
Big Data refers to our ability to collect and analyze the massive amounts of data we now generate. “The goal for medical research,” says Ranjan Perera, Ph.D., scientific director of Analytical Genomics and Bioinformatics at Sanford-Burnham Medical Research Institute at Lake Nona, “is to use that information to take a macroscopic view of health, including the ability to recognize patterns or clues to disease genesis and development.”
What does this mean for science? Which new technologies are creating such massive amounts of data, and how will those data advance medical research and lead to the development of novel therapeutics? At Sanford-Burnham, researchers use a variety of scientific technology platforms, collectively called the "omics," that generate enormous amounts of data. The most commonly studied platforms are:
- Genomics – the study of genes and their function
- Proteomics – the study of an organism’s complete complement of proteins
- Metabolomics – the study of metabolites, often by comparing the metabolite profiles of different biological samples
- Lipidomics – the large-scale profiling of all the lipids in an organism
The numbers behind these fields are staggering: approximately six billion base pairs (the paired building blocks of the DNA double helix, C-G and A-T) across the two copies of the genome each person carries, approximately 20,300 protein-coding genes, thousands of RNA molecules, and at least 2,900 metabolites. And that's just for one person! Rather than examining data from just one of these platforms, as they once did, scientists now look at multiple platforms at once, producing the sort of data that only massive supercomputers can handle.
Being able to analyze this data gives scientists a better understanding of disease processes and potentially more ways to treat a wide range of conditions. The data will be useful for diagnostics, especially early cancer detection: profiling a person's proteome or metabolome can help clinicians spot global changes in the body well before symptoms appear. Big Data will also have a major impact on personalized medicine. Until recently, disease treatment was often a one-size-fits-all approach. "As we learn more about how our genes drive response to treatment, therapies can be tailored to an individual's disease based on his or her genetic profile," says Adam Godzik, Ph.D., director of Bioinformatics and Systems Biology at Sanford-Burnham. Because everyone's disease is different, understanding what happens at the molecular level can help determine the most appropriate treatment for a given patient.
In his laboratory, Sumit Chanda, Ph.D., of Sanford-Burnham's Infectious and Inflammatory Disease Center, studies cellular proteins involved in influenza A and retrovirus/HIV infection. He uses a series of systems-level approaches that produce huge amounts of data to understand the molecular strategies these viruses have adopted to counter innate immune responses.
While it has become an indispensable tool for researchers, Big Data carries challenges for the scientific community. Collecting and storing data are easier tasks than figuring out what information is important and, more critically, how to use that information. “You have to sift through the data and figure out what’s disease-causing and what’s normal background variation, or noise, for large-scale datasets,” says Chanda.
Sanford-Burnham scientists and IT professionals have been facing Big Data questions for years, says IT director Eric Hicks. With its 10.4-petabyte capacity, the Institute's storage system can "grow with us," allowing scientists to sift through an ever-growing trove of data. In addition, a tape library, which offers even greater scalability at lower cost, is replicated bi-coastally (in California and Florida) every night, so a single tape failure does not put data at risk. National LambdaRail, a high-speed computer network that links the U.S. research and education communities, lets scientists at Sanford-Burnham's two primary research facilities share files at speeds about 1,000 times faster than a home broadband connection.
As forecast, Big Data is transforming the life sciences. It not only provides deeper and broader insight into human biology but also helps scientists with the most practical applications of their research: understanding disease processes and developing new treatments for human diseases.