Successful drug discovery and development relies on access to accurate, trusted and reliable data at every stage of the research process. This is particularly pertinent in the early, pre-clinical stages – where data is at the very crux of R&D – as researchers seek to better understand the biology of diseases. However, while access to data is crucial, the sheer volume of data now available can actually hamper productivity rather than support the efficient research that reduces the likelihood of late-stage failures.
Indeed, more than 1 million new articles are published every year, and PubMed alone now contains over 25 million references. As a result, researchers are overwhelmed trying to find relevant pieces of information hidden in the big-data haystack of scientific literature, and an effective literature search can take many months given its complexity. Ultimately, the ability of researchers to harness the data and information available to them is the key variable in the overall success of their research.
Realizing the Potential of Big Data
Big data has enormous potential in scientific research and the creation of new, successful treatments. Through data, researchers can improve their understanding of a disease by determining the complex molecular relationships that underlie it. With this kind of knowledge, scientists can more accurately predict how a patient will respond to a drug. Precision medicine is another area where effectively harnessing big data is crucial; with access to data on patients’ genetic make-up and co-existing conditions, researchers can seek to create more effective, tailored treatments. However, for big data to really drive precision medicine and fulfill its great promise, scientists must be able to filter through the ‘noise’ they are presented with.
Market challenges have also driven a change in the way that the tools used in scientific research are designed and applied. Spiraling research and development costs, and increasingly competitive markets, have increased the pressure on pharmaceutical companies to improve R&D productivity. As a result, research tools are becoming particularly adept at screening, aggregating and integrating large data sets to uncover unique insights. Alongside this improving technology, the ease and relatively low cost of cloud computing and supercomputing have lowered barriers to access and reduced the time it takes to crunch many terabytes of data. The life sciences space is learning how to better manage complex and diverse data, as the sheer volume and density of published scientific research continue to grow.
Life science companies must invest more in solutions that support the early and preclinical phases of the drug discovery process – helping researchers to better understand the biology of diseases and to select the correct experimental models – so that R&D succeeds and the likelihood of late-stage failures is reduced. Only through this approach, and with a focus on early research, will pharmaceutical companies be able to increase both their productivity and their R&D return on investment.
Mitigating Risk through Data
Drug discovery and development is a high-risk, high-stakes enterprise, with no guarantee of success. Today, it takes an average of 17 years to bring a new drug to market, at a cost of up to $5 billion. Because of the high costs and long timelines involved, mitigating risk in R&D is crucial. In life sciences research, the ‘fail early and fail cheap’ principle is therefore followed closely, coupled with a preference for backing therapies where there is a large gap between the current standard of care and what patients actually need. To facilitate this approach, access to many forms of data from a multitude of sources, as early as possible, is vital to mitigate risk throughout the R&D process. Researchers need access to published research results and data from varied fields, such as chemistry and biology, to give them the foundation to make confident, better-informed decisions about the development of a drug.
In addition to published research, there are less obvious data sources that can prove extremely valuable to researchers. Take the data that pharmacovigilance (PV) and drug safety teams hold, for example, which can also be used effectively in drug development. The perception of PV and drug safety teams as bearers of bad news is at odds with the actual benefits that PV data can deliver to drug researchers. For instance, PV data can help scientists avoid mistakes already made in previous studies, identify potential adverse reactions before a drug goes to clinical trial, or improve care by enabling better labelling of possible issues arising from co-medications and co-occurring conditions. In turn, this data allows pharma to spot new opportunities and gaps in patient care in specific subsets of patients.
In short, it is crucial that researchers can access and make use of data from multiple sources – to ensure they are covering all bases and extracting the most value from the large amount of data available to them, maximizing the success and efficiency of their studies. Being able to crunch huge data sets is ineffectual without the scientific knowledge needed to identify what in them is valuable.
Balancing Science and Technology
The structure, quality, actionability and standardization of information all present challenges with big data. Furthermore, over 85 percent of medical data is unstructured, yet still clinically relevant. Because of this, handling immense data sets from various sources requires a combination of scientific and technical expertise, and organizations have had to seek out solutions that allow data to be investigated in a standard, repeatable and structured manner. The amount of unstructured data being generated is also increasing significantly, and will continue to do so, with the pervasiveness of social media, wearable devices and the growing number of devices connected to the Internet of Things (IoT).
In science and research, text-mining tools and the use of relevant vocabularies and taxonomies are essential. Without a scientific understanding of how data is categorized, technology solutions by themselves are not effective. Indeed, without taxonomies, the only way to find comparable data points is to compute the distance from each point to every other point in the space – a huge number of computations. The most efficient way to discover relevant content and extract key data from multiple, disparate data sources is to use taxonomies combined with semantic technology and text-mining tools. The resulting nuggets of data can help researchers make associations where none existed before. Ultimately, it is a careful balance of scientific knowledge and technological know-how that will help researchers become more productive and successful.
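To make the contrast concrete, the minimal Python sketch below shows how a taxonomy turns document matching into a simple set intersection rather than a pairwise distance computation. The taxonomy entries, concept IDs and example sentences are invented for illustration and do not come from any real vocabulary.

```python
# Minimal sketch: a toy taxonomy maps synonyms to canonical concept IDs,
# so documents can be compared by shared concepts (a set intersection)
# instead of computing distances against every other data point.
import re

# Hypothetical taxonomy entries, purely for illustration.
TAXONOMY = {
    "egfr": "GENE:EGFR",
    "epidermal growth factor receptor": "GENE:EGFR",
    "nsclc": "DISEASE:NSCLC",
    "non-small cell lung cancer": "DISEASE:NSCLC",
}

def tag_concepts(text):
    """Return the set of canonical concept IDs mentioned in the text."""
    lowered = text.lower()
    return {cid for term, cid in TAXONOMY.items()
            if re.search(r"\b" + re.escape(term) + r"\b", lowered)}

doc_a = "EGFR mutations are common in non-small cell lung cancer."
doc_b = "We screened NSCLC samples for epidermal growth factor receptor activity."

# Both documents mention the same gene and disease under different names;
# the taxonomy makes them comparable via shared concept IDs.
shared = tag_concepts(doc_a) & tag_concepts(doc_b)
print(sorted(shared))  # ['DISEASE:NSCLC', 'GENE:EGFR']
```

In practice, real systems use curated vocabularies such as MeSH and far more sophisticated matching, but the principle is the same: normalization through a taxonomy replaces brute-force comparison.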
Objective of Data-driven Decision Making
The power of big data lies in its ability to unlock the intrinsic value of data-driven decision-making – a principle that applies across all industries. Across all fields, as the amount of information we produce proliferates, confident decisions backed by accurate data are the goal. In pharmaceutical research and development, data-driven decision-making translates into the ability to mine and analyze vastly different literature and data sources, helping researchers to make relevant associations between genes and proteins in the search for new drugs and treatments.
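As a simple illustration of that kind of association mining, the sketch below counts how often gene and disease terms co-occur in the same abstract – one of the most basic text-mining signals used to surface candidate links for follow-up. The term lists and abstracts are invented for the example.

```python
# Illustrative sketch of co-occurrence mining: count how often gene and
# disease terms appear in the same abstract, then rank the pairs.
from collections import Counter
from itertools import product

GENES = {"BRCA1", "TP53"}               # hypothetical gene list
DISEASES = {"breast cancer", "glioblastoma"}  # hypothetical disease list

abstracts = [
    "BRCA1 variants were enriched in familial breast cancer cohorts.",
    "TP53 mutations are a hallmark of glioblastoma progression.",
    "We observed TP53 loss alongside BRCA1 silencing in breast cancer.",
]

pairs = Counter()
for text in abstracts:
    lowered = text.lower()
    genes_hit = {g for g in GENES if g.lower() in lowered}
    diseases_hit = {d for d in DISEASES if d in lowered}
    # Every gene-disease pair mentioned together counts as one co-mention.
    pairs.update(product(sorted(genes_hit), sorted(diseases_hit)))

# Rank pairs by how often they are mentioned together.
for (gene, disease), n in pairs.most_common():
    print(f"{gene} <-> {disease}: {n} co-mention(s)")
```

Co-occurrence alone proves nothing about causation, which is exactly why the scientific expertise discussed above is needed to interpret what such tools surface.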
With such a wealth of data available to organizations today, it’s possible that answers to important, long-sought questions could finally be reached in a fraction of the time – both limiting the number of late-stage failures and improving R&D productivity. Researchers should not be daunted by the deluge of data at their fingertips, but rather embrace the benefits it brings.