Big Data & the Corporate Mandate

By Nataraj Dasgupta, Senior Information Technology Lead, Data Sciences, Systems Development & Analytics, Purdue Pharma L.P.

Big Data is a term that is unusually prone to selective interpretation. If someone tells you that they have Big Data, they are always right. Such claims are not unlike weather predictions, which are right one way or the other. If the weatherman predicts that there is a 90 percent chance of rain, he is right if it rains, and he is right if it doesn’t, since he implicitly suggested that there was a 10 percent chance it would not rain.

For those in financial services working with stock or pricing data from the exchanges, a few petabytes could be considered “Big” (Data). For someone working in Excel and attempting to open a 5 GB file with a few million rows, the file is Big Data. In essence, any data that takes effort to perform simple operations on, whether it is a collection of MP3 songs that takes an eternity to copy over to a new laptop or a collection of oversized JPEG images that makes your computer painfully slow, is, in relative terms, Big. It is Big enough for you to long for ways to make it easier to work with.

The elusive nature of the term poses a dilemma usually reserved for metaphysical or literary concepts, not for terms in computer science, where precision is a venerable and fundamental trait of the discipline.

For better or for worse, Big Data is here to stay. It has been largely driven by the exponential growth, over the past 4-5 years, in websites and web-based applications that have commanded surprising numbers of loyal followers: tweeters, Googlers and avid Instagram users capturing every moment of their lives, generating billions of data points along the way. In the time it takes to read this article (at 575 words per minute), Twitter users would have posted 1 million tweets, users worldwide would have viewed 19 million YouTube videos, and 250 million emails would have been exchanged.

As a consequence, attempting to leverage this vast wealth of information to add value to the corporate workings of an organization has become a standard mandate at companies across the world. Sales departments at companies that have sprung up in the wake of the Big Data phenomenon relentlessly preach the power of using the information to derive insights through machine learning, a more sophisticated technique for analyzing information. And, as a manager entrusted with bringing about change and making your mark within the organization, you may well find such propositions attractive; it is a world that promises instant gratification and will perhaps propel your career while delivering value to the company. Indeed, it is no longer a question of whether you should invest, but rather how soon you can invest in Big Data related technology.

There is, however, a strange, although not unexpected, ambivalence in such an exercise. The Big Data phenomenon that we are witnessing is primarily driven by data produced by the frenzied activity and proliferation of social networks, mobile devices and, in general, connected devices, enabled by better access to the internet in remote parts of the world, faster bandwidth and web-based applications which have gained vast numbers of users in a short span of time. On the other hand, the volume of data that is not linked to social channels has had a much more modest rate of growth.

For a real-world example, consider the case of physician data that is distributed by many well-known pharmaceutical industry data vendors. The datasets may contain a diverse range of information regarding physicians, medications, frequency of prescriptions, and other related aspects. Pharmaceutical companies commonly use these datasets to measure market reach, discover new ways to improve healthcare standards, and perform various other statistical analyses. While the data may be supplemented with additional metrics, the volume of this data has remained by and large constant, immune to the explosive growth in internet-generated information. For such datasets to grow at even a fraction of the rate at which social media related data grows, either the patient population would have to grow at a similar rate or the number of physicians would have to triple within a year or less. Other industries, such as finance, have seen faster rates of growth in data volume, and yet their pace of growth pales in comparison to the atypical growth in the internet segment.

All of this leads to a simple question: if the data you work with is in general static, do you truly have “Big Data”? And would the corporate mandate to make Big Data part of your corporate strategy truly deliver tangible value, in terms of revenue or otherwise, that is proportional to the investment?

In practice, in a majority of cases, especially in small to mid-scale firms that do not depend on social networks, web-related data or sensor data, a Big Data mandate sounds fashionable, but its direct impact on revenues is much less pronounced, and the mandate inevitably leads to disappointing outcomes. To expect that Big Data alone can, in a short span of time, reveal significant insights in your data that a team of experienced market research professionals in the organization has not been able to discern is to bet on what is, statistically speaking, a rare case on the extreme end of the bell curve.

However, all is not lost. The growth of data has led to various prominent technological breakthroughs that permit analysis on both small and big data at a scale that conventional systems cannot compete with. New paradigms in database architecture, commonly known as NoSQL databases, allow us, for the first time, to perform complex calculations at a very rapid pace. All types of data, both big and small, can benefit from technologies that enable rapid analysis, shortening the time to produce results and, in effect, making iterative analysis much faster.

In our experience, although speed might not be a major differentiator in and of itself, the cumulative effect of being able to rapidly prototype, obtain results, make decisions based on the results and, if needed, re-run such simulations can materially improve overall department productivity. Instead of having to wait an hour or longer for a certain analysis to complete using conventional tools, we can get the same results within a few minutes, decide if more analysis is required and run further simulations as needed. The shortened feedback time has a substantial impact that, while somewhat intangible, is readily recognized as researchers, epidemiologists and R&D staff are able to complete tasks in a significantly shorter time frame than before.

A Big Data strategy driven by realistic and practical expectations of outcome, leveraging technologies that may show immediate results as a first step, can be very successful and clear the way for further exploration and investment that might indeed reveal the hidden insights in your data.