Building a Data Factory for Speed and Innovation in Life Sciences

By Ramin Daron, VP, Data Architecture and Technology, R&D Data Science Institute at Takeda Pharmaceuticals International Co

I once asked a colleague in finance, “What is the depreciation rate of knowledge capital?” The question was a bit tongue in cheek, but I still pose it to myself today. While the answer is nuanced and complex, it is a good reminder to understand the value of the data and information gathered and used within an organization, and to appreciate that the passing of time affects that value.

In today’s world, it is empirically evident that the rate of change itself is increasing, not just in technological advances but also in the impact of those advances on business models. In this environment it is more critical than ever to equip your organization with the speed and ability to learn, grow and evolve at a pace equal to or greater than the pace of change.

The focus of research and development in the pharmaceutical industry has been on data and the ability to exploit that data to gain information and convert it to knowledge in the most efficient manner. When the mission is to improve health, the urgency of reaching outcomes, good or bad, is the highest priority. While research and development is primarily a trial-and-error endeavor, many have taken up the philosophy of fail fast and recover fast.

In practical terms, the connection between information and its intended consumer needs to be timely, relevant and enriched. It must continually strive to move beyond giving access to information towards providing answers to questions.

Enabling Adaptation, an evolutionary process

Along this journey, we set out to build the data factory in a fit-for-purpose manner, concentrating on three key areas: content, context and connections. Knowledge management doesn’t stop at knowing what we know; it also strives to know what we don’t know.

With outcomes in mind, we start by asking what question we are trying to answer. We can then attempt to articulate the value proposition of the answer and design the path for fulfilling the query. Questions can arise from multiple areas and needs, each with individual preferences. Executives seeking the state of the portfolio, supervisors and managers tracking the progress and productivity of activities, scientists researching novel methods, and patients seeking to understand treatment options all need individualized fulfillment of their queries, including the underlying content, preparation and presentation. To maintain the speed and agility needed to accommodate these variations, the mindset needs to shift towards services.

Technological advances and service offerings, along with the consumerization of technology, have increased the expectation that we can connect with information in near real time, if not real time. Searching for answers has given way to alerts and updates tailored to individual needs. Individualization of information requests through rules-based algorithms and AI is allowing for deeper connections to knowledge artifacts. Access to information is increasingly provided through smart devices and on our terms: what, how and when we want it. Technology has further enabled profiling of consumers through machine learning of habitual activities, so that content delivery can be updated dynamically. The use of no-code and low-code platforms has created a means of accelerating the development of transactions that capture additional data. Robotic Process Automation is being used to streamline the application and execution of business rules and filters.

In this highly complex and multidisciplinary industry, increasing collaboration with external organizations through partnerships and consortiums has added to the need for a framework of speed and agility to deal with the flow of data and information.

Keeping in mind the FAIR (Findable, Accessible, Interoperable, Reusable) principles, data intended for aggregation, analytics and insights needs to be accessible and complete. Under a policy of openness and transparency, pooling data into a data lake or central repository (physical or virtual) expedites the breaking down of silos. Automation of data feeds and the use of streaming services have increased the variety of data and its fluidity. Accounting for diverse data sets and their chain of custody through cataloguing and lineage tracing allows for change and reusability. Aggregation, visualization and interpretation of data and information artifacts support the presentation of information and knowledge.
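As a rough illustration of the cataloguing and lineage tracing described above, the following Python sketch keeps a small in-memory catalog in which every data set records the data sets it was derived from. The class names, field names and example data sets are all hypothetical and chosen for illustration; this is a minimal sketch of the idea, not a description of any particular data factory or catalog product.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """One catalog record. All field names are illustrative assumptions."""
    name: str
    owner: str
    description: str
    derived_from: list[str] = field(default_factory=list)  # direct parents


class Catalog:
    """A toy in-memory catalog supporting keyword search and lineage tracing."""

    def __init__(self) -> None:
        self._entries: dict[str, DatasetEntry] = {}

    def register(self, entry: DatasetEntry) -> None:
        # Refuse entries whose parents are unknown, so lineage has no gaps.
        missing = [p for p in entry.derived_from if p not in self._entries]
        if missing:
            raise KeyError(f"unregistered parent(s): {missing}")
        self._entries[entry.name] = entry

    def find(self, keyword: str) -> list[str]:
        # "Findable": match the keyword against names and descriptions.
        kw = keyword.lower()
        return sorted(
            name
            for name, e in self._entries.items()
            if kw in name.lower() or kw in e.description.lower()
        )

    def lineage(self, name: str) -> list[str]:
        # Breadth-first walk over parents: nearest ancestors first, no repeats.
        seen: list[str] = []
        queue = list(self._entries[name].derived_from)
        while queue:
            parent = queue.pop(0)
            if parent not in seen:
                seen.append(parent)
                queue.extend(self._entries[parent].derived_from)
        return seen


# Hypothetical example data sets.
catalog = Catalog()
catalog.register(DatasetEntry("assay_raw", "lab_ops", "Raw assay readouts"))
catalog.register(DatasetEntry("patient_registry", "clinical", "De-identified patient registry"))
catalog.register(DatasetEntry("assay_clean", "data_eng", "Curated assay readouts", ["assay_raw"]))
catalog.register(
    DatasetEntry(
        "response_model_input",
        "data_science",
        "Joined assay and patient features",
        ["assay_clean", "patient_registry"],
    )
)

print(catalog.find("assay"))
print(catalog.lineage("response_model_input"))
```

Recording parents at registration time is what makes the chain of custody traceable later: any derived data set can be followed back to its sources, which helps when a source changes or a data set is reused for a new question.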

To make sense of all this data, standardization of formats and data models has long been used to organize data and prepare it for integration. To ensure relevance and integrity, adequate contextual information needs to be applied to the data sets. Curation of data sets is still a highly manual process, although AI is increasingly used to categorize, catalogue and infer contextual meaning. With improved enrichment and preparation of data, we become more efficient at understanding patients, diseases and the outcomes of therapeutic applications.

Making the Big Leap, a revolutionary process

A lot has been written about the promise and challenges of big data and artificial intelligence in realizing personalized medicine.

With the growing emphasis on data science, life science organizations are looking to take a big leap through the utility of advanced analytics, access to large volumes of data, and the facility of platforms. The aspirations are real-time diagnosis of conditions based on genetics, medical history, environmental exposure, mental and physical activity, nutrition and other factors, followed by disease management with the potential for cure and early preventative measures.

A daunting task is being able to go from a chemical structure or biological entity to a therapy with confidence and precision. However, a framework that lists all the factors contributing to outcomes (in the tens of thousands), along with their relations to data sets, algorithms and outputs, can prove to be a start, not only in defining the path with incremental gains along the way but also in identifying gaps in data or current thinking. Continuing the progressive enhancement of the data factory, we apply these learnings in pursuit of better health.
