The 12 Step Method for Stronger Data Analysis

Data Science
Lennart Bootsman
Publication Date
21 October 2021

For healthcare data to better meet the information needs of decision makers and key stakeholders within healthcare organizations, it is imperative to employ a systematic method for aggregating, analyzing and re-analyzing traditional clinical data, as well as the wide range of data made available by technological innovation (e.g., wearables, sensors, smartphones).

At Mobiquity, we’ve developed and tested a 12-step method for stronger data analysis. Following the cyclical nature of scientific inquiry, this approach will help you sift through data to identify actionable insights, answer critical questions and, most importantly, uncover the next round of unknowns that will guide and enhance future analyses.

12 Step Method for Stronger Data Analysis

Once you’ve collected data, you’ll need to organize and analyze it in a way that allows you to derive and apply meaning. Here is our recommended approach:

1. Set Objectives

Outline user stories and use cases to identify new data opportunities. At this point, we don’t yet need to know exactly what hypotheses or correlations exist. We don't even need to know what specific questions the data will need to answer. A list of “known unknowns” is good enough. The idea here is to help frame the kind of data we will need later on. We start by writing a few user stories of what we would like the data to do for us. Here are a few typical user stories for healthcare data:

  • “As a data analyst, I want to know where and how many people have a specific disease and therapy regime so I can study the epidemiology of the disease.”
  • “As a scientist, I want to know if the efficacy of a particular therapy is affected by a patient’s physical environment.”

Once we have obtained a list of user stories, we can then create data use cases. The difference between a user story and a use case is that a use case describes how the user may interact with the data, for example through a sequence of queries. This process is especially effective in discovering common data sources that may be used to address several use cases. When we have a list of data use cases, we move on to the next step.
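As a rough illustration of how a list of use cases can reveal common data sources, the sketch below records each use case with its sequence of queries and the sources those queries would touch. The class, field names and source names (ehr, geo_lookup, air_quality, wearables) are assumptions made for this example, not part of the method itself.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class UseCase:
    """A data use case derived from a user story (all names are illustrative)."""
    name: str
    user_story: str
    queries: list[str] = field(default_factory=list)     # how the user interacts with the data
    data_sources: set[str] = field(default_factory=set)  # sources those queries would touch


def shared_sources(use_cases: list[UseCase]) -> dict[str, int]:
    """Return the sources needed by more than one use case, with their counts."""
    counts = Counter(src for uc in use_cases for src in uc.data_sources)
    return {src: n for src, n in counts.items() if n > 1}


use_cases = [
    UseCase(
        name="disease_epidemiology",
        user_story="As a data analyst, I want to know where and how many people "
                   "have a specific disease and therapy regime.",
        queries=["count patients by region", "break down by therapy regime"],
        data_sources={"ehr", "geo_lookup"},
    ),
    UseCase(
        name="therapy_vs_environment",
        user_story="As a scientist, I want to know if the efficacy of a therapy "
                   "is affected by a patient's physical environment.",
        queries=["select patients on therapy", "join environmental readings"],
        data_sources={"ehr", "air_quality", "wearables"},
    ),
]

print(shared_sources(use_cases))  # {'ehr': 2}
```

A source that appears in several use cases is a natural candidate to inventory first.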

2. Prioritize Use Cases

Perform an initial prioritization of data use cases, relying on stakeholders and key opinion leaders for segmentation direction. Two dimensions are typically used in this instance:

  • Urgency: Prioritize the use case list into 3 categories: a) needed now/immediately, b) later/in a few months, and c) nice to have. These categories depend on the organization’s goals and objectives. For example, if the manufacturer of a competitive drug has the ability to better predict efficacy based on specific environmental factors, and is rapidly gaining market share, then time is of the essence.
  • Value: Evaluate use cases for: a) high value/impact and b) low value/impact to the health ecosystem. Data value is also dependent on the organization’s goals and objectives.

Once data use cases are prioritized, we focus our attention on those that are urgently needed and of high value. We now proceed to data sourcing.
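The two-dimensional prioritization can be sketched as a simple sort. The labels below ("now", "later", "nice_to_have", "high", "low") and the example use cases are assumptions chosen for illustration; a real project would take them from stakeholder input.

```python
from dataclasses import dataclass

# Lower rank means higher priority (labels are hypothetical).
URGENCY_RANK = {"now": 0, "later": 1, "nice_to_have": 2}
VALUE_RANK = {"high": 0, "low": 1}


@dataclass(frozen=True)
class ScoredUseCase:
    name: str
    urgency: str  # one of URGENCY_RANK
    value: str    # one of VALUE_RANK


def prioritize(cases: list[ScoredUseCase]) -> list[ScoredUseCase]:
    """Order use cases by urgency first, then by value."""
    return sorted(cases, key=lambda c: (URGENCY_RANK[c.urgency], VALUE_RANK[c.value]))


def first_iteration(cases: list[ScoredUseCase]) -> list[ScoredUseCase]:
    """Keep only the use cases that are both urgent and high value."""
    return [c for c in cases if c.urgency == "now" and c.value == "high"]


cases = [
    ScoredUseCase("regional_prevalence", "later", "high"),
    ScoredUseCase("therapy_vs_environment", "now", "high"),
    ScoredUseCase("clinic_wait_times", "now", "low"),
    ScoredUseCase("device_battery_stats", "nice_to_have", "high"),
]

print([c.name for c in prioritize(cases)])
print([c.name for c in first_iteration(cases)])
```

Sorting by urgency before value is one possible choice; an organization could equally weight the two dimensions differently.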

3. Source Data

With high-value use cases identified, we need to find data sources to address and support these use cases. The first step involves data inventory. Find out what data is currently available and/or can be easily obtained from other sources. This is where we can begin to utilize the proliferation of connected devices and data available via mobile platforms.

4. Connect the Dots

Understand how data sources can be linked or correlated (e.g., by surroundings, time, weather, location). Beyond finding ways to index data, this process involves exploring new data sets and discovering common elements that link disparate data sources. This step is best accomplished by analyzing all available data sets. Missing data and information are often discovered in the process.

When we have reached this stage, we can prioritize available data sources to gain a clear idea of what data is easily accessible and manageable. We recommend focusing on key data sources that can be easily obtained and linked to other data sources.

The norm in healthcare is to gather traditional clinical data such as heart rate, blood pressure, blood work-ups and electronic health records (EHRs). In the new digital landscape we can start to layer in non-traditional data, including valuable context that links to more traditional clinical data through unique identifiers.
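One informal way to start connecting the dots is to compare the field names of each data set and list the ones they share. The sketch below assumes three hypothetical data sets and their columns; shared names are only candidate links and still need to be checked against the actual values.

```python
from itertools import combinations


def common_fields(schemas: dict[str, set[str]]) -> dict[tuple[str, str], set[str]]:
    """For every pair of data sets, return the field names they have in common."""
    links = {}
    for (a, cols_a), (b, cols_b) in combinations(sorted(schemas.items()), 2):
        shared = cols_a & cols_b
        if shared:
            links[(a, b)] = shared
    return links


# Hypothetical schemas: one clinical source and two non-traditional ones.
schemas = {
    "ehr": {"patient_id", "visit_date", "diagnosis"},
    "wearable": {"patient_id", "timestamp", "heart_rate"},
    "weather": {"timestamp", "postal_code", "temperature"},
}

for pair, fields in common_fields(schemas).items():
    print(pair, sorted(fields))
```

In this example the EHR and weather data share no field directly, which is the kind of missing link this step tends to surface.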

5. Determine Data Architecture

Design an information architecture to support the high need/value use cases, but remember to leave opportunities for adding data connectors for future data sources, and even APIs (application programming interfaces) that can be used to extract data summaries for further analyses or reporting.
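A minimal sketch of what "leaving room for future connectors" could look like in code: each data source is registered behind the same small interface, so a new source can be added without changing the code that consumes it. The SourceRegistry class and the source names are hypothetical, and a production architecture would involve far more than this.

```python
from typing import Callable, Iterable

Record = dict[str, object]
Connector = Callable[[], Iterable[Record]]  # any callable that yields records


class SourceRegistry:
    """Keeps data connectors behind one interface (illustrative only)."""

    def __init__(self) -> None:
        self._connectors: dict[str, Connector] = {}

    def register(self, name: str, connector: Connector) -> None:
        if name in self._connectors:
            raise ValueError(f"source already registered: {name}")
        self._connectors[name] = connector

    def read(self, name: str) -> list[Record]:
        return list(self._connectors[name]())

    def summary(self) -> dict[str, int]:
        """A tiny API-style summary: number of records per source."""
        return {name: len(self.read(name)) for name in self._connectors}


registry = SourceRegistry()
registry.register("ehr", lambda: [{"patient_id": "p1"}, {"patient_id": "p2"}])
# A future source plugs in the same way, with no change to SourceRegistry.
registry.register("wearable", lambda: [{"patient_id": "p1", "heart_rate": 72}])

print(registry.summary())  # {'ehr': 2, 'wearable': 1}
```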

6. Data Modeling

Find data “glues”: indexing fields such as IDs and timestamps that we can use to connect or link data from different sources or types. This process is more formal than the connecting-the-dots stage, and involves exploring data sources in more depth. We will be setting up to build even more formal data structures in the next step.
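Before relying on a field as glue, it helps to check how complete and unique it is in each source and how well the values overlap between sources. The following sketch assumes a hypothetical patient_id field and toy records.

```python
def key_profile(rows: list[dict], key: str) -> dict:
    """Report how often the key is filled in and whether its values are unique."""
    values = [row.get(key) for row in rows]
    present = [v for v in values if v is not None]
    return {
        "coverage": len(present) / len(values) if values else 0.0,
        "unique": len(set(present)) == len(present),
    }


def overlap(left: list[dict], right: list[dict], key: str) -> float:
    """Fraction of distinct key values in `left` that also occur in `right`."""
    left_keys = {r[key] for r in left if r.get(key) is not None}
    right_keys = {r[key] for r in right if r.get(key) is not None}
    if not left_keys:
        return 0.0
    return len(left_keys & right_keys) / len(left_keys)


# Toy records; field names and values are assumptions for this example.
ehr = [{"patient_id": "p1"}, {"patient_id": "p2"}, {"patient_id": "p3"}]
wearable = [
    {"patient_id": "p1", "heart_rate": 70},
    {"patient_id": "p1", "heart_rate": 74},
    {"patient_id": "p3", "heart_rate": 81},
    {"patient_id": None, "heart_rate": 65},  # reading with no identifier
]

print(key_profile(ehr, "patient_id"))       # {'coverage': 1.0, 'unique': True}
print(key_profile(wearable, "patient_id"))  # {'coverage': 0.75, 'unique': False}
print(overlap(wearable, ehr, "patient_id"))  # 1.0
```

A key that is unique in one source and repeated in another suggests a one-to-many relationship, which matters when the data is joined.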

7. Build Data Fusion Module and Data Integration

Build a data aggregation/fusion module to join and/or merge data that supports high-need, valuable and ‘easy’ use cases. This is where we write code that joins disparate data into usable datasets. We then store those data sets in data systems for querying.
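As a small example of the kind of joining code this step refers to, the sketch below aggregates wearable readings to a daily mean and attaches them to clinic visits by patient ID and date. The record layout and field names are assumptions, and matching on calendar date is just one of several possible join rules.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def daily_heart_rate(readings: list[dict]) -> dict:
    """Aggregate raw readings to a mean heart rate per (patient_id, date)."""
    buckets = defaultdict(list)
    for r in readings:
        day = datetime.fromisoformat(r["timestamp"]).date()
        buckets[(r["patient_id"], day)].append(r["heart_rate"])
    return {key: mean(values) for key, values in buckets.items()}


def fuse(visits: list[dict], readings: list[dict]) -> list[dict]:
    """Left-join visits with same-day wearable data; None where no data exists."""
    daily = daily_heart_rate(readings)
    fused = []
    for visit in visits:
        day = datetime.fromisoformat(visit["visit_date"]).date()
        key = (visit["patient_id"], day)
        fused.append({**visit, "mean_heart_rate": daily.get(key)})
    return fused


visits = [
    {"patient_id": "p1", "visit_date": "2021-10-01", "systolic_bp": 128},
    {"patient_id": "p2", "visit_date": "2021-10-01", "systolic_bp": 141},
]
readings = [
    {"patient_id": "p1", "timestamp": "2021-10-01T08:00:00", "heart_rate": 70},
    {"patient_id": "p1", "timestamp": "2021-10-01T20:00:00", "heart_rate": 74},
    {"patient_id": "p1", "timestamp": "2021-10-02T08:00:00", "heart_rate": 90},
]

for row in fuse(visits, readings):
    print(row)
```

Keeping every visit, even when no wearable data matches, makes gaps visible instead of silently dropping patients.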

8. Build an Analytics Engine

Feed integrated data into an analytics engine with an interface for querying and analyses. For data scientists and analysts, this is the step where the rubber meets the road. At this point in the process, databases and stores should be queried and analyzed to address the urgent, high-value use cases identified at the beginning. This step also validates necessary investments in the data architecture and analytics engine, provided queries and analyses produce actionable results. If the results are meaningless and/or cannot be interpreted for policies or action, then more time and resources should be invested in re-designing data and systems architecture.
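To make the querying step concrete, here is a minimal sketch that loads a few fused rows into an in-memory SQLite database and answers one use-case question. The table name, columns and values are invented for the example; a real analytics engine would sit on a proper data store.

```python
import sqlite3

# Hypothetical fused data: one row per patient.
rows = [
    ("p1", "north", "therapy_a", 1),
    ("p2", "north", "therapy_a", 0),
    ("p3", "south", "therapy_a", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fused (patient_id TEXT, region TEXT, therapy TEXT, responded INTEGER)"
)
conn.executemany("INSERT INTO fused VALUES (?, ?, ?, ?)", rows)


def response_rate_by_region(conn: sqlite3.Connection) -> list[tuple]:
    """Patients and share of responders per region, for one therapy."""
    cursor = conn.execute(
        """
        SELECT region, COUNT(*) AS patients, AVG(responded) AS response_rate
        FROM fused
        WHERE therapy = ?
        GROUP BY region
        ORDER BY region
        """,
        ("therapy_a",),
    )
    return cursor.fetchall()


print(response_rate_by_region(conn))  # [('north', 2, 0.5), ('south', 1, 1.0)]
```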

9. Present and Visualize

Develop the data visualization and presentation layer. Build a service layer with APIs that extract data for summaries and display reports, as tables or charts, in user interfaces that can be accessed by both analysts and stakeholders.
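A service layer of this kind can be pictured as a few functions that turn query results into something an interface can consume. The sketch below produces a JSON payload for an API response and a plain-text table for a report; the row layout (region, patients, response rate) is assumed for illustration.

```python
import json

# Hypothetical query results: (region, patients, response_rate).
rows = [("north", 2, 0.5), ("south", 1, 1.0)]


def summary_payload(rows: list[tuple]) -> str:
    """Serialize query results as JSON, e.g. as the body of an API response."""
    records = [
        {"region": region, "patients": patients, "response_rate": rate}
        for region, patients, rate in rows
    ]
    return json.dumps({"rows": records}, sort_keys=True)


def as_text_table(rows: list[tuple]) -> str:
    """Render the same results as a fixed-width table for a simple report."""
    lines = [f"{'region':<10}{'patients':>10}{'rate':>8}"]
    for region, patients, rate in rows:
        lines.append(f"{region:<10}{patients:>10}{rate:>8.0%}")
    return "\n".join(lines)


print(summary_payload(rows))
print(as_text_table(rows))
```

Both views come from the same rows, so analysts and stakeholders see consistent numbers whichever interface they use.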

10. System Review

Determine if the system addresses the urgent, high-value data use cases. Find gaps in the data and information system. This information will then be used for the next iteration.
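One simple way to look for gaps is to compare the fields each use case needs with the fields the system currently holds. The use case names and field names below are assumptions for this sketch.

```python
def find_gaps(required: dict[str, set[str]], available: set[str]) -> dict[str, set[str]]:
    """Return, per use case, the required fields that the system does not yet hold."""
    gaps = {}
    for use_case, fields in required.items():
        missing = fields - available
        if missing:
            gaps[use_case] = missing
    return gaps


# Hypothetical requirements and current inventory.
required = {
    "disease_epidemiology": {"patient_id", "diagnosis", "postal_code"},
    "therapy_vs_environment": {"patient_id", "therapy", "air_quality_index"},
}
available = {"patient_id", "diagnosis", "postal_code", "therapy", "heart_rate"}

for use_case, missing in find_gaps(required, available).items():
    print(use_case, "is missing", sorted(missing))
```

The resulting list of missing fields is a direct input for the next iteration.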

Learning from your initial approach, move into subsequent iterations:

11. Enhance Data

Having identified data gaps, find solutions and search for new data sources. Also, return to your list of use cases and tackle the harder-to-do but still urgent and high-value use cases.

12. Fortify Data Architecture

Redesign data models, if needed. This step will test whether the data architecture and design are robust enough to accommodate additional and future data sources. Integrate additional data into the Fusion Module.

Once you have reviewed, redesigned and revised the data and systems architecture, as well as the analytics engine and presentation layer, the data is well on its way to answering many questions, and will undoubtedly lead to many more, given the nature of scientific inquiry. The process described above follows the same cyclical paradigm as scientific research. Because of the ubiquity of new technologies and the standard approach to analysis, barriers to entry for any business are quite low. However, the value in this model is not in capturing data, but in the detailed, rigorous and cyclical analysis of the data as just discussed.

Once you have the approach and tools in place, the next step is to develop a data strategy to ensure the actual collection of appropriate data. You can decide to collect data as part of a formal clinical trial, a Phase IV trial, or even as the by-product of another solution offering altogether. This step will help to refine the types of data you are collecting for analysis and how you can obtain that data in a way that supports the data sourcing noted above.

One of the most promising technologies for achieving accurate health analysis is artificial intelligence (AI), and in particular a method called reinforcement learning (RL). RL has shown impressive results: it is among the techniques used for playing complex computer games, driving cars, optimising your Facebook feed and recommending your next Netflix series, and it is now showing promise for enabling personalised medicine and care.
