DFCI Cancer Curation Platform

 

Business Need

DFCI is unique among cancer institutes for its 50:50 split between clinical care and research. Much of that research relies on data collected during clinical care and stored in the electronic health record (EHR), but a large and crucial portion of this data cannot be analyzed.

Goal

Create a system that enables the curation of standardized, structured, and high-quality data from patient electronic health records (EHR). Provide a single institute-wide platform into which patient records can be curated. 

Timeline

3+ Years, 2017-2020

My Role(s)

Dana-Farber wants to create a platform to capture all cancer-related data. I led the design of this product.

Over the years, in addition to all UX/UI work (including user research), I have performed the following roles:

Product Owner: 

  • Helped introduce the agile development framework and performed Product Owner duties before the role was filled and during transitions.

  • Participated in every stakeholder meeting throughout my time. 

Data: 

  • Translated data requirements into database design requirements

  • Created and maintained the “Data Dictionary”, which defined the vocabulary of every data element captured in the platform. I learned Python to support this work.

UI Developer: 

  • Created the Design System and implemented it using CSS, later refactored into SCSS.

 Data at Dana-Farber Cancer Institute

Structured Data

Structured data, such as billing diagnosis codes, is analyzable.

Structured, Poor-Quality Data

Other structured data, such as medications and treatment plans, suffers from missing values and other quality issues.

Free-text Data*

Narrative data such as progress notes, pathology reports, and imaging reports are unstructured, messy, and not directly analyzable. Crucial outcomes data live here.

*Outcomes = How patients respond to treatment

Because outcomes are often trapped in free-text data that cannot be analyzed, research at scale requires a great deal of manual effort, despite the large amount of data Dana-Farber has.

Primary Users - Cancer Curators

Clinical research coordinators (CRCs) who curate cancer data are most often recent graduates who intend to pursue a career in medicine or biology.

All patient data at DFCI is stored in the EHR system EPIC, which curators use to learn about each patient. EPIC’s UI is widely considered confusing and hard to navigate.

CRCs record the data points they find in the EHR in tools such as Excel or REDCap. This practice leads to repeated, inconsistent, and tedious curation and re-curation of cancer-related data into siloed data repositories.

A streamlined interface for data curators

Curation Platform presents source data from the EHR alongside inline curation directives, improving on traditional decentralized data entry with a user-friendly UI.

A central place for recording high quality data

Curation Platform UI

A curator reviews the free-text notes on the left and curates data about the patient on the right.

Any existing structured data is auto-populated into the form, and the user can “accept” the source data in each section.

Each field comes with directives to help curators understand the data element.

A QA reviewer, such as a principal investigator (PI) or team lead, can answer questions and oversee the quality of the curation.

Curation Platform

A curator reviews data from the electronic data warehouse and curates (abstracts) the key pieces of data, which are saved to a central database. This data is available to researchers as well as clinicians.

Patient cancer journey

A patient’s cancer journey is usually messy and complicated.

After speaking with patients, doctors dig through, read, and interpret many loosely organized reports to figure out what happened.


 Doctors need to know: # of months did the patient “not get worse”:

This is known as “Real-World Progression-Free Survival from the Date of Diagnosis” (rwPFS). I created infographics showing examples of visualizations that can be built from data curated in Curation Platform. I used Python and plot.ly to create visualizations from a real dataset derived from the platform, then styled them in Sketch.
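Once the key dates are curated, rwPFS reduces to simple date arithmetic. The sketch below is a minimal illustration, assuming the conventional definition (time from diagnosis to the first documented progression or death, censored at the last follow-up date when neither has occurred) and an average month length of 365.25/12 days. The function and parameter names are hypothetical and are not part of Curation Platform.

```python
from datetime import date
from typing import Optional, Tuple

# Assumption: convert days to months using the average Gregorian month length.
DAYS_PER_MONTH = 365.25 / 12


def rw_pfs_months(
    diagnosis: date,
    progression: Optional[date],
    death: Optional[date],
    last_follow_up: date,
) -> Tuple[float, bool]:
    """Return (months, event_observed) for a hypothetical curated patient record.

    The end point is the earliest of progression or death; if neither was
    recorded, the patient is censored at the last follow-up date.
    """
    events = [d for d in (progression, death) if d is not None]
    if events:
        end, event_observed = min(events), True
    else:
        end, event_observed = last_follow_up, False
    if end < diagnosis:
        raise ValueError("end date precedes diagnosis date")
    months = (end - diagnosis).days / DAYS_PER_MONTH
    return round(months, 1), event_observed


# Progression documented about nine months after diagnosis.
print(rw_pfs_months(date(2018, 1, 15), date(2018, 10, 15), None, date(2019, 6, 1)))
# → (9.0, True)

# No progression or death recorded: censored at last follow-up.
print(rw_pfs_months(date(2018, 1, 15), None, None, date(2019, 1, 15)))
# → (12.0, False)
```

The boolean flag is kept alongside the duration because a censored patient has not been observed to progress; counting them as an event would understate survival in downstream analyses such as Kaplan-Meier curves.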

Looking at this visualization, doctors can learn the story of a patient’s cancer diagnosis and treatment journey in minutes, rather than the much longer time it would take to gather this information from the patient’s EHR.

Impact

Used for curating 6 cohorts of patients for AACR Project GENIE

  • American Association for Cancer Research

  • Genomics Evidence Neoplasia Information Exchange

  • In October 2019, a group of biopharma companies (BPC) pledged $36 million in funding to GENIE to obtain curated clinical and genomic data from an estimated 50,000 de-identified patients

  • Curation Platform is used alongside REDCap to provide this curated data, with the eventual goal of replacing REDCap

  • This project began as a moonshot with a scrappy team of a handful of engineers and grew into a robust team of front-end and back-end developers as well as QA engineers.
