DFCI Cancer Curation Platform
Business Need
DFCI is unique among cancer institutes for its 50:50 split between clinical care and research. Much of the research relies on data collected through clinical care and stored in the electronic health record (EHR), but a large and crucial portion of this data cannot be analyzed.
Goal
Create a system that enables the curation of standardized, structured, and high-quality data from patient electronic health records (EHR). Provide a single institute-wide platform into which patient records can be curated.
Timeline
3+ years, 2017-2020
My Role(s)
Dana-Farber wanted to create a platform to capture all cancer-related data. I led the design of this product.
Over the years, in addition to all UX/UI work (including user research), I performed the following roles:
Product Owner:
I helped establish the agile development process and performed the PO duties before the role was filled and between transitions.
Participated in every stakeholder meeting during my tenure.
Data:
Translated data requirements into database design requirements
I was responsible for creating and maintaining the “Data Dictionary,” which defined the vocabulary of every data element captured in the platform. I learned Python to support this work.
UI Developer:
Created the Design System and implemented it using CSS, later refactored into SCSS.
Data at Dana-Farber Cancer Institute
Structured Data
Structured data, such as billing diagnosis codes, is analyzable.
Structured, poor quality
Other structured data, such as medications or treatment plans, suffer from missing values and other quality issues.
Free-text Data*
Narrative data, such as progress notes and pathology and imaging reports, are unstructured, messy, and not analyzable. Crucial outcomes data live here.
*Outcomes = How patients respond to treatment
Because outcomes are often trapped in free text that cannot be analyzed, doing research at scale requires a great deal of manual effort, despite the large amount of data Dana-Farber holds.
Primary Users - Cancer Curators
Clinical research coordinators (CRCs) who curate cancer data are most often recent graduates who intend to pursue a career in medicine or biology.
All patient data at DFCI is stored in the EPIC EHR system, which curators use to learn about each patient. EPIC’s UI is widely considered hard to navigate and confusing.
CRCs then record the data points they find in the EHR in programs such as Excel or REDCap. This practice leads to repeated, inconsistent, and tedious curation and re-curation of cancer-related data into siloed data repositories.
A streamlined interface for data curators
Curation Platform UI
A curator reads the free-text notes on the left and curates data about the patient in the form on the right.
Any existing structured data is auto-populated into the form, and the curator can “accept” the source data in each section.
Each field comes with directives to help curators understand the data element.
A QA reviewer, such as a principal investigator (PI) or a team lead, can answer questions and review the quality of the curation.
Curation Platform
Doctors need to know: for how many months did the patient “not get worse”?
This is known as “Real-World Progression-Free Survival from the Date of Diagnosis.” I created infographics showing example visualizations that can be built from data curated in the Curation Platform. I used Python and plot.ly to create the visualizations from a real dataset derived from the platform, then styled them in Sketch.
With this visualization, doctors can learn the story of a patient’s cancer diagnosis and treatment journey in minutes, rather than spending far longer gathering the same information from the patient’s EHR.
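To make the metric concrete, here is a minimal Python sketch of how real-world progression-free survival from diagnosis could be computed for one patient. It is an illustration, not the platform's actual code: the `PatientTimeline` fields and the `rw_pfs_months` helper are hypothetical, and it assumes curated dates for diagnosis, first progression, death, and last contact. It follows the common convention that the clock stops at the earlier of progression or death, and that patients with neither event are censored at last contact. The 365.25 / 12 days-per-month conversion is also an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Optional

# Assumed average month length used to convert days to months.
DAYS_PER_MONTH = 365.25 / 12


@dataclass
class PatientTimeline:
    """Hypothetical curated dates for one patient."""
    diagnosis: date
    progression: Optional[date]  # first curated progression event, if any
    death: Optional[date]        # date of death, if any
    last_contact: date           # last known follow-up, used for censoring


def rw_pfs_months(p: PatientTimeline) -> tuple[float, bool]:
    """Return (months from diagnosis, event_observed).

    The end point is the earlier of progression or death. If neither
    occurred, the patient is censored at last contact and
    event_observed is False.
    """
    events = [d for d in (p.progression, p.death) if d is not None]
    if events:
        end, observed = min(events), True
    else:
        end, observed = p.last_contact, False
    if end < p.diagnosis:
        raise ValueError("end date precedes diagnosis date")
    return (end - p.diagnosis).days / DAYS_PER_MONTH, observed
```

For example, `rw_pfs_months(PatientTimeline(date(2018, 1, 1), date(2018, 7, 2), None, date(2019, 1, 1)))` returns roughly 5.98 months with `event_observed` set to `True`. Returning the censoring flag alongside the duration keeps the output usable for survival-style plots, where censored and observed patients are drawn differently.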
Impact
Used for curating 6 cohorts of patients for AACR Project GENIE
American Association for Cancer Research
Genomics Evidence Neoplasia Information Exchange
In October 2019, a group of biopharma companies (BPC) pledged $36 million in funding to GENIE to obtain curated clinical and genomic data from an estimated 50,000 de-identified patients
The Curation Platform is used alongside REDCap to provide this curated data, with the eventual goal of replacing REDCap
This project began as a moonshot with a scrappy team of a handful of engineers and grew into a robust team of front- and back-end developers as well as QA engineers.