CR 3.0 – A Manifesto for The Next Generation of Clinical Research Data Standards

During these last lingering days of summer before Labor Day, 12 months since retiring from CDISC and now 6 months with HL7, I’ve been contemplating a new vision for clinical research data. My first exposure to this industry was during the teeming chaos of paper CRFs and double data entry – call that CR 1.0. CR 2.0 involved EDC and the internet and the first generation of data standards, courtesy of CDISC. In formulating my personal vision for CR 3.0, seen through the lens of my past and most recent experience, I’ve decided to describe my current thinking as a set of core principles – call it a manifesto for the next generation:

  1. Whenever data can be captured directly at the source, it must be. EHR data is source data.  If the source is wrong, we need to fix the source, not just correct it downstream in a separate conflicting copy.  Traceability problems vanish when the data captured are the data reviewed.
  2. We must avoid data transformations wherever possible. Each transformation can introduce error and reduce data fidelity. Instead of twisting data to fit into different specific formats, we must learn to fit analytics directly to the data as captured in its native form – using standards that exist at the source.  This is how analytics are applied throughout the modern world of technology – why not in research too?  Why not research on internet time?
  3. Pharma should take advantage of the movement to structured data catalyzed by Meaningful Use and new value and outcomes-based reimbursement models in the USA. This means pharma adopting prevalent healthcare standards like UCUM and LOINC, which are used extensively in healthcare records, without requiring transformations to other coding systems used only in pharma research. Pharma should also consider including SNOMED codes applied at the point of capture in addition to MedDRA, because they can provide additional contextual information that may be valuable to reviewers or researchers.
  4. The HL7® FHIR® standard, which is already being widely adopted throughout the world of healthcare, offers the best opportunity to date for research and other pharmaceutical processes to capitalize on the availability of rich EHR data – and can eliminate many of the inconsistencies and variations seen historically with secondary use of EHR data. FHIR can make it possible to reach inside of EHRs not just to capture data, but also to monitor protocol progress, provide safety alerts, and allow much greater visibility into trial conduct. That can lead to dramatic improvements in study efficiency and drug safety. We need FHIR for better research.
  5. The current HL7 C-CDA standard provides a useful, persistent archival format for source data from EHRs, despite certain inconsistencies among different implementers. However, the next generation C-CDA on FHIR initiative should resolve many of these current limitations along a smooth migration path from the current C-CDA.
  6. While CDISC standards are currently the language of regulatory submissions – and should continue to be so for many years to come given the lag time between study and submission – it’s critical for research to also begin adapting to new ways to power research, fueled by EHR data, based on HL7 FHIR. Now is the time to begin work on the standards for tomorrow. In the meantime, the CDISC SDTM should prioritize stability over constant change.
  7. With the widespread adoption of cloud technologies, and the ability of FHIR to access distributed data on demand wherever it resides, we are nearing the time when it will no longer be necessary to submit static copies of data from point to point. Instead, we should be planning to use FHIR to access and coalesce data in near-real time from the source, with full provenance and rich metadata, as a definitive single source of truth.  We must eliminate unnecessary redundancy, and use the full capabilities of modern technologies to move forward to the next generation of clinical research.
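
To make principles 2 through 4 a little more concrete, here is a minimal sketch of what "fitting analytics to the data as captured" could look like. It assumes a lab result arrives as a FHIR Observation already coded with LOINC and UCUM at the source. The sample resource, its values, and the `summarize_observation` helper are hypothetical illustrations of my own, not part of any standard or product; only the two code-system URIs are the ones FHIR uses for LOINC and UCUM.

```python
import json

# Hypothetical FHIR Observation as an EHR might return it.
# All values are illustrative only, not real patient data.
OBSERVATION_JSON = """
{
  "resourceType": "Observation",
  "status": "final",
  "code": {
    "coding": [
      {
        "system": "http://loinc.org",
        "code": "2345-7",
        "display": "Glucose [Mass/volume] in Serum or Plasma"
      }
    ]
  },
  "subject": {"reference": "Patient/example"},
  "effectiveDateTime": "2016-08-30T08:15:00Z",
  "valueQuantity": {
    "value": 95,
    "unit": "mg/dL",
    "system": "http://unitsofmeasure.org",
    "code": "mg/dL"
  }
}
"""

LOINC_SYSTEM = "http://loinc.org"
UCUM_SYSTEM = "http://unitsofmeasure.org"


def summarize_observation(resource: dict) -> dict:
    """Read the test code, value, and unit as captured, with no recoding step."""
    if resource.get("resourceType") != "Observation":
        raise ValueError("expected a FHIR Observation resource")

    # Pick the LOINC coding if the source supplied one.
    codings = resource.get("code", {}).get("coding", [])
    loinc_code = next(
        (c.get("code") for c in codings if c.get("system") == LOINC_SYSTEM),
        None,
    )

    # Accept the unit only when it is declared as UCUM.
    quantity = resource.get("valueQuantity", {})
    ucum_unit = quantity.get("code") if quantity.get("system") == UCUM_SYSTEM else None

    return {
        "loinc_code": loinc_code,
        "value": quantity.get("value"),
        "ucum_unit": ucum_unit,
        "effective": resource.get("effectiveDateTime"),
    }


summary = summarize_observation(json.loads(OBSERVATION_JSON))
print(summary)
```

The point of the sketch is simply that the test code and unit are read exactly as the source recorded them, so there is no mapping table between capture and analysis that could introduce error.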

I recognize these may be too radical for some, and I’m sure there are many different ideas of what CR 3.0 may look like. So I’m interested in starting a dialogue. I’m also working on some sketches to help illustrate my manifesto, which I may share eventually. Looking forward to hearing what others may think until the next time I find a quiet summer afternoon to stare out my window at a luscious green garden and think other idle thoughts of where the future may take us. Happy last days of summer!

FHIR® is the registered trademark of HL7 and is used with the permission of HL7.


5 thoughts on “CR 3.0 – A Manifesto for The Next Generation of Clinical Research Data Standards”

  1. Dear Wayne,
    I am thinking along the same lines…
    In the “old” way of collecting data, mostly using CRFs, I have always been of the opinion that we should submit the collected data to the FDA as CDISC ODM (i.e. the source data) instead of transformed data. If the FDA insists, we could still annotate the collected data with information about where the data fits into another model (like SDTM). The FDA could then still do the transformation themselves if that makes it easier for them to review the data. Any transformation by the sponsor or a service provider IS an interpretation/categorization of the data, and is thus subjective and hinders traceability.
    In the “new” era, the source will more and more be EHR, whether CDA or FHIR or OpenEHR. We can now already incorporate EHR data into ODM, and in the next generation of ODM, the difference will mostly vanish. However, due to the transformation requirement that we currently have, most of it disappears in the next step.
    Of course this requires the use of semantic standards as used in healthcare. CDISC still refuses to use LOINC and UCUM and has invented its own “lists”, requiring another transformation step when starting from EHRs and making reviewers’ lives more difficult (yet-another-coding-system-to-learn).
    For submissions, we must start thinking beyond the concept of “files”. Is it really necessary that at the end of our study, we (must) send all (transformed) data as a single bunch of files? In the era of cloud and web services, is this still necessary? Can’t we think of a submission as an intelligently organized set of links to source data that resides on EHR or EDC systems?


  2. If we’re serious about reusing healthcare data, we should begin to adopt common health terminologies, starting with LOINC and UCUM as you say, Jozef. Time to begin convergence toward interoperability.


    1. This week is the CDISC US Interchange in Bethesda. I hope the new FDA commissioner Dr. Robert Califf, a medical doctor and professor of medicine himself, will say a very loud word about the use of LOINC and UCUM during his keynote “The Future of Medical Research and the Role of Standards: Forming Connections Towards Complementary Systems”. I see this as the only way to convince CDISC and the responsible teams that have blocked the use of these healthcare semantic standards so far.

