EHR eSource: Sword of Change?

Note:  Some of this material will be published in Applied Clinical Trials, June 2016.

We, in the biopharmaceutical clinical research world, are creatures of habit, doing our jobs in a consistent, repeatable process, usually driven by SOPs and systems.  Change comes slowly; old habits die hard.

When EDC systems came of age around the turn of the century, health records were an inaccessible mess – and healthcare clinical data was hardly of sufficient quality to support what we felt was our gold standard of randomized clinical trials.  So we continued to treat clinical research data as an entirely separate process, with data entered into CRFs manually.  Electronic health data was scarce and dirty, didn’t line up with research databases, used different terminologies, and simply was too difficult to reuse.

But, now that the era of digital health is upon us in the USA and other countries, it’s time for a fresh look at how we collect research data.  What if we could make a great leap forward by completely changing our approach through EHR eSource?  Would we dare to try?

“Source” is the initial recording of data for a clinical study.  When the original recording is on digital media rather than paper, it’s “eSource”.  Clinical trials have used eSource for years in ECG readings, lab results and other measurements.  The FDA eSource Guidance describes different ways to transmit eSource data (from direct capture, devices, transcription, EHRs or PRO instruments) to an eCRF (or EDC) system, and several approaches have been proposed for trying to feed EHR data into our existing EDC-based processes.  Last year the FDA even asked for demonstration projects to explore such approaches.

But a more recent draft Guidance on Use of EHR Data in Clinical Investigations offers another take entirely with explicit goals to “facilitate the use of EHR data in clinical investigations” and to “promote the interoperability of EHRs” with clinical research systems.  The guidance recognizes that the ONC Health IT Certification Program can indicate the readiness of EHRs to support research.  And ONC’s Advancing Care Information initiative (the successor to Meaningful Use) relies heavily (among other things) on leveraging APIs to make health data more timely and accessible to patients and caregivers.

This is where HL7’s FHIR® platform standard comes in:

  • FHIR’s Data Access Framework will provide a universal API to EHR systems that can be used to populate much of a casebook in a clinical database.
  • The SMART on FHIR specification demonstrates how patients can grant researchers access to their data through electronic informed consent, as well as contribute outcome data through smartphones and browsers – data that can be directed to an EHR or a trusted third-party cloud-based research repository simply by selecting the appropriate target FHIR server.
  • Since EHR data is eSource, FHIR can also provide authorized access to remote study monitors.
  • And since FHIR can update as well as read data, it can also support the processing of data clarification transactions, thus making it possible to synchronize EHR records with clinical databases, improving transparency and traceability for both monitors and regulatory inspectors.
  • What’s more, FHIR makes it possible for regulatory reviewers to delve into the full EHR database to explore, for example, serious adverse events, in more depth than was ever possible before.
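To make the API point concrete, here’s a minimal sketch of what a FHIR Observation query might look like. The server URL and patient identifier are hypothetical placeholders, and a real deployment would add OAuth2 authorization (as in SMART on FHIR); this only illustrates the standard search-and-Bundle pattern of the FHIR REST interface.

```python
import json
import urllib.request

FHIR_BASE = "https://ehr.example.org/fhir"  # hypothetical endpoint, not a real server

def search_url(base, patient_id, loinc_code):
    """Build a standard FHIR search URL: Observations for one patient and one LOINC code."""
    return f"{base}/Observation?patient={patient_id}&code={loinc_code}"

def observations_from_bundle(bundle):
    """Extract Observation resources from the Bundle a FHIR search returns."""
    return [e["resource"] for e in bundle.get("entry", [])
            if e.get("resource", {}).get("resourceType") == "Observation"]

def fetch_observations(base, patient_id, loinc_code):
    """GET the search URL and return the matching Observation resources."""
    req = urllib.request.Request(search_url(base, patient_id, loinc_code),
                                 headers={"Accept": "application/fhir+json"})
    with urllib.request.urlopen(req) as resp:
        return observations_from_bundle(json.load(resp))
```

The same two-step pattern – search returns a Bundle, each entry wraps one resource – applies whether the client is an eCRF system pre-populating a casebook or a monitor’s review tool.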

But is it really necessary to have an eCRF system in the middle at all?  In other words, can we use EHR data directly to feed our analysis so that all health data could potentially be reused as research data?  What could be more transparent, traceable, and efficient than going from source to analysis with as few steps and transformations as necessary?  But that’s a provocative question for another day.

 


Thoughts on Interoperability

Within the data standards community – and especially between CDISC and HL7 – the term “interoperability” is commonly espoused as a vision, mission and goal.  For CDISC, the term refers to the ability of clinical studies to reuse data that originate as eSource from electronic health record (EHR) systems by pre-populating study CRFs through its Healthcare Link initiative, most typically for sponsored clinical trials.  But the problem goes much deeper than that.

In the world of healthcare informatics, interoperability is a broader term describing the goal of rapid and easy exchange of meaningful, usable information between participants or systems for any relevant health-related purpose.  For example, when I visit a new doctor, interoperability should make it possible for both of us to easily get all my medical information from my prior doctors.  Instead, I have to complete a new medical history form and list of current medications for each physician or specialist – since there is no comprehensive repository of my medical information anywhere, except maybe in my head.  For a more specific example, as part of a recent medical procedure I had to visit five providers: my primary internist, a surgeon, a lab, and ECG and MRI facilities.  Sure, I could get a copy of my record for each visit after the fact by logging into five separate portals under separate IDs and passwords, but I couldn’t get one provider to pass anything on to the next.  In fact, in order to get my MRI from the imaging center to the surgeon, I had to pick up and deliver the DVD myself.  No, that’s not interoperability.

Of course interoperability is difficult, but not impossible – as many have stated, technology is not the problem.  Sure, we have to ensure security, privacy and proper authorization – but those challenges can be met.  As a matter of fact, I just met them myself to share data, via 23andMe, with a previously unknown distant relative who shares with me a great-great-great-grandmother from the old country.

For all of us who will know patients or be patients at one time or another, the biggest challenges to interoperability often lie between healthcare and research organizations – and even within them.  There are plenty of excuses for not sharing data (as confessed in a rather controversial recent editorial in the New England Journal of Medicine).  But whatever motives may drive or block the sharing of data in individual cases, a common goal should be to represent data in a syntax and vocabulary that at least has the potential to be consistently expressed and understood by separate authorized parties under the proper circumstances.  Indeed, it should be possible for patients – and their authorized caregivers – to get the medical information they need, when they need it, to deliver the best possible care.

So, while data standards development may seem a particularly arcane and even tedious task to the vast majority, making it possible for a physician treating a vacationing patient with a sudden illness to quickly retrieve that patient’s medical records from other providers is something we should all support – it could be, truly, a matter of life or death.

And interoperability is essential to achieve some of the most ambitious and far-reaching of health-related future visions, like the Learning Health System (LHS) and the Precision Medicine Initiative (PMI).  While the link from an LHS to clinical research has always seemed tenuous because of the long lag time between processing of pre-marketing clinical studies and delivery of approved new therapies to patient care, the PMI should clearly build on a baseline of knowledge acquired through research.  As I’ve espoused in public CDISC presentations many times over the years, a most critical future objective of research is to learn how to tap into the fundamental data flows of digital healthcare as much as possible, rather than trying to operate in a separate, often redundant parallel world.

We’re at a point in time when we have the technology and awareness to finally make real progress toward interoperability.  What we need is the courage and will to put the final pieces together and really make it happen.  A goal as important as this, which can conceivably affect everyone in the world toward the betterment of health care, is not to be taken lightly.  For my part, I’ve decided to go all in.

The Lumping and Splitting Acid Test

Call me “Lumpy.”  And I’m a lumpaholic.  I acknowledge this predilection, because I simply feel lumping makes things easier, and I’m generally more comfortable with big pictures than detailed nuances.

But I’m not a lumping zealot.  I have many close friends who are splitters, and I can respect their point of view some of the time.   It’s not unlike the dynamic between friends who are Cubs or Sox fans in Chicago, for instance – we can agree to disagree, even when they’re wrong.

The term “lumping and splitting” apparently was coined by Charles Darwin with respect to how to classify species, and is explained nicely here:

  • (For lumpers) “Two things are in the same category unless there is some convincing reason to divide them.”
  • (For splitters) “Two things are in different categories unless there is some convincing reason to unite them.”

This clearly defines the opening position of each camp, and relies on the existence of a compelling and logical reason to go the other way, which sounds reasonable enough.

Now a partisan divide of lumpers and splitters also exists in the CDISC SDS/SDTM microcosm, where the term is applied to the decision on whether to create new domains.   Lumpers believe there should be a limited number of domains, to make it easier for sponsors to know where to put data.  Splitters want to create many more domains with finer distinctions between them.  The CDISC SEND team follows a very fine-grained splitting approach.  But that does not necessarily mean human trials should follow the same pattern, since a lumping approach has also been followed with questionnaires and lab results data.  In the latter cases, the SDTMIG describes a way to split these often massive lumped domains into separate files to conform to FDA file limits, but they’re still considered the same domain.
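As a sketch of that split-for-transport convention, here’s how a lumped questionnaire domain might be divided into per-category files with pandas while remaining, logically, a single QS domain. The study data and test codes below are made up for illustration:

```python
import pandas as pd

# Hypothetical lumped QS (questionnaire) records spanning two instruments
qs = pd.DataFrame({
    "STUDYID":  ["S1"] * 4,
    "DOMAIN":   ["QS"] * 4,
    "USUBJID":  ["P1", "P1", "P2", "P2"],
    "QSCAT":    ["ADAS-COG", "MMSE", "ADAS-COG", "MMSE"],
    "QSTESTCD": ["ACITM01", "MMITM01", "ACITM01", "MMITM01"],
    "QSORRES":  ["2", "1", "3", "0"],
})

# Physically split by category (e.g., to stay under file-size limits),
# but every piece keeps DOMAIN = "QS" -- logically it's one lumped domain
pieces = {cat: grp.reset_index(drop=True) for cat, grp in qs.groupby("QSCAT")}
```

Each piece could then be written to its own transport file, yet the shared DOMAIN value keeps them one domain for review purposes.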

This difference of opinion has recently been illuminated as the team has wrestled with how to represent morphology and physiology data, a topic that’s critical to CFAST Therapeutic Area standards.  Some years ago, the SDS team decided to separate physiology data by body system, reserving some 15 separate physiology domain codes even though specific use cases hadn’t yet been defined for all of them, while lumping morphological observations together in a single domain.  This proved problematic, because it wasn’t always clear which type of test belonged where.  So the team decided (through an internal poll) to merge morphological observations into the various physiology domains.  An alternative lumping proposal – to combine all physiology data in a single domain and use a separate variable for body system (as was already the case in Events domains and PE) – did not gain sufficient traction.

Splitting may be fine as long as there’s that convincing reason and it’s clear where to split – like the “Tear Here” perforation on a package.  We can easily see that AEs differ from Vital Signs (though the latter may indicate a potential AE).  And I’m not suggesting that we lump together domains previously released (well, not all of them, anyway).  But what do you do with a complex disease such as cancer or diabetes, or a traumatic injury that affects multiple body systems?  What happens with, say, observations related to a bullet wound that penetrates skin, muscle and organs when you have to split these over multiple domains?

In such cases, wouldn’t a physician – or FDA medical reviewer – want to see all of the observations relevant to the patient’s disease state together, rather than jumping around from file to file and trying to link them up?   And how are patient profile visualizations affected when multiple, possibly related morphological and physiological observations are split across many separate files?  Patient profiles tend to look in a finite number of domains for core data (typically DM, EX, LB, VS, AE and CM) – adding in dozens of others is likely to be challenging.  And maybe it would reduce the constant stress of accommodating new domains with each new version if SDS didn’t feel the need to keep adding more of them each time.

This brings me back to a “first principle” – making it easy on the reviewer.  If you know exactly what you’re looking for, and you’re confident everyone else knows to put the same type of data in the same place, then maybe it’s easy to point to a separate fine-grained domain.  But what if you’re looking for possible associations and relationships that may not be explicit (something FDA reviewers have lamented previously)?  I think for most of us who are familiar with spreadsheets and other query and graphics tools, it’s far easier to have all the data you want in one place and rely on filter and search tools within a structured dataset than to search and query across a directory full of files.  And that’s one reason why FDA has been working so long on moving from file-based data review to their Janus Clinical Data Repository (CDR) – so reviewers can find the data they want through a single interface.  In my experience, CDRs (including Janus) do not usually split most findings-type data by domain.
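The filter-in-one-place idea can be shown with a toy example: one lumped findings dataset with a body-system variable, where a single filter stands in for opening many per-domain files. The column names and test codes here are illustrative inventions, not official SDTM variables or terminology:

```python
import pandas as pd

# One hypothetical "lumped" findings dataset with a body-system qualifier
obs = pd.DataFrame({
    "USUBJID": ["P1", "P1", "P1", "P2"],
    "BODSYS":  ["CARDIOVASCULAR", "RESPIRATORY", "CARDIOVASCULAR", "RESPIRATORY"],
    "TESTCD":  ["EJECFR", "FEV1", "HRATE", "FEV1"],  # made-up test codes
    "ORRES":   ["60", "3.1", "72", "2.8"],
})

# A single filter replaces hunting across many per-body-system files
cardio = obs[obs["BODSYS"] == "CARDIOVASCULAR"]
```

The same one-line filter works in any spreadsheet, query or review tool – which is exactly the reviewer convenience the lumping argument rests on.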

Lumping solutions should be more consistent between sponsors and studies, since there’s a simpler decision on where to put data.  Just like when shopping at Costco, it’s easier to make a decision when there are fewer choices involved. And just as it’s quicker and easier to pick out some bathroom reading from a single shelf rather than have to search through an entire library, lumping should make it quicker and easier to access the data you want.

So, what exactly is the acid test?   Whichever approach is chosen for the SDTMIG in the future (and it should be up to the community to weigh in when the next comment period begins), one overriding principle must be in place:  a new domain should never be created unless there’s a clear rationale for splitting (and in the case of physiology, I don’t really see one myself), and it’s absolutely, unambiguously clear which kind of data the new domain is to be used for.  If there’s any ambiguity about whether a certain type of data could go here or there, then we should opt for a lumping solution instead, and allow users to rely on query and filter display tools to pick out what they want.  Meanwhile, we can rely on our statistical programmers to put data logically together in analysis files, just as we always have.

So, lumpers of the world, unite!  There are simply too many other problems to tackle to keep playing the “What domain do I use this time?” game – like defining rich metadata models for sets of variables with valid terminology (i.e., concepts), or defining a next-generation SDTM that isn’t based on SAS XPT.

In the meantime, another scoop of mashed potatoes for me, please – lumps and all.

R.I.P. Time for Supplemental Qualifiers

Warning:  this one’s primarily for SDTM geeks.

Back when the SDTM and SDTMIG v3.1 were being created circa 2003, there was never a delusion that the SDS team had thought of everything.  The SDTMIG domains were created by taking the least common denominator among CRFs from several major pharma companies.  It was always understood that we could only standardize on a core set of variables – that individual studies would almost always find cases when they’d need to add additional variables for some specific purpose.

The chosen solution for handling these extra variables was (shudder) supplemental qualifiers (SQs).  The original use case for SQs was to provide a way to represent flag variables like “clinically significant” that might be attributed to different people – an independent reviewer or DSMB, for instance.  But this was expanded to handle other variables that didn’t fit within any variable defined in the general classes.  A name/value pair structure with a different row for each variable and its value was adopted – quite flexible, but not very user friendly.  This was not viewed as a problem by all – there was a perception (held by one of our FDA observers, among others) that by making it difficult to represent SQs, sponsors would be disinclined to use them, and thus the standard would stay leaner and more consistent and not get cluttered with messy data.

But that assumption was wrong.  It turned out that many standard domains often need additional variables to fully describe a record – often essential information related to a specific therapeutic area.  So the SQ files kept getting bigger and bigger.

And SQs were clunky in many ways.  It was necessary to use value-level metadata to describe each variable in define.xml files.  Some tools and users had difficulty merging them back into parent domains.  And because they were so unwieldy, voluminous and hard to read, some reviewers simply gave up looking at them at all, raising the risk that critical information might be missed during a review.
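The merge headache is easy to see in miniature. Assuming a toy AE domain and its tall SUPPAE name/value file, reuniting them takes a pivot plus a key merge (the miniature data and the AETRTEM supplemental qualifier here are hypothetical):

```python
import pandas as pd

# Toy parent AE domain and its tall SUPPAE name/value file
ae = pd.DataFrame({
    "STUDYID": ["S1", "S1"], "USUBJID": ["P1", "P2"],
    "AESEQ": [1, 1], "AETERM": ["HEADACHE", "NAUSEA"],
})
suppae = pd.DataFrame({
    "STUDYID": ["S1", "S1"], "USUBJID": ["P1", "P2"],
    "IDVAR": ["AESEQ", "AESEQ"], "IDVARVAL": ["1", "1"],
    "QNAM": ["AETRTEM", "AETRTEM"], "QVAL": ["Y", "N"],
})

# Pivot the name/value pairs into one column per QNAM, then merge back
wide = (suppae.assign(AESEQ=suppae["IDVARVAL"].astype(int))
        .pivot_table(index=["STUDYID", "USUBJID", "AESEQ"],
                     columns="QNAM", values="QVAL", aggfunc="first")
        .reset_index())
merged = ae.merge(wide, on=["STUDYID", "USUBJID", "AESEQ"], how="left")
```

Even in this trivial case the round trip needs a type conversion, a pivot and a merge – and real SUPP-- files mix several IDVARs and dozens of QNAMs, which is where tools start to struggle.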

So some SDS team members wisely proposed an alternative: place these SQs (renamed “Non-Standard Variables,” or NSVs) in the parent domain.  Instead of physically separating them into another, differently structured file, the proposal appended them to the end of the dataset record and relied on Define metadata to tag them as non-standard.  The metadata tag would make it straightforward to strip them out into a separate SuppQual structure if that was still needed for some reason (such as conforming to a database load program expecting such a file), but the dataset would already include these variables where they belong, so they’d be less likely to be missed.

But this reasonable proposal wasn’t viewed as a panacea by everyone.  FDA was still concerned that it would encourage sponsors to add more and more unnecessary variables – which might just be noise to a reviewer.  And they worried about increasing file sizes beyond their acceptable limits.  (But at least they didn’t disagree that SQs in their present form had proven a whole lot more trouble than anyone anticipated.)

Meanwhile, other members of the SDS team objected to the proposal as an unnecessary change – since most companies had already invested in ways to create these and didn’t want to have to change again (even if the datasets would be more useful and their processes simpler if they did).  This, of course, is the notoriously stubborn “sunk cost” fallacy.

But let’s pause now for a moment.  We know that the current SuppQual method is a clunky solution, which was already revised once (in SDTMIG v3.1.1, when a single file proved unmanageable and too big to submit), and that we still hear it can cause review problems for many and is seen as an unnecessary extra non-value added step by many more.  But we don’t want to offer a simpler and more efficient solution instead because we’ve already invested in the clunky solution?  Hello?

So, here’s another suggestion.  Let’s create a separate file with the exact same structure as the parent domain – namely, use the SDTM unique keys (STUDYID-DOMAIN-USUBJID-XXSEQ) and add all the NSVs as additional columns.  Such a structure would allow full metadata representation in Define-XML – just like the other variables – and makes for a much simpler merge (and, for sponsors, also a simple split to take them out).  To allow applications to recognize that this is a different merge from the normal SUPP-- format, perhaps a new file name prefix could be used, such as SUPW (for “wide,” or some other name, whatever).
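Under the suggested wide structure, the round trip collapses to a single key merge. A sketch with hypothetical data (the SUPWVS name follows the suggestion above, and VSPOS2 stands in for some non-standard variable):

```python
import pandas as pd

# Parent VS domain and a hypothetical "wide" supplemental file (SUPWVS)
# sharing the parent's keys, with one column per non-standard variable
keys = ["STUDYID", "DOMAIN", "USUBJID", "VSSEQ"]
vs = pd.DataFrame({
    "STUDYID": ["S1", "S1"], "DOMAIN": ["VS", "VS"], "USUBJID": ["P1", "P1"],
    "VSSEQ": [1, 2], "VSTESTCD": ["SYSBP", "DIABP"], "VSORRES": ["120", "80"],
})
supwvs = pd.DataFrame({
    "STUDYID": ["S1", "S1"], "DOMAIN": ["VS", "VS"], "USUBJID": ["P1", "P1"],
    "VSSEQ": [1, 2], "VSPOS2": ["SITTING", "SITTING"],  # made-up NSV
})

# Because both files share the same structure, reuniting them is one merge
full = vs.merge(supwvs, on=keys, how="left")
```

No pivoting, no value-level metadata gymnastics – and splitting the NSVs back out is just a column selection on the same keys.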

Under such a scenario, FDA should be happy that file sizes are smaller (and smaller than the current tall and skinny SuppXX files, since those expand to reserve as much space for every value of a variable as the longest one requires), and the variables can be easily viewed in the dataset whether they’re merged or not – making it possible to merge in only the ones of interest if the possibility of noise is still a concern.

Not quite as elegant as a single-file solution, but it certainly seems to me better than the status quo.  And those SDTM old-timers who still want to do it the old way can probably adapt the code they’ve already written to strip out the NSVs when they create SDTM (and put them back for their statistical programmers and internal reviewers) – and keep wasting time doing it the old way, if that’s what really makes them happy.

Seriously, can’t we bury these SUPPxx files once and for all and try to agree to make SDTM datasets just a little more useful?  What’s the controversy with that?

SDTM as a Cookbook

The original CDISC Study Data Tabulation Model Implementation Guide (SDTMIG) was built around a core set of domains describing data commonly used in most clinical trials — much of which were necessary to understand basic product safety.  Over time, the SDTMIG has been expanded with new versions that incorporated more and more domain models.  SDTMIG v3.2 has nearly 50 domains, with several more to come with the next version.

This domain-based approach, while very familiar to those involved in study data management and analysis, has been a mixed blessing in many ways:

  • FDA acceptance testing of new SDTMIG versions has been taking a year or more post publication, and sometimes this belated support for new versions comes with exceptions — a serious disincentive to early adoption by sponsors.
  • While continually adding new domain models may be helpful to some, others may find it threatening – partly because of the excessive effort of change management in a regulated systems environment in general, and especially if they’ve already developed alternative approaches to modeling similar data in custom domains, which might require last-minute database changes to support new SDTMIG versions for ongoing drug development programs.
  • It takes the SDS team at least two years to release a new SDTMIG version, and its already massive, steadily increasing size and complexity make updating more difficult and adoption and understanding by end users burdensome.  And publicly available validation tools still haven’t been able to keep up.
  • Meanwhile, the long timeline has made it necessary for the timeboxed CFAST Therapeutic Area (TA) User Guides to issue Draft or Provisional domain models to capture TA-specific requirements, which FDA seems reluctant to accept, again discouraging early adoption and injecting more fear, uncertainty and doubt since such domains may still undergo change before being added to a new SDTMIG version.

So this spiral of steadily increasing complexity and version management seems to be a cause of widespread consternation.

Cooking Up an Alternative Solution

What if we considered the SDTMIG as something more like a cookbook of recipes rather than a blueprint?  A recipe describes the general ingredients, but the cook has discretion to substitute or add ingredients and season to taste.   What if the SDS and CFAST teams – rather than continuously creating more and more new domain models (sometimes with seemingly overlapping content) and waiting to embed these within a new version of the normative SDTMIG every few years – took a more flexible, forgiving approach?  A new Therapeutic Area User Guide would continue to describe which existing final SDTMIG domains are most appropriate for representing TA data such as relevant disease history and baseline conditions, and, when necessary, include directions for cooking up a new custom domain for, say, key efficacy data.  The recipe would state which SDTM general class to use, which variables from that class apply, what current or new controlled terminologies should be used, and other implementation details, such as any incremental business rules that might be used for validation checks.  In lieu of a traditional domain model, the recipe could include an example of the Define-XML metadata for the domain.  But it’s up to the cook to adjust for the number of guests at the table, prepare, add garnishes and serve.  Such a recipe could be referenced (just like an SDTMIG version and a domain version) in a Define-XML file (a proposed new v2.1 of Define-XML has features to support this) as if it were a standard, but, in reality, it would be a best practice guideline.

Representing domain models as recipes (or guidelines, if you prefer) would have the advantage of producing domains that would already be acceptable for FDA submission (since custom domains are being accepted by FDA), while still being checked for consistency with controlled terminology and validation rules.  And adopters of CFAST TA standards might start using these more quickly, allowing them to be field-tested so they can evolve more rapidly without waiting years for a new SDTMIG version to bless them.  As people learn more and gain confidence, they can post additional examples and comments to a wiki that can inform future users, so everyone builds on everyone else’s experience over time.

Under this approach, the SDTM becomes the real model and basis for the standard, and the domain models would be treated as representative use cases.  The recipe can also include incremental validation rules, but must remain faithful to any SDTM model-level rules (which presumably would have to be promoted upward from the IG).  New models may need new variables in many cases, and these variables should be added to newer versions of the SDTM.  But assuming the SDS team finally adopts a more efficient manner of representing Non-Standard Variables (as it has already proposed in two drafts for comment), it would be easy enough for custom, recipe-driven domains conforming to a current version of the SDTM to add each necessary new variable as an NSV at first.  These NSVs could then be uplifted to standard variables later in a new SDTM version, which would only require a simple change to an attribute in the define.xml once that new version becomes available.  Either way, the new domain model can be used immediately by applying currently available structures in the SDTM with some incremental new terminology (such as a new domain code describing the contents and a new controlled term for a variable name).

This does not mean the SDTMIG goes away – it’s still needed to describe the context, general assumptions and rules, and show how to implement key SDTM concepts like the 3 general classes, trial design model, special purpose and relationship classes.  But the IG wouldn’t need to cover each new variable and domain anymore and could focus on explaining core SDTM concepts with assumptions and rules instead.  It could make for a leaner SDTMIG, which doesn’t have to change so often, and impart more flexibility and agility in the use of domains.

Such an approach could also be a good first step toward decoupling the SDTMIG from the highly cumbersome, restrictive use of SAS v5 Transport files, and make the model more adaptable for implementation in new technological frameworks, such as HL7 FHIR and the semantic web.

Of course this still leaves a number of challenges in terms of terminology management, but that’s a topic for another day.

In the meantime, what do you think about a simpler SDTMIG – one that emphasizes applying the general model with terminology to meet the needs of new CFAST domains?

A Modest Proposal Part II – What Meds Are You Taking?

In Part I of this blog I proposed an activity to define a single content standard for representing general patient history – something that would make things easier both for patients visiting different healthcare providers and for researchers seeking to better manage and utilize the information value of medical history when recruiting or enrolling clinical trial subjects. But this is only one of the ways our healthcare system annoys patients. Another is the single most-asked question at every visit to any provider – “What meds are you taking?”

Now, I’m not disputing the need to ask. We know that many serious drug-related adverse events may stem from drug interactions, and that many patient conditions may just as well be caused as cured by certain drugs. The problem is that we handle this data so poorly: instead of following the mantra of “collect once, use many times,” providers and researchers just keep collecting it over and over again – and sloppily at that.

In my case, I always keep a simple document listing all the meds I’m on, plus any nutritional supplements I take, with dosages and frequencies included. I print this out and bring along a USB drive with the same material. Even so, I’m almost always asked to write it down on one of those paper forms – just like that annoying medical history form. Then someone types it into a system. Just like a game of telephone, the data may get slightly skewed with each new transcription – which is yet another reason why the question must be asked each time. At one provider (when I needed an ECG), I had to repeat it three times – to a nurse, a doctor and an intern – despite handing in the printout and copying it onto their form.

When I type “medication tracker” in the Apple App Store, I get 148 hits for apps that can do this for me, and I know of websites that offer to do the same. What I can’t seem to find is a simple, standard way of recording my medication use data once, so that all of my providers (or any research investigators) could access a single trusted source of information for Patient Wayne. My ideal meds app would allow me to pick each med I’m taking from a list of available medications coded according to NDC, RxNorm and WHODrug, perhaps import a record of a new prescription from my docs automatically, and allow me to edit my actual usage data before my next doctor visit if I changed my habits. I’d still expect the doc to ask to confirm, but only after looking at my current official list.
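For illustration, such a portable record could be as simple as a small coded document. Everything here – the field names, the placeholder code values – is a hypothetical sketch, not any existing standard or real NDC/RxNorm identifier:

```python
import json

# A minimal coded medication-use record (all fields and codes are placeholders)
med_list = {
    "patient": "Patient Wayne",
    "updated": "2016-05-01",
    "medications": [
        {"name": "Exampleprin 81 mg tablet",              # hypothetical product
         "codes": {"NDC": "00000-0000-00",                # placeholder codes
                   "RxNorm": "000000"},
         "dose": "81 mg",
         "frequency": "once daily",
         "actual_use": "as prescribed"},
    ],
}

# Serialized once, the same record can be handed to every provider's system
payload = json.dumps(med_list, indent=2)
restored = json.loads(payload)
```

The point isn’t the format – XML, JSON or a FHIR MedicationStatement would all do – it’s that one coded record, maintained by the patient, replaces the paper form at every desk.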

There are multiple ways to make such data accessible – perhaps I might have a Personal Health Record (such as the ones Microsoft and Google tried to power through HealthVault and Google Health). Or maybe I could just use the portal of my primary care provider and trust them to make it accessible to all the other specialists I must occasionally visit (this presumes, of course, that interoperability between providers actually functions, which hasn’t been my experience to date). Or maybe it’s just an XML file on my phone that I can send on demand.

Wouldn’t this be better than having me scribble this on a form over and over, or a physician having to guess if I happened to be pulled unconscious into an ER one of these days, or with data managers having to deal with sloppy and unreliable data on a ConMeds CRF? Having ready access to reliable medication use data for patients could be a tremendous boost to pharmacovigilance and epidemiological research as well.

Yes, interoperability should be great once it finally arrives. In the meantime, can’t we just find a simpler, standardized way for my doctors to know my meds?