Life after the v3.x SDTMIGs

Hard to believe that it’s been 11 years since the release of v3.1 of the SDTMIG.  Since then there have been 4 additional versioned releases, all based on the SDTM general class model, intended for representation as SAS v5 XPORT files.  SDTMIG still has plenty of life to it – in fact, one might argue that it’s just beginning to hit its stride now that use of CDISC standards will be mandatory in the US and Japan late in 2016.   But, let’s face it, as a standard that predates Facebook, YouTube, smartphones and reality TV, it’s also getting long in the tooth, and, indeed, may already be something of a legacy standard.

Perhaps the biggest limitation of the current SDTMIG is its restriction to SAS v5 XPORT, a format more than 30 years old, devised in the days of MS-DOS and floppy disks, that is still the only data exchange format the FDA and PMDA will currently accept for study data in submissions. While alternative formats have been proposed – the HL7 v3 Subject Data format in 2008, RDF in an FDA public meeting in 2012, the CDISC Dataset-XML standard in 2013 – the FDA is still stuck on XPORT.  Recently they’ve asked the PhUSE CSS Community to help evaluate alternatives, which suggests that a decision is not much closer.

The ripple effects of XPORT have severely limited the usefulness and acceptance of the SDTM beyond regulatory submissions – especially to those who haven’t grown up as SAS programmers working with domain and analysis datasets.  So any major new revision of the SDTMIG needs to start there, to split out all the XPORT-specific stuff.  This involves using longer field names, richer metadata, more advanced data types and eliminating field length restrictions.  That’s the easy part, but that’s not enough.  If we’re going to reconsider the SDTMIG, then we should use the opportunity to think broadly and address other needs as well.

We need a longer-term replacement, but we also need to keep the current trains running on time now.  Now that people are just getting used to the idea of a regulatory mandate to use SDTM and SEND, we certainly don’t want to change too much just yet.  We need to keep it stable enough so new adopters can get used to it – rapidly changing terminology gives them enough of a challenge to deal with without the pressure of adopting new IG versions.  I recently described one way to help minimize the number of necessary future versions of the existing XPORT-bound IG as a recipe.   We could do this now with the current version 3.2 and address many new needs.

On the other hand, we should be working on the next generation while we keep that venerable current one going.  In Chicago, the White Sox didn’t tear down the old Comiskey Park until the new U.S. Cellular Field was finished – they built the new while using the old.  And they kept repairs to the old to a minimum once they started working on the new.   So while we can assume we’ll need XPORT for some time even if a replacement exchange format is finally chosen, that shouldn’t stop us from rethinking the SDTMIG to better meet future needs now.  It’s time to think ahead.

What might a next generation SDTM look like?  A new SDTM for the future might have some of the following characteristics:

  1. As implied above, it should support standard content that’s independent of the exchange format. The standard should be easily representable in RDF, JSON (with HL7 FHIR resources and profiles), XML (and, yes, even XPORT for legacy purposes – at least for some years).
  2. A general class structure as used in the current model must remain at the heart of SDTM, though likely with some variations. We’ll want to retain the 3 general classes and most, but maybe not all, variables (though such variables need precise definitions and more robust datatypes).  The core variables are essential, but perhaps some variables that are unique to a specific use case (such as those being introduced with new TAs or for SEND) can be packaged as supplements to augment the core under certain conditions.  What if there were a way to add new variables to general classes, timing and identifiers without necessarily creating a new IG version?  Rather than having to keep issuing new versions each time we want more variables, can’t a curated dictionary of non-standard variables – all defined with full metadata and applicable value sets – be used and managed separately in a manner similar to coding dictionaries?
  3. We may need some new general classes as well, such as the long-recognized need for a general class to represent activities such as procedures.
  4. We should reassess, with the benefit of hindsight, what data really belongs in which class. For example, perhaps substance use data (smoking, recreational drugs, alcohol) might be better represented as findings along with other lifestyle characteristics, which would better align with how such data is represented in healthcare systems.  Disposition data might fit better as an activity rather than event.
  5. Thorough definitions for each variable (a task already in progress), and variable names that are more intelligible – without being limited to 8 characters with a domain prefix – are mandatory.
  6. We should remove redundant information that can easily be looked up (as Jozef Aerts has long proposed). Lookups can be made via define-xml codelists or web services.
  7. Other non-backwards-compatible corrections to known issues, deep in the weeds, should also be addressed – such as distinguishing timings associated with specimen collection from point-in-time result findings, and resolving that strange confusion between collection date and start date in the Findings class.
  8. Perhaps a reconsideration and simplification of the key structure is in order, replacing the Sequence variable with a unique observation identifier/Uniform Resource Identifier (URI) that can be referenced for linked data purposes and make it easier to represent more complex associations and relationships (including the ability to be extended dimensionally with meta observations such as attributions and interpretations). This would be part of a richer metadata structure that should also support the representation of concepts.
  9. A more advanced extension mechanism that replaces the cumbersome supplemental qualifier approach is critical (such as the one already proposed by SDS) so users can easily incorporate those special use case variables mentioned in item 2 above.
  10. And we need the ability to align better with other healthcare-related information, to make it possible to use clinical study data with other real world data sources, and the courage to modify the SDTM to facilitate such alignment where appropriate.
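
To make a few of these items concrete, here is a minimal sketch in Python of what a format-independent observation might look like. It is only an illustration under stated assumptions, not a proposed standard: the class name `Observation`, its long field names, the `EXTENSION_DICTIONARY` and `TEST_CODELIST` contents, and the use of a `urn:uuid:` identifier are all invented for this example. It shows intelligible variable names (item 5), a URI in place of a sequence number (item 8), extension variables checked against a separately curated dictionary (items 2 and 9), a label that is looked up rather than stored (item 6), and the same content rendered either as JSON or as a legacy XPORT-style row (item 1).

```python
import json
import uuid
from dataclasses import dataclass, field, asdict

# Hypothetical curated dictionary of non-standard variables (items 2 and 9).
# The entry below is invented purely for illustration.
EXTENSION_DICTIONARY = {
    "fastingStatus": {
        "datatype": "boolean",
        "definition": "Whether the subject was fasting at specimen collection",
    },
}

# Illustrative codelist for lookups (item 6): the record stores only the
# code, and the human-readable test name is looked up when needed.
TEST_CODELIST = {"GLUC": "Glucose", "HGB": "Hemoglobin"}


@dataclass
class Observation:
    """One finding, described independently of any exchange format."""
    study_id: str
    subject_id: str
    test_code: str
    result_value: float
    result_unit: str
    collection_datetime: str  # ISO 8601 text
    extensions: dict = field(default_factory=dict)
    # Assumption: a generated URN stands in for the sequence variable (item 8).
    observation_uri: str = field(
        default_factory=lambda: f"urn:uuid:{uuid.uuid4()}"
    )

    def __post_init__(self):
        # Only variables defined in the curated dictionary may be attached.
        unknown = set(self.extensions) - set(EXTENSION_DICTIONARY)
        if unknown:
            raise ValueError(f"Extensions not in dictionary: {sorted(unknown)}")


def to_json(obs: Observation) -> str:
    """Serialize the observation with its full-length names."""
    return json.dumps(asdict(obs), indent=2)


def to_legacy_row(obs: Observation, sequence: int) -> dict:
    """Flatten to short LB-domain variable names for an XPORT-style row."""
    return {
        "STUDYID": obs.study_id,
        "USUBJID": obs.subject_id,
        "LBSEQ": sequence,
        "LBTESTCD": obs.test_code,
        "LBTEST": TEST_CODELIST[obs.test_code],  # looked up, not stored
        "LBORRES": str(obs.result_value),
        "LBORRESU": obs.result_unit,
        "LBDTC": obs.collection_datetime,
    }
```

The point of the sketch is that the legacy row becomes just one view of the content: the richer record keeps its URI and extensions, while the flattening function decides what survives the trip into an eight-character world.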

Now, some might argue that this is still limiting ourselves to 2-dimensional representations here – which is indeed a valid criticism.  But maybe the longer term solution involves more than one representation of the data.  Perhaps we have a broad patient file with both structured and unstructured source information as a sort of case history, and representations/views in tabular structures that are derived from it – an old idea which might be getting closer to prime time.  Thinking beyond the table/dataset way of thinking should certainly be part of the exercise.

I know many are already impatient for change (at least as far as XPORT is concerned), and others feel we should just throw it all away and adopt more radical solutions.   But my personal feeling is that we need to keep what we have, which has already taken us much farther than we could have imagined 15 years ago, and build from that.  The approach echoes that of a 2009  New Yorker article by the great Atul Gawande about the upcoming healthcare reform, where he advocated building up from our history of employer-provided insurance rather than jumping to something radically different, like single-payer.  “Each country has built on its own history, however imperfect, unusual, and untidy… we have to start with what we have.”

So whatever we do, we should start with SDTM as a governing model that really drives implementation, with more extensive metadata, clear definitions, complex datatypes, and a simpler extension mechanism.  An improved SDTM can drive implementation and result in a more streamlined implementation guide that also shows how to apply research/biomedical concepts, controlled terminologies and computer-executable rules (e.g. for verifying conformance, derivations, relationships, etc.) and where to find use cases and examples. Such use cases and examples (as for Therapeutic Areas) could be maintained separately in a knowledge repository, and the SHARE metadata repository would provide all the pieces and help put them together.  We start with the SDTM and metadata and build out from there.  But we need to build in a way that converges with the opportunities provided by what’s going on in the world of healthcare, technology and science.  Like the Eastbound and Westbound project teams of the transcontinental railroad 150 years ago, we should endeavor to meet in the middle.

A Modest Proposal Part II – What Meds Are You Taking?

In part I of this blog I proposed an activity to define a single content standard for representing general patient history – something that would make life easier both for patients visiting different healthcare providers and for researchers trying to manage and use the information value of medical history when recruiting or enrolling clinical trial subjects. But this is only one of the ways our healthcare system annoys patients. Another is the single most-asked question when visiting any provider each and every time – “What meds are you taking?”

Now, I’m not disputing the need to ask. We know that many serious drug-related adverse events may stem from drug interactions, and that many patient conditions may just as well be caused as cured by certain drugs. The problem is that we seem to handle this data so poorly, and instead of following the mantra of “Collect once, use many times” providers and researchers just keep collecting over and over again – and sloppily at that.

In my case, I always keep a simple document of all the meds I’m on plus any nutritional supplements I may take, with dosages and frequencies included. I print this out and bring along a USB drive with the same material. Even so, I’m almost always asked to write it down on one of those paper forms – just like that annoying medical history form. Then someone types it into a system. Just like a game of telephone, the data may get slightly skewed with each new transcription – which is yet another reason why the question must be asked each time. At one provider (when I needed an ECG) I had to repeat it 3 times, to a nurse, a doctor and an intern – despite handing in the printout and copying it to their form.

When I type “medication tracker” in the Apple App Store, I get 148 hits of apps that can do this for me, and I know of websites that offer to do the same. What I can’t seem to find is a simple standard way of recording my medication use data once so that all of my providers (or any research investigators) could access a single trusted source of information for Patient Wayne. My ideal meds app would allow me to pick each med I’m taking from a list of available medications that are coded according to NDC, RxNorm and WHODRUG codes, perhaps be able to import a record of a new prescription from my docs automatically, and allow me to edit my actual usage data before my next doctor visit if I changed my habits. I’d still expect the doc to ask to confirm, but only after looking at my current official list.

There are multiple ways to make such a list accessible – perhaps I might have a Personal Health Record (such as the ones that Microsoft and Google were trying to power through HealthVault and Google Health). Or maybe I can just use the portal of my primary care provider and trust them to make it accessible to all the other specialists I must occasionally visit (of course, this presumes that interoperability between providers is actually functional, which hasn’t been true in my case to date). Or maybe it’s just an XML file on my phone that I can send on demand.
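
That last option need not be complicated. As a rough sketch only, the Python below writes and reads a small medication list as XML using the standard library. Everything about the layout is an assumption made for illustration: the element and attribute names (`medicationList`, `medication`, `code`, `system`, `asOf`) are invented, no existing standard is implied, and any codes passed in are whatever the caller supplies, with no validation against NDC, RxNorm or WHODRUG.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Medication:
    name: str
    dose: str
    frequency: str
    codes: dict  # coding system name -> code (not validated here)


def medication_list_to_xml(patient_id: str, as_of: str, meds: list) -> str:
    """Build one XML document holding the patient's current medications."""
    root = ET.Element("medicationList", {"patient": patient_id, "asOf": as_of})
    for med in meds:
        item = ET.SubElement(root, "medication")
        ET.SubElement(item, "name").text = med.name
        ET.SubElement(item, "dose").text = med.dose
        ET.SubElement(item, "frequency").text = med.frequency
        # One <code> element per coding system, in a stable order.
        for system, code in sorted(med.codes.items()):
            ET.SubElement(item, "code", {"system": system}).text = code
    return ET.tostring(root, encoding="unicode")


def medication_list_from_xml(xml_text: str) -> list:
    """Read the document back into Medication records."""
    root = ET.fromstring(xml_text)
    meds = []
    for item in root.findall("medication"):
        codes = {c.get("system"): c.text for c in item.findall("code")}
        meds.append(
            Medication(
                name=item.findtext("name"),
                dose=item.findtext("dose"),
                frequency=item.findtext("frequency"),
                codes=codes,
            )
        )
    return meds
```

Because the same records come back out as went in, the list only has to be entered once; a provider's system could read the file instead of retyping a paper form.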

Wouldn’t this be better than having me scribble this on a form over and over, or a physician having to guess if I happened to be pulled unconscious into an ER one of these days, or with data managers having to deal with sloppy and unreliable data on a ConMeds CRF? Having ready access to reliable medication use data for patients could be a tremendous boost to pharmacovigilance and epidemiological research as well.

Yes, interoperability should be great once it finally arrives. In the meantime, can’t we just find a simpler, standardized way for my doctors to know my meds?