The original CDISC Study Data Tabulation Model Implementation Guide (SDTMIG) was built around a core set of domains describing data common to most clinical trials, much of which was needed to understand basic product safety. Over time, new versions of the SDTMIG have incorporated more and more domain models. SDTMIG v3.2 has nearly 50 domains, with several more expected in the next version.
This domain-based approach, while very familiar to those involved in study data management and analysis, has proven a mixed blessing:
- FDA acceptance testing of new SDTMIG versions has been taking a year or more after publication, and this belated support sometimes comes with exceptions, a serious disincentive to early adoption by sponsors.
- While continually adding new domain models may help some, others may find it threatening. This is partly due to the heavy change-management effort required in any regulated systems environment, and especially so for sponsors who have already modeled similar data in custom domains: supporting a new SDTMIG version could then force last-minute database changes in ongoing drug development programs.
- The SDS team takes at least two years to release a new SDTMIG version, and the guide's already massive and steadily growing size and complexity make it harder to update and burdensome for end users to understand and adopt. Even so, publicly available validation tools have not been able to keep up.
- Meanwhile, the long timeline has forced the timeboxed CFAST Therapeutic Area (TA) User Guides to issue Draft or Provisional domain models to capture TA-specific requirements. FDA seems reluctant to accept these, which again discourages early adoption and injects more fear, uncertainty and doubt, since such domains may still change before they are added to a new SDTMIG version.
This spiral of steadily increasing complexity and version management seems to be causing widespread consternation.
Cooking Up an Alternative Solution
What if we treated the SDTMIG more like a cookbook of recipes than a blueprint? A recipe describes the general ingredients, but the cook has discretion to substitute or add ingredients and season to taste. What if the SDS and CFAST teams took a more flexible, forgiving approach, rather than continuously creating new domain models (sometimes with seemingly overlapping content) and waiting every few years to embed them, along with any other changes or additions, in a new version of the normative SDTMIG?

In this model, a new Therapeutic Area User Guide would continue to describe which existing final SDTMIG domains are most appropriate for representing TA data such as relevant disease history and baseline conditions. When necessary, it would also include directions for cooking up a new custom domain for, say, key efficacy data. The recipe would state which SDTM general class to use, which variables from that class apply, which current or new controlled terminologies to use, and other implementation details, such as any incremental business rules that could drive validation checks. In lieu of a traditional domain model, the recipe could include an example of the Define-XML metadata for the domain. But it's up to the cook to adjust for the number of guests at the table, prepare, add garnishes and serve. Such a recipe could be referenced in a Define-XML file, just like an SDTMIG version and a domain version (a proposed new v2.1 of Define-XML has features to support this), as if it were a standard, though in reality it would be a best-practice guideline.
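To make the recipe idea more concrete, here is a minimal Python sketch of how its ingredients (a general class, the applicable variables, controlled terminology and incremental business rules) might be captured as data and applied to a record. Everything in it is hypothetical and assumed for illustration: the `Recipe` class, the `XE` domain code, the `XETESTCD` test codes and the rules are not taken from any CFAST guide, SDTMIG version or validation tool.

```python
from dataclasses import dataclass, field


@dataclass
class Recipe:
    """Hypothetical machine-readable 'recipe' for a custom SDTM domain."""
    domain: str                 # hypothetical sponsor-defined domain code
    general_class: str          # SDTM general class: Interventions, Events or Findings
    variables: list[str]        # variables selected from that general class
    codelists: dict[str, set[str]] = field(default_factory=dict)  # variable -> allowed terms
    required: set[str] = field(default_factory=set)               # incremental rule: must be populated

    def check(self, record: dict[str, str]) -> list[str]:
        """Return findings for one record; an empty list means no issues."""
        issues = []
        # Every submitted variable must be one the recipe selected.
        for name in record:
            if name not in self.variables:
                issues.append(f"{name} is not part of the {self.domain} recipe")
        # Required variables must be present and non-empty.
        for name in sorted(self.required):
            if not record.get(name):
                issues.append(f"{name} is required but missing")
        # Populated coded variables must use the recipe's controlled terms.
        for name, allowed in self.codelists.items():
            value = record.get(name)
            if value and value not in allowed:
                issues.append(f"{name}={value!r} is not in the controlled terminology")
        return issues


# Hypothetical Findings-class recipe for key efficacy data.
xe = Recipe(
    domain="XE",
    general_class="Findings",
    variables=["STUDYID", "DOMAIN", "USUBJID", "XESEQ", "XETESTCD", "XETEST", "XEORRES"],
    codelists={"XETESTCD": {"SCORE1", "SCORE2"}},  # hypothetical test codes
    required={"STUDYID", "USUBJID", "XETESTCD"},
)

print(xe.check({"STUDYID": "S1", "USUBJID": "S1-001", "XETESTCD": "SCORE9"}))
# ["XETESTCD='SCORE9' is not in the controlled terminology"]
```

Keeping the recipe as plain data rather than as a fixed domain model is the point of the design: a sponsor could add a variable or a term to its own copy without waiting for a new SDTMIG version, and the same checks would still run.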
Representing domain models as recipes (or guidelines, if you prefer) would have the advantage of producing domains that are already acceptable for FDA submission (since FDA accepts custom domains), while still being checked for consistency with controlled terminology and validation rules. Adopters of CFAST TA standards might also start using them sooner, allowing them to be field-tested and to evolve more rapidly without waiting years for a new SDTMIG version to bless them. As people learn more and gain confidence, they can post additional examples and comments to a wiki that informs future users, so everyone builds on everyone else's experience over time.
Under this approach, the SDTM becomes the real model and basis for the standard, and the domain models would be treated as representative use cases. A recipe can also include incremental validation rules, but must remain faithful to any SDTM model-level rules (which would presumably have to be promoted upward from the IG). New domain models will often need new variables, and these should be added to newer versions of the SDTM. But assuming the SDS team finally adopts a more efficient way of representing Non-Standard Variables (NSVs), as it has already proposed in two drafts for comment, it would be easy enough for custom recipe-driven domains conforming to a current version of the SDTM to add each necessary new variable as an NSV at first. These NSVs could then be uplifted to standard variables in a later SDTM version, which would only require a simple attribute change in the define.xml once that version becomes available. Either way, the new domain model can be used immediately by applying currently available SDTM structures with some incremental new terminology (such as a new domain code describing the contents and a new controlled term for a variable name).
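The "simple attribute change" for uplifting an NSV can be sketched with Python's standard `xml.etree.ElementTree`. This is a fragment, not a complete or valid Define-XML document, and it rests on assumptions: a Define-XML 2.1-style `def:IsNonStandard="Yes"` flag on the variable's `ItemRef`, and the ODM 1.3 and Define-XML 2.1 namespace URIs shown. The `XE` domain, the `XENEWVAR` variable and the helper functions are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs: ODM 1.3 plus the Define-XML 2.1 "def" extension.
ODM = "http://www.cdisc.org/ns/odm/v1.3"
DEF = "http://www.cdisc.org/ns/def/v2.1"
NSV_FLAG = f"{{{DEF}}}IsNonStandard"  # assumed attribute marking a non-standard variable


def item_ref(item_oid: str, non_standard: bool = False) -> ET.Element:
    """Build an ItemRef; NSVs carry def:IsNonStandard="Yes"."""
    ref = ET.Element(f"{{{ODM}}}ItemRef", {"ItemOID": item_oid, "Mandatory": "No"})
    if non_standard:
        ref.set(NSV_FLAG, "Yes")
    return ref


def is_non_standard(group: ET.Element, item_oid: str) -> bool:
    """True if the referenced variable is still flagged as an NSV."""
    return any(
        ref.get(NSV_FLAG) == "Yes"
        for ref in group.iter(f"{{{ODM}}}ItemRef")
        if ref.get("ItemOID") == item_oid
    )


def uplift(group: ET.Element, item_oid: str) -> None:
    """Promote an NSV to a standard variable by removing its non-standard flag."""
    for ref in group.iter(f"{{{ODM}}}ItemRef"):
        if ref.get("ItemOID") == item_oid:
            ref.attrib.pop(NSV_FLAG, None)


# Fragment for a hypothetical custom XE domain.
group = ET.Element(f"{{{ODM}}}ItemGroupDef", {"OID": "IG.XE", "Name": "XE", "Repeating": "Yes"})
group.append(item_ref("IT.XE.XETESTCD"))
group.append(item_ref("IT.XE.XENEWVAR", non_standard=True))  # hypothetical new variable

print(is_non_standard(group, "IT.XE.XENEWVAR"))  # True: submitted as an NSV for now
uplift(group, "IT.XE.XENEWVAR")                  # once a new SDTM version defines it
print(is_non_standard(group, "IT.XE.XENEWVAR"))  # False
```

The dataset itself does not change in this sketch; only the metadata describing the variable's status does, which is what makes a later uplift cheap.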
This does not mean the SDTMIG goes away. It is still needed to describe the context, general assumptions and rules, and to show how to implement key SDTM concepts such as the three general classes, the Trial Design Model, and the special-purpose and relationship classes. But the IG would no longer need to cover each new variable and domain, and could instead focus on explaining core SDTM concepts through assumptions and rules. The result could be a leaner SDTMIG that changes less often and allows more flexibility and agility in the use of domains.
Such an approach could also be a good first step toward decoupling the SDTMIG from the highly cumbersome, restrictive use of SAS v5 Transport files, and make the model more adaptable for implementation in new technological frameworks, such as HL7 FHIR and the semantic web.
Of course, this still leaves a number of challenges in terms of terminology management, but that's a topic for another day.
In the meantime, what do you think about a simpler SDTMIG, one that emphasizes applying the general model with terminology to meet the needs of new CFAST domains?