Define-JSON employs a copy-and-link versioning model rather than traditional inheritance hierarchies. This architectural choice has profound implications for clinical data management, regulatory compliance, and system maintainability.

Instead of inheriting from parent versions, each MetaDataVersion is a complete, self-contained snapshot that links back to its sources via wasDerivedFrom relationships:

                    ┌──wasDerivedFrom── CDISC Standard v1.0
                    │
Study ABC v1.3 ─────┼──wasDerivedFrom── Study Template v2.1
                    │
                    └──wasDerivedFrom── Local Extensions

Each version contains complete copies of all relevant definitions, creating an immutable snapshot with explicit provenance chains.

Why This Matters for Clinical Data

Regulatory Compliance

  • Immutable Audit Trail: Once created, versions never change - essential for regulatory submissions
  • Complete Context: Each submission is self-contained with no external dependencies
  • Clear Lineage: wasDerivedFrom provides unambiguous provenance for regulatory review

Data Integrity

  • No Cascade Failures: Changes to standards don't break existing studies
  • Temporal Consistency: Historical analyses remain reproducible indefinitely
  • Explicit Customisation: Study-specific modifications are clearly documented

Operational Benefits

  • Simplified Queries: All metadata is local - no complex inheritance resolution
  • Parallel Development: Teams can work independently without version conflicts
  • Predictable Performance: No deep inheritance chains to traverse

Trade-offs Considered

Storage vs Simplicity

  • Cost: More storage required for duplicated definitions
  • Benefit: Eliminates complex inheritance logic and cascade dependencies
  • Reality: Storage is cheap; developer time and regulatory risk are expensive

Propagation vs Isolation

  • Challenge: Updates must be explicitly propagated to derived versions
  • Benefit: Prevents unintended changes from affecting production systems
  • Mitigation: Tooling can assist with selective update propagation

Divergence vs Control

  • Risk: Versions may diverge over time without careful governance
  • Benefit: Explicit control over what changes are adopted and when
  • Practice: Regular review cycles ensure alignment where needed

Implementation Pattern

This mirrors successful patterns in other domains:

  • Git: Complete snapshots with parent pointers, not deltas
  • Docker: Immutable layers with explicit inheritance
  • Blockchain: Immutable blocks with cryptographic lineage

For clinical data, where "reproducible forever" is not just good practice but regulatory requirement, this architecture provides the foundation for trustworthy, auditable data systems.