Define-XML ↔ Define-JSON Bidirectional Converter

Table of Contents

Overview

Complete bidirectional conversion system between Define-XML (CDISC ODM 1.2/1.3 + Define 1.0/2.1) and Define-JSON with perfect roundtrip fidelity. Supports both legacy Define-XML v1.0 and modern Define-XML v2.1 formats with automatic namespace detection.

Features

  • Bidirectional Conversion: XML ↔ JSON with semantic equivalence
  • Perfect Roundtrip: XML → JSON → XML maintains all clinical data
  • Reference-Based Structure: Clean, non-redundant JSON with OID references
  • Dataset Specialization: ValueLists grouped by parameter (e.g., TEMP, WEIGHT)
  • Hierarchical Relationships: Domain ItemGroups reference ValueList children
  • Schema Compliant: Strict adherence to define-json.yaml schema with full Item objects
  • Comprehensive Validation: Element counts, OID preservation, relationship fidelity
  • Version Compatibility: Supports Define-XML v1.0 and v2.1 with automatic namespace detection

Quick Start

XML → JSON Conversion

from src.define_json.converters.xml_to_json import PortableDefineXMLToJSONConverter
from pathlib import Path

converter = PortableDefineXMLToJSONConverter()
json_data = converter.convert_file(
    Path('data/define-360i.xml'), 
    Path('data/define-360i.json')
)

JSON → XML Conversion

from src.define_json.converters.json_to_xml import DefineJSONToXMLConverter
from pathlib import Path

converter = DefineJSONToXMLConverter()
xml_root = converter.convert_file(
    Path('data/define-360i.json'),
    Path('data/define-360i-recreated.xml')
)

Complete Roundtrip Validation

from src.define_json.validation.roundtrip import validate_true_roundtrip
from pathlib import Path

result = validate_true_roundtrip(
    Path('data/original.xml'),
    Path('data/recreated.xml')
)
print(f"Roundtrip passed: {result['passed']}")

JSON Structure

Reference-Based Design

All ItemGroups exist at the top level with hierarchical relationships via OID references:

{
  "studyOID": "ODM.LZZT",
  "studyName": "LZZT",
  "itemGroups": [
    {
      "OID": "IG.VS",
      "name": "VS",
      "description": "Vital Signs domain dataset containing clinical data",
      "domain": "VS",
      "children": ["VL.VS.DIABP", "VL.VS.HEIGHT", "VL.VS.TEMP", "VL.VS.WEIGHT"]
    },
    {
      "OID": "VL.VS.DIABP",
      "name": "VL.VS.DIABP", 
      "description": "Dataset specialization for VS DIABP parameter containing both result values and units",
      "type": "DataSpecialization",
      "domain": "VS",
      "items": [...],
      "whereClause": [...]
    }
  ],
  "items": [...],
  "codeLists": [...],
  "whereClauses": [...]
}

Key Improvements

Dataset Specialization Pattern

  • Before: 4 ValueLists grouped by variable type (VSORRES, VSORRESU)
  • After: 14 ValueLists grouped by parameter (DIABP, HEIGHT, TEMP, etc.)
  • Benefit: Each parameter gets its own ValueList containing both result and unit items

Shared WhereClauses

  • Before: 27 variable-specific WhereClauses (WC.LB.LBORRES.AST, WC.LB.LBORRESU.AST)
  • After: 14 parameter-based WhereClauses (WC.LB.AST) with comprehensive conditions
  • Benefit: Eliminates redundancy while maintaining clinical context

Testing & Validation

Roundtrip Validation

The system performs comprehensive validation:

  1. Element Count Verification: ItemGroups, Items, CodeLists, WhereClauses
  2. OID Preservation: All identifiers maintained across conversions
  3. Relationship Fidelity: Parent-child relationships preserved
  4. Clinical Data Integrity: All metadata and clinical content intact

Expected Improvements

These changes are intentional improvements, not errors: - ValueListDef count: 4 → 14 (Dataset Specialization) - WhereClauseDef count: 27 → 28 (Shared parameter-based)

File Structure

src/define_json/
├── converters/
│   ├── xml_to_json.py          # XML → JSON conversion
│   └── json_to_xml.py          # JSON → XML conversion
├── validation/
│   ├── roundtrip.py            # Roundtrip validation
│   └── schema.py               # Schema validation
└── utils/
    └── cli.py                  # Command-line interface

CLI Usage

From Project Directory

# XML → JSON conversion
python -m define_json xml2json data/define-360i.xml data/output.json

# JSON → XML conversion  
python -m define_json json2xml data/input.json data/output.xml

# Complete roundtrip test
python -m define_json roundtrip data/input.json

# Schema validation
python -m define_json validate data/input.json

From Any Other Directory

cd /path/to/your/folder
PYTHONPATH=/Users/jeremyteoh/Projects/define-json python -c "
from src.define_json.converters.json_to_xml import DefineJSONToXMLConverter
from pathlib import Path
converter = DefineJSONToXMLConverter()
converter.convert_file(Path('input.json'), Path('output.xml'))
print('Conversion complete!')
"

Option 2: Direct Python Import

cd /path/to/your/folder
python -c "
import sys
sys.path.append('/Users/jeremyteoh/Projects/define-json')
from src.define_json.converters.json_to_xml import DefineJSONToXMLConverter
from pathlib import Path
converter = DefineJSONToXMLConverter()
converter.convert_file(Path('input.json'), Path('output.xml'))
print('Conversion complete!')
"

For XML → JSON (reverse direction):

PYTHONPATH=/Users/jeremyteoh/Projects/define-json python -c "
from src.define_json.converters.xml_to_json import PortableDefineXMLToJSONConverter
from pathlib import Path
converter = PortableDefineXMLToJSONConverter()
converter.convert_file(Path('input.xml'), Path('output.json'))
print('Conversion complete!')
"

Use Cases

Clinical Data Standards

  • CDISC Submissions: Convert Define-XML to JSON for modern APIs
  • Data Integration: Use JSON format in web applications and databases
  • Quality Assurance: Validate Define-XML through roundtrip testing

Development Workflows

  • Schema Evolution: Test schema changes with existing Define-XML files
  • API Development: Use Define-JSON as API payload format
  • Data Validation: Ensure Define-XML compliance through conversion testing

Advanced Features

Hierarchical Structure

  • Domain ItemGroups contain references to ValueList children
  • ValueLists are DataSpecialization ItemGroups with parameter-specific data
  • All relationships preserved through OID references

Clinical Context

  • WhereClause conditions include comprehensive parameter context
  • ValueLists group clinically related items (result + unit per parameter)
  • Maintains CDISC compliance while improving usability

Performance

  • Reference-based design eliminates redundancy
  • Smaller JSON files compared to nested object approach
  • Efficient querying with all ItemGroups at top level

Validation Results

360i Sample Data

  • Original XML: 98KB, 1,765 lines
  • Generated JSON: 66KB, 2,730 lines
  • Recreated XML: 66KB, 1,562 lines
  • Roundtrip Status: PASSED (0 errors, 2 expected improvements)

Element Preservation

  • 4 Domain ItemGroups → 4 Domain ItemGroups
  • 135 Items → 135 Items
  • 21 CodeLists → 21 CodeLists
  • 4 ValueLists → 14 ValueLists (Dataset Specialization)
  • 27 WhereClauses → 14 shared WhereClauses (Parameter-based)

Requirements

  • Python 3.7+
  • Standard library only (no external dependencies)
  • Optional: LinkML for schema validation

Success Metrics

  • Perfect Roundtrip: XML → JSON → XML semantic equivalence
  • Zero Data Loss: All clinical metadata preserved
  • Schema Compliance: Strict adherence to define-json.yaml
  • Clinical Usability: Dataset Specialization pattern implemented
  • Reference Integrity: All OID relationships maintained