Define-XML ↔ Define-JSON Bidirectional Converter
Table of Contents
- Overview
- Quick Start
- JSON Structure
- Testing & Validation
- File Structure
- CLI Usage
  - From Project Directory
  - From Any Other Directory
- Use Cases
- Advanced Features
- Validation Results
- Requirements
- Success Metrics
Overview
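Bidirectional converter between Define-XML (CDISC ODM 1.2/1.3 with Define 1.0/2.1) and Define-JSON. The roundtrip (XML → JSON → XML) preserves semantic content rather than byte-for-byte layout: all clinical metadata is retained, while ValueLists and WhereClauses are intentionally regrouped (see Expected Improvements). Both legacy Define-XML v1.0 and current Define-XML v2.1 are supported, with automatic namespace detection.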
Features
- Bidirectional Conversion: XML ↔ JSON with semantic equivalence
- Perfect Roundtrip: XML → JSON → XML maintains all clinical data
- Reference-Based Structure: Clean, non-redundant JSON with OID references
- Dataset Specialization: ValueLists grouped by parameter (e.g., TEMP, WEIGHT)
- Hierarchical Relationships: Domain ItemGroups reference ValueList children
- Schema Compliant: Strict adherence to define-json.yaml schema with full Item objects
- Comprehensive Validation: Element counts, OID preservation, relationship fidelity
- Version Compatibility: Supports Define-XML v1.0 and v2.1 with automatic namespace detection
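One way to picture the automatic namespace detection is to read the namespace declarations before touching any element. The sketch below is not the converter's own code: `detect_define_version` and `DEFINE_NAMESPACES` are hypothetical names, and the two URIs are assumed to be the namespaces CDISC publishes for Define-XML v1.0 and v2.1.

```python
import io
import xml.etree.ElementTree as ET

# Assumed Define-XML namespace URIs mapped to the version they identify.
DEFINE_NAMESPACES = {
    "http://www.cdisc.org/ns/def/v1.0": "1.0",
    "http://www.cdisc.org/ns/def/v2.1": "2.1",
}


def detect_define_version(source):
    """Return the Define-XML version declared in the document, or None."""
    # "start-ns" events yield (prefix, uri) pairs as declarations are parsed.
    for _event, (_prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        if uri in DEFINE_NAMESPACES:
            return DEFINE_NAMESPACES[uri]
    return None


sample = b"""<?xml version="1.0"?>
<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3"
     xmlns:def="http://www.cdisc.org/ns/def/v2.1">
  <Study OID="ODM.LZZT"/>
</ODM>"""

detected = detect_define_version(io.BytesIO(sample))
print(detected)
```

Keying on the namespace URI rather than the prefix means a file that binds the Define namespace to a prefix other than `def` is still recognized.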
Quick Start
XML → JSON Conversion
from src.define_json.converters.xml_to_json import PortableDefineXMLToJSONConverter
from pathlib import Path
converter = PortableDefineXMLToJSONConverter()
json_data = converter.convert_file(
Path('data/define-360i.xml'),
Path('data/define-360i.json')
)
JSON → XML Conversion
from src.define_json.converters.json_to_xml import DefineJSONToXMLConverter
from pathlib import Path
converter = DefineJSONToXMLConverter()
xml_root = converter.convert_file(
Path('data/define-360i.json'),
Path('data/define-360i-recreated.xml')
)
Complete Roundtrip Validation
from src.define_json.validation.roundtrip import validate_true_roundtrip
from pathlib import Path
result = validate_true_roundtrip(
Path('data/original.xml'),
Path('data/recreated.xml')
)
print(f"Roundtrip passed: {result['passed']}")
JSON Structure
Reference-Based Design
All ItemGroups exist at the top level with hierarchical relationships via OID references:
{
"studyOID": "ODM.LZZT",
"studyName": "LZZT",
"itemGroups": [
{
"OID": "IG.VS",
"name": "VS",
"description": "Vital Signs domain dataset containing clinical data",
"domain": "VS",
"children": ["VL.VS.DIABP", "VL.VS.HEIGHT", "VL.VS.TEMP", "VL.VS.WEIGHT"]
},
{
"OID": "VL.VS.DIABP",
"name": "VL.VS.DIABP",
"description": "Dataset specialization for VS DIABP parameter containing both result values and units",
"type": "DataSpecialization",
"domain": "VS",
"items": [...],
"whereClause": [...]
}
],
"items": [...],
"codeLists": [...],
"whereClauses": [...]
}
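Because every ItemGroup sits at the top level, following a "children" reference is a dictionary lookup. The sketch below uses a trimmed copy of the structure above (the elided "items" and "whereClause" contents are left empty); `index_by_oid` and `resolve_children` are illustrative helper names, not part of the package.

```python
import json

# Trimmed copy of the example above; elided arrays are left empty.
define_json = json.loads("""
{
  "studyOID": "ODM.LZZT",
  "itemGroups": [
    {"OID": "IG.VS", "name": "VS", "domain": "VS",
     "children": ["VL.VS.DIABP"]},
    {"OID": "VL.VS.DIABP", "name": "VL.VS.DIABP",
     "type": "DataSpecialization", "domain": "VS",
     "items": [], "whereClause": []}
  ]
}
""")


def index_by_oid(item_groups):
    """Map each ItemGroup's OID to the ItemGroup object itself."""
    return {group["OID"]: group for group in item_groups}


def resolve_children(group, by_oid):
    """Return the ItemGroup objects referenced by the group's child OIDs."""
    return [by_oid[oid] for oid in group.get("children", [])]


by_oid = index_by_oid(define_json["itemGroups"])
vs_children = resolve_children(by_oid["IG.VS"], by_oid)
print([child["OID"] for child in vs_children])
```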
Key Improvements
Dataset Specialization Pattern
- Before: 4 ValueLists grouped by variable type (VSORRES, VSORRESU)
- After: 14 ValueLists grouped by parameter (DIABP, HEIGHT, TEMP, etc.)
- Benefit: Each parameter gets its own ValueList containing both result and unit items
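The regrouping can be thought of as a pivot from variable to parameter. The following is a minimal sketch under stated assumptions: the `(domain, variable, parameter)` tuples are a hypothetical flattened view of value-level items, and `group_by_parameter` is an illustrative name rather than the converter's actual function.

```python
from collections import defaultdict

# Hypothetical flattened view of variable-based ValueLists:
# one (domain, variable, parameter) tuple per value-level item.
value_level_items = [
    ("VS", "VSORRES", "DIABP"),
    ("VS", "VSORRESU", "DIABP"),
    ("VS", "VSORRES", "TEMP"),
    ("VS", "VSORRESU", "TEMP"),
]


def group_by_parameter(entries):
    """Group value-level items into one ValueList per (domain, parameter)."""
    grouped = defaultdict(list)
    for domain, variable, parameter in entries:
        value_list_oid = f"VL.{domain}.{parameter}"
        grouped[value_list_oid].append(variable)
    return dict(grouped)


by_parameter = group_by_parameter(value_level_items)
print(by_parameter)
```

Each resulting ValueList holds both the result variable and its unit variable for a single parameter, which is the shape described above.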
Shared WhereClauses
- Before: 27 variable-specific WhereClauses (WC.LB.LBORRES.AST, WC.LB.LBORRESU.AST)
- After: 14 parameter-based WhereClauses (WC.LB.AST) with comprehensive conditions
- Benefit: Eliminates redundancy while maintaining clinical context
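Judging from the OIDs quoted above, a shared WhereClause OID is the variable-specific one with the variable segment removed. The sketch below assumes the pattern `WC.<domain>.<variable>.<parameter>`, inferred only from those two examples; `shared_where_clause_oid` is a hypothetical helper.

```python
def shared_where_clause_oid(oid):
    """Drop the variable segment: WC.<domain>.<variable>.<param> -> WC.<domain>.<param>."""
    parts = oid.split(".")
    if len(parts) != 4 or parts[0] != "WC":
        # Leave OIDs that do not follow the assumed pattern untouched.
        return oid
    prefix, domain, _variable, parameter = parts
    return f"{prefix}.{domain}.{parameter}"


variable_specific = ["WC.LB.LBORRES.AST", "WC.LB.LBORRESU.AST"]

# A set removes the duplicates produced when several variables share a parameter.
shared = sorted({shared_where_clause_oid(oid) for oid in variable_specific})
print(shared)
```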
Testing & Validation
Roundtrip Validation
The system performs comprehensive validation:
- Element Count Verification: ItemGroups, Items, CodeLists, WhereClauses
- OID Preservation: All identifiers maintained across conversions
- Relationship Fidelity: Parent-child relationships preserved
- Clinical Data Integrity: All metadata and clinical content intact
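The first two checks can be approximated with the standard library alone. This is a rough sketch, not the logic inside `validate_true_roundtrip`: `summarize`, `compare`, and the `TRACKED` tuple are illustrative names, and the two inline documents are made-up miniature inputs.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Element types whose counts and OIDs are compared (matched by local name).
TRACKED = ("ItemGroupDef", "ItemDef", "CodeList", "WhereClauseDef")


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def summarize(xml_text):
    """Return (element counts, set of OIDs) for the tracked element types."""
    counts, oids = Counter(), set()
    for elem in ET.fromstring(xml_text).iter():
        name = local_name(elem.tag)
        if name in TRACKED:
            counts[name] += 1
            if "OID" in elem.attrib:
                oids.add(elem.attrib["OID"])
    return counts, oids


def compare(original_xml, recreated_xml):
    """Report count differences and OIDs missing from the recreated file."""
    orig_counts, orig_oids = summarize(original_xml)
    new_counts, new_oids = summarize(recreated_xml)
    count_diffs = {
        name: (orig_counts[name], new_counts[name])
        for name in TRACKED
        if orig_counts[name] != new_counts[name]
    }
    return {"count_diffs": count_diffs, "missing_oids": sorted(orig_oids - new_oids)}


original = (
    '<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3">'
    '<ItemDef OID="IT.VS.VSORRES"/><ItemDef OID="IT.VS.VSORRESU"/></ODM>'
)
recreated = (
    '<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3">'
    '<ItemDef OID="IT.VS.VSORRES"/></ODM>'
)
report = compare(original, recreated)
print(report)
```

A real check would also need an allow-list for intentional differences, such as the regrouped ValueLists and WhereClauses described in this document.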
Expected Improvements
These changes are intentional improvements, not errors:
- ValueListDef count: 4 → 14 (Dataset Specialization)
- WhereClauseDef count: 27 → 14 (shared, parameter-based)
File Structure
src/define_json/
├── converters/
│ ├── xml_to_json.py # XML → JSON conversion
│ └── json_to_xml.py # JSON → XML conversion
├── validation/
│ ├── roundtrip.py # Roundtrip validation
│ └── schema.py # Schema validation
└── utils/
└── cli.py # Command-line interface
CLI Usage
From Project Directory
# XML → JSON conversion
python -m define_json xml2json data/define-360i.xml data/output.json
# JSON → XML conversion
python -m define_json json2xml data/input.json data/output.xml
# Complete roundtrip test
python -m define_json roundtrip data/input.json
# Schema validation
python -m define_json validate data/input.json
From Any Other Directory
Option 1: Using PYTHONPATH (Recommended)
cd /path/to/your/folder
PYTHONPATH=/path/to/define-json python -c "
from src.define_json.converters.json_to_xml import DefineJSONToXMLConverter
from pathlib import Path
converter = DefineJSONToXMLConverter()
converter.convert_file(Path('input.json'), Path('output.xml'))
print('Conversion complete!')
"
Option 2: Direct Python Import
cd /path/to/your/folder
python -c "
import sys
sys.path.append('/path/to/define-json')
from src.define_json.converters.json_to_xml import DefineJSONToXMLConverter
from pathlib import Path
converter = DefineJSONToXMLConverter()
converter.convert_file(Path('input.json'), Path('output.xml'))
print('Conversion complete!')
"
For XML → JSON (reverse direction):
PYTHONPATH=/path/to/define-json python -c "
from src.define_json.converters.xml_to_json import PortableDefineXMLToJSONConverter
from pathlib import Path
converter = PortableDefineXMLToJSONConverter()
converter.convert_file(Path('input.xml'), Path('output.json'))
print('Conversion complete!')
"
Use Cases
Clinical Data Standards
- CDISC Submissions: Convert Define-XML to JSON for modern APIs
- Data Integration: Use JSON format in web applications and databases
- Quality Assurance: Validate Define-XML through roundtrip testing
Development Workflows
- Schema Evolution: Test schema changes with existing Define-XML files
- API Development: Use Define-JSON as API payload format
- Data Validation: Ensure Define-XML compliance through conversion testing
Advanced Features
Hierarchical Structure
- Domain ItemGroups contain references to ValueList children
- ValueLists are DataSpecialization ItemGroups with parameter-specific data
- All relationships preserved through OID references
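Since relationships are carried only by OID strings, a quick integrity check is to confirm that every referenced child OID is defined at the top level. The sketch below assumes the "itemGroups" layout shown under JSON Structure; `find_dangling_children` is a hypothetical helper and the sample data is deliberately broken to show a failure.

```python
def find_dangling_children(item_groups):
    """Return (parent OID, child OID) pairs whose child OID is not defined."""
    defined = {group["OID"] for group in item_groups}
    return [
        (group["OID"], child_oid)
        for group in item_groups
        for child_oid in group.get("children", [])
        if child_oid not in defined
    ]


# Deliberately incomplete sample: VL.VS.TEMP is referenced but never defined.
item_groups = [
    {"OID": "IG.VS", "children": ["VL.VS.DIABP", "VL.VS.TEMP"]},
    {"OID": "VL.VS.DIABP", "type": "DataSpecialization"},
]

dangling = find_dangling_children(item_groups)
print(dangling)
```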
Clinical Context
- WhereClause conditions include comprehensive parameter context
- ValueLists group clinically related items (result + unit per parameter)
- Maintains CDISC compliance while improving usability
Performance
- Reference-based design eliminates redundancy
- Smaller JSON files than a nested-object approach would produce
- Efficient querying with all ItemGroups at top level
Validation Results
360i Sample Data
- Original XML: 98KB, 1,765 lines
- Generated JSON: 66KB, 2,730 lines
- Recreated XML: 66KB, 1,562 lines
- Roundtrip Status: PASSED (0 errors, 2 expected improvements)
Element Preservation
- 4 Domain ItemGroups → 4 Domain ItemGroups
- 135 Items → 135 Items
- 21 CodeLists → 21 CodeLists
- 4 ValueLists → 14 ValueLists (Dataset Specialization)
- 27 WhereClauses → 14 shared WhereClauses (Parameter-based)
Requirements
- Python 3.7+
- Standard library only (no external dependencies)
- Optional: LinkML for schema validation
Success Metrics
- Perfect Roundtrip: XML → JSON → XML semantic equivalence
- Zero Data Loss: All clinical metadata preserved
- Schema Compliance: Strict adherence to define-json.yaml
- Clinical Usability: Dataset Specialization pattern implemented
- Reference Integrity: All OID relationships maintained