Encapsulated Data - implies that the transmitted data is manipulated in a non-destructive manner with the necessary metadata in a header of a given package for sender and receiver to understand and process.
Data Provenance - refers to the ability to trace and verify the origin of data, as well as how and by what systems it has been altered since its origination.
Data File Format
File Size
Definition: For a given dataset the comparative size of the transport package (in bytes)
Compression
Definition: Does the transport format support compression and decompression of encapsulated data using standard open compression formats?
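As a minimal sketch of what this requirement asks for, the round trip below uses gzip, one widely available open compression format, from the Python standard library. The payload is an invented sample and does not represent any real transport package.

```python
import gzip

# Hypothetical payload: a small tabular dataset serialized as UTF-8 text.
payload = ("USUBJID,AETERM\n" + "SUBJ-001,HEADACHE\n" * 1000).encode("utf-8")

# Compress with gzip (DEFLATE inside a gzip container).
compressed = gzip.compress(payload)

# Decompression must return the original bytes exactly (non-destructive).
restored = gzip.decompress(compressed)

print(len(payload), len(compressed), restored == payload)
```

Repetitive tabular data of this kind usually compresses well, which also bears on the File Size criterion.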
Encryption
Definition: Does the transport format support the encryption of encapsulated data using industry standard algorithms (including PKI)?
Digital Signature
Definition: The transport format will support the application of one or more digital signatures to the encapsulated dataset
Data integrity
Definition: The transport format will support a hash or checksum function to mitigate unexpected data changes
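One way to picture this requirement: the sender records a digest of the payload in the package header, and the receiver recomputes it on arrival. The sketch below assumes SHA-256; the function names and the sample payload are illustrative only.

```python
import hashlib
import hmac

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the payload as a hex string."""
    return hashlib.sha256(data).hexdigest()

def verify(data: bytes, expected_hex: str) -> bool:
    """Recompute the digest on receipt and compare it with the declared one."""
    return hmac.compare_digest(sha256_hex(data), expected_hex)

# Hypothetical payload; the sender would store `declared` in the package header.
payload = b"USUBJID,AETERM\nSUBJ-001,HEADACHE\n"
declared = sha256_hex(payload)

print(verify(payload, declared))         # unchanged payload
print(verify(payload + b"x", declared))  # payload altered in transit
```

A plain hash detects accidental changes only; protection against deliberate tampering is the job of the Digital Signature criterion.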
Schema driven
Definition: The transport format should support a schema to ensure that a data transport file will be well-formed and valid.
Well defined Metadata
Definition: The transport format will support a set of well-defined metadata tags that allow effective communication of encapsulated data between sender and receiver.
Examples: Encryption used, record number, subject UUID, etc.
A use case for this would be partitioning study datasets into subject-level transfers with enough metadata to reconstitute the original study
Sending partial datasets for a subject
Incremental or cumulative data transfers
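The partitioning use case above can be sketched as follows. The header fields (`study`, `subject`, `package_count`) and the `seq` record number are hypothetical metadata tags chosen for illustration; they are not taken from any existing standard.

```python
from collections import defaultdict

# Invented sample records for a single study.
records = [
    {"USUBJID": "S1", "AETERM": "HEADACHE"},
    {"USUBJID": "S2", "AETERM": "NAUSEA"},
    {"USUBJID": "S1", "AETERM": "RASH"},
]

def partition_by_subject(study_id, rows):
    """Split a study into one package per subject, keeping the original record number."""
    groups = defaultdict(list)
    for seq, row in enumerate(rows):
        groups[row["USUBJID"]].append({"seq": seq, "row": row})
    total = len(groups)
    return [
        {"header": {"study": study_id, "subject": subj, "package_count": total},
         "records": recs}
        for subj, recs in groups.items()
    ]

def reconstitute(packages):
    """Rebuild the original study once every subject package has arrived."""
    expected = packages[0]["header"]["package_count"]
    if len(packages) != expected:
        raise ValueError(f"expected {expected} packages, got {len(packages)}")
    merged = [rec for pkg in packages for rec in pkg["records"]]
    return [rec["row"] for rec in sorted(merged, key=lambda rec: rec["seq"])]

packages = partition_by_subject("STUDY-1", records)
print(reconstitute(packages) == records)
```

The package count lets the receiver detect a missing subject transfer, and the record number restores the original ordering regardless of arrival order.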
Wide Payload Support
Definition: The transport format may support transfer of a wide range of well-defined payloads over and above data currently well-described using tabular data structures.
Definition: The transport format will support meaningful relationships between data.
Example: Replace RELREC with metadata laden links for relationship between clinical observations and histopathology findings.
Partial Data Transfers
Definition: The transfer format should support the transmission of subsets of data in a meaningful fashion.
Examples: (this should be linked with the well defined metadata)
Transmitting data on a subject level
Transmitting all data for a given time period across multiple subjects on request
Transmitting incremental datasets
Must be an Open Standard
Definition: The full transport format specification is freely available, well documented and may be used freely without a license. All supporting materials (e.g. schemas, documents) will be available without cost.
Should support multibyte character encodings
Definition: The transport format supports the fidelity of captured source data in transmission without requiring translation or transcoding. The encoding of a transport file should be declared by the format. Restrict support to UTF-8 encoding.
Example:
Should support submissions in Kanji for Japanese Studies
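To illustrate why a declared UTF-8 encoding matters here, the sketch below round-trips a short Japanese term through UTF-8 and shows that the same text cannot be represented in US-ASCII. The sample term is an arbitrary choice.

```python
# "Headache" written in kanji; an arbitrary sample value.
text = "頭痛"

# UTF-8 preserves the source text exactly; each of these characters takes 3 bytes.
encoded = text.encode("utf-8")
round_tripped = encoded.decode("utf-8")
print(len(text), len(encoded), round_tripped == text)

# US-ASCII cannot represent the same value without translation or loss.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)
```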
Audit records
Definition: The transport format should support the transport of audit data/metadata.
Example:
The CRF-level audit trail should be able to be transported as part of an end-to-end submission
Something similar to the capability present in the ODM
Traceability and Provenance
Definition: The transport format should support the transport of traceability data and metadata to establish data provenance.
Example:
For a given data value in a submission analysis dataset it will be possible to trace back to the original source of data.
Transmit data and metadata
Definition: It will be possible to transfer both data and metadata in the same transport file.
Example:
In a given data transfer incorporate both the metadata and data, and link from data elements to corresponding metadata
Value
Costs of adoption
Definition: The transport format should represent a net positive return on investment for adoption
Resource costs - cognitive load for personnel
Definition: The transport format should be sufficiently familiar to not require large costs of training and utilisation
Example:
The transport format should support transport of tabular datasets
The transport format should be simple to build (e.g. PROC ALTRANS)
Resource costs - storage/transport
Definition: The choice of the new transport format should not incur large increases in costs for processing, sending and storing data held in the format.
Example:
A substantial increase in file size would increase the cost of storage space and of bandwidth for transmission.
Complex encryption mechanisms might incur a processing cost for decryption at each stage of data creation and review
Resource costs - software
Definition: The adoption of the alternative transport format will not require a large capital outlay for software to build and manipulate the data format. It should be supportable using existing data management systems
Examples:
PROC_XPORT -> PROC_XPT++
Compatible with ODM systems (e.g. XML based or similar)
Cost of Format adoption for generation and processing of clinical data.
Definition: The time taken to get a submission to regulators and for regulators to be able to initiate and complete review should not be impacted by the adoption of the new transport format.
Example:
There will be a minimal cost in time for generation of data in the new format, relative to the existing standard.
Value of adoption of new transport format
Definition: Time to review of submission should decrease because of better expressivity and improved quality of datasets
Example:
The format should support self-validation for identification of common submission issues
Time spent recreating full context datasets should decrease
Validation of capability of new format
Definition: Tools exist that are capable of validating the content of transport files against CDISC implementation guide rules. These rules include data format standard rules and data domain context rules. Any new transport format would need tools to produce similar validation.
Example:
Value not found in non-extensible code list
Missing data for --STRESC when --ORRES is provided.
Content
Definition: Changes to the content model that will deliver benefits for adopters.
Able to represent relationships in the data without requiring duplication within a single data transfer
Definition: The ability to indicate relationships between elements within an encapsulated dataset. The relationship should also be able to be annotated (e.g. reason for ascribing relationship)
Examples:
Represent the causality for a given concomitant medication with respect to one or more adverse events (and vice versa)
Actions taken on an adverse event, for example hospitalisation
Findings About a result or intervention related to the observations incurred
Refine model to avoid duplication of data, context, metadata
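A rough sketch of annotated relationships without duplication: each record is stored once, and a separate list of links refers to records by identifier and carries the reason for the relationship. The identifiers, link type and field names are invented for illustration.

```python
# Each record appears exactly once, keyed by a hypothetical identifier.
records = {
    "AE-1": {"domain": "AE", "term": "NAUSEA"},
    "CM-1": {"domain": "CM", "treatment": "ONDANSETRON"},
}

# Relationships live beside the data and can be annotated.
links = [
    {"source": "CM-1", "target": "AE-1", "type": "treatment_for",
     "reason": "Given to treat the adverse event"},
]

def related_to(record_id):
    """Return (link type, other record id, reason) for every link touching a record."""
    found = []
    for link in links:
        if link["source"] == record_id:
            found.append((link["type"], link["target"], link["reason"]))
        elif link["target"] == record_id:
            found.append((link["type"], link["source"], link["reason"]))
    return found

print(related_to("AE-1"))
print(related_to("CM-1"))
```

Because a link holds only identifiers, the same relationship can be followed from either end without copying the related record into both places.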
Able to represent relationships to external resources
Definition: It will be possible to link encapsulated content to external resources such as standard controlled terminology
Examples:
Link to Controlled Terminology using resource URI
Tabular Data Representation
Definition: Encapsulated content should support tabular representations of data
Example:
It will be possible to represent legacy datasets.
Transform data into tabular data structure.
No field width restrictions
Definition: The transport format will support arbitrary width fields. Format should allow declaration of width for the purposes of content validation.
Example:
Data should not need to be truncated for transport
Data should only occupy as much space as needed (not fixed width)
More discrete datatype definitions
Definition: The transfer format will support datatypes beyond the existing CHAR/NUM, e.g. XML Schema datatypes.
Examples:
Date
Time
Datetime
Datetime with timezone
Integer
Float
Bool
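As an illustration of the datatypes listed above, the sketch below maps a declared type name to a parser for ISO 8601 style text values. The type names and the `parse` helper are assumptions made for this example, not part of any specification.

```python
from datetime import date, datetime, time

# Hypothetical type names mapped to standard-library parsers.
PARSERS = {
    "date": date.fromisoformat,
    "time": time.fromisoformat,
    "datetime": datetime.fromisoformat,  # also accepts a UTC offset such as +09:00
    "integer": int,
    "float": float,
    "boolean": lambda s: {"true": True, "false": False}[s],
}

def parse(value: str, datatype: str):
    """Convert a transported text value to the declared datatype."""
    return PARSERS[datatype](value)

print(parse("2020-01-31", "date"))
print(parse("2020-01-31T08:30:00+09:00", "datetime"))
print(parse("42", "integer"), parse("true", "boolean"))
```

With declared types, a receiver can reject a malformed value at load time instead of carrying it forward as free text.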
Transactional Data Model
Definition: The content model will support the expression of transactional data for a data submission if requested.
Examples:
Record changes made to data in response to findings of a data safety monitoring board
null Flavour support
Definition: The content model should support something similar to the null flavour in ISO 21090 datatypes
Examples:
A missing value should have a qualifier to indicate the reason for its absence (e.g. not given, refused)
This is currently absent from the SDTM model
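A minimal sketch of the idea: a value either carries data or carries a code explaining why it is missing, never both and never neither. The codes below are a small subset modelled on HL7 null flavour codes, and the `Observation` class is purely illustrative.

```python
from dataclasses import dataclass
from typing import Optional

# Small illustrative subset of null flavour codes.
NULL_FLAVORS = {
    "NI": "no information",
    "UNK": "unknown",
    "ASKU": "asked but unknown",
    "NASK": "not asked",
    "NA": "not applicable",
}

@dataclass(frozen=True)
class Observation:
    value: Optional[str] = None
    null_flavor: Optional[str] = None

    def __post_init__(self):
        # Exactly one of value / null_flavor must be supplied.
        if (self.value is None) == (self.null_flavor is None):
            raise ValueError("provide exactly one of value or null_flavor")
        if self.null_flavor is not None and self.null_flavor not in NULL_FLAVORS:
            raise ValueError(f"unknown null flavour: {self.null_flavor}")

present = Observation(value="120")
missing = Observation(null_flavor="ASKU")
print(present, missing, NULL_FLAVORS[missing.null_flavor])
```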
Compatibility/Extensibility
Backward compatibility
Definition: New transport format will be capable of being transformed to and from existing transport format
Examples:
Decompose defined data types to CHAR/NUM
Truncate variable length fields to fixed length fields and SUPPQUAL
Translate discrete relationships to RELREC where possible
Note that this would not prevent the loss incurred when converting UTF-8 to US-ASCII
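The truncation example can be sketched as below: a long value is split into a parent value that fits the legacy width and overflow chunks that a converter could place in supplemental qualifier records. The 200-character limit is an assumption for this sketch, and the helper name is hypothetical. Width is counted in characters here, which matches bytes only for ASCII text.

```python
MAX_WIDTH = 200  # assumed legacy fixed-width limit, in characters

def split_for_legacy(value: str, width: int = MAX_WIDTH):
    """Return (parent value, overflow chunks) so that no piece exceeds `width`."""
    chunks = [value[i:i + width] for i in range(0, len(value), width)] or [""]
    return chunks[0], chunks[1:]

long_text = "A" * 450
parent, overflow = split_for_legacy(long_text)

# The parent fits the legacy field; the overflow would go to supplemental records.
print(len(parent), [len(chunk) for chunk in overflow])

# Joining the pieces restores the original value, so the transform is reversible.
print(parent + "".join(overflow) == long_text)
```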
Compatibility with existing Health data standards
Definition: It should be compatible with existing standard healthcare formats
Examples:
Transform to and from ODM (including dataset-XML)
Transform to and from HL7 C-CDA
Transform to and from BIMO (?)
Projected Lifespan of Standard Support
Definition: The transport format should be supported by a non-commercial industry body with a mandate for a minimum length of time of full support for the transport format. This may depend on the age of the existing standard
Example:
Consider CDA vs FHIR: will both standards continue in active development, or will one supplant the other? Will the standard owner continue to maintain support and development?
Extensibility
Definition: It should be able to accommodate new content requirements easily, cost-effectively, and retain backwards compatibility (i.e. no or minimal need to modify data management tools or processes). This implies support for namespaces.
Example:
Addition of custom attributes peculiar to a system adopting the standard
Systems naive to an extension will not be affected by use of extension
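If the format were XML based, namespaces would give the behaviour described in these examples. In the sketch below the namespace URIs, element names and attribute names are all invented; a reader that knows only the core namespace processes the document without being affected by the vendor extension.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: core items plus one item in a vendor extension namespace.
DOC = """<Record xmlns="urn:example:core" xmlns:ext="urn:example:vendor">
  <Item name="AETERM">HEADACHE</Item>
  <ext:Item name="REVIEW_FLAG">Y</ext:Item>
</Record>"""

CORE = "{urn:example:core}"

def read_core_items(xml_text: str) -> dict:
    """Read only elements in the core namespace, ignoring any extensions."""
    root = ET.fromstring(xml_text)
    return {el.get("name"): el.text for el in root.findall(f"{CORE}Item")}

core_items = read_core_items(DOC)
print(core_items)
```

A system that understands the extension can look up the second namespace in the same way, while older tools continue to work unchanged.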