Data Transparency Winter Event 2025 - Presentation Information
Save the Date for the Autumn 2025 Event! |
---|
The Data Transparency Winter Event took place from 4-6 February 2025. During this virtual event, presentations were delivered across the three days in bitesize chunks. Each day also hosted a panel discussion and Q&A session focused on the day's themes. The 12 presentations and the 3 daily event recordings are linked in the event agenda below. Save the date: the Data Transparency Autumn Event will take place on 16-18 September! |
Day 1 – 4 February Theme: Data Privacy and Anonymisation Techniques | ||
---|---|---|
Day 1 Recording | ||
Presentation Title | Speaker(s) | Abstract |
Improve Data Quality and Explainability Based on Quantitative and Qualitative Assessments of Re-identification Risks | Louis Phillippe Sondeck, Clever Identity | Ensuring privacy while preserving data quality is the main challenge of data anonymization. Current trends are towards quantitative approaches to risk assessment, and the approach proposed by Health Canada, as well as EMA Policy 0070, suggests that the threshold should be 0.09 (9%), meaning we should consider k >= 11 when applying the k-anonymity principle. However, while this method can reduce the risk of patient re-identification, it runs into two main issues: 1 - Explainability: why 11 and not 7 or 20? Anonymization is a fairly stressful requirement because of the possible impact on organizations in the event of failure; it is therefore useful to propose explainable approaches that can reassure those in charge. 2 - Data quality: the 0.09 (9%) threshold often produces poor-quality data that is difficult for recipients to use, because of the amount of data that has to be deleted. We propose a new methodology that combines qualitative and quantitative assessment to tackle these issues: 1 - Explainability: the methodology is based on the actual vulnerabilities of the dataset and explains how they can be exploited to re-identify patients. The risk scenarios are created from the identified vulnerabilities. 2 - Data quality: data quality is preserved because only the vulnerable values are transformed, not the whole dataset as under the k-anonymity principle. |
Reducing Regulatory Uncertainty Around Anonymisation: Perspectives of Canadian Privacy Regulators | Lisa Pilgram, University of Ottawa | Access to data provides a huge potential for translational medicine by enabling transparency, reproducibility, as well as innovative analyses for secondary purposes. Despite these clear benefits, data often remains inaccessible, with the most commonly cited reasons being privacy concerns. While anonymization is meant to overcome these concerns, the interpretation and implementation of anonymization varies across jurisdictions, partially because of variations in regulator interpretations and perspectives. We performed an interview study with 93% of all Canadian federal, provincial and territorial privacy regulators with the aim of (1) understanding the regulators’ perspective on anonymization and anonymized data, and (2) identifying potential obstacles to adopting anonymization. The findings identified where there was a lack of consistency and certainty across the Canadian regulatory landscape. Based on their perspectives, we developed explicit recommendations to reduce uncertainty and to create incentives for using and disclosing anonymized data. |
Data Anonymisation Techniques and Experiences | Anju Bhatia & Swetha Ram, Novo Nordisk | The world is embracing anonymization, as the adoption of anonymization methodologies enhances both the data utility and the readability of clinical trial documents while protecting privacy. Anonymisation involves transforming data so that the identification of individuals becomes unlikely. There are two key anonymization methodologies: quantitative and qualitative. Trial participants’ data, including direct and indirect identifiers, can be anonymized through techniques such as redaction, pseudonymization, randomization, generalization, offsetting and suppression. Novo Nordisk’s practical experiences include navigating EMA Policy 0070, HC-PRCI and joint EMA-HC submissions, using the new EMA Policy 0070 anonymization report template, overcoming various challenges, and gaining efficiency across multiple submissions. These experiences have given us valuable insights into selecting anonymization approaches, managing the process initiation meeting (PIM)/pre-submission meeting to get agreement on the submission anonymization approach, categorizing identifiers, ensuring quality and consistency, and remaining compliant. |
The Emperor's New Clothes: Unveiling the Risks of Anonymisation in Clinical Trial Data | Lora Killian, Pfizer | In the tale of "The Emperor's New Clothes," the truth is hidden in plain sight, much like the realities of anonymization. This presentation will introduce the concept of EFPIA’s Anonymization Gradient as a vehicle to explore the practical limitations of anonymization. We will highlight misperceptions surrounding clinical trial data anonymization that are often not acknowledged or understood. As the pharmaceutical industry faces increased disclosure requirements, we will examine the need to advocate for a more functional understanding of clinical trial data anonymization globally so that industry can continue to deliver on the increasing disclosure demands while still appropriately protecting participants. |
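
The first Day 1 abstract relates a 0.09 (9%) risk threshold to k >= 11 under the k-anonymity principle. As a minimal sketch of that arithmetic, and not the speaker's methodology, the Python below assumes the risk for a record is 1 divided by the size of its equivalence class (the records sharing the same quasi-identifier values), so a class of 11 gives 1/11 ≈ 0.0909, which rounds to 0.09. The function names, the toy records and the `age_band`/`sex` columns are hypothetical and chosen only for illustration.

```python
from collections import Counter


def equivalence_class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def max_reidentification_risk(records, quasi_identifiers):
    """Worst-case risk under the 1 / class-size assumption (smallest class dominates)."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return 1 / min(sizes.values())


def violating_classes(records, quasi_identifiers, k=11):
    """Return the equivalence classes with fewer than k records."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return {key: n for key, n in sizes.items() if n < k}


# Hypothetical toy dataset: one class of 11 records and one class of 3 records.
records = (
    [{"age_band": "40-49", "sex": "F"}] * 11
    + [{"age_band": "50-59", "sex": "M"}] * 3
)
quasi_identifiers = ["age_band", "sex"]

# A class of exactly 11 records corresponds to the rounded 0.09 threshold.
threshold_for_k11 = round(1 / 11, 2)

worst_risk = max_reidentification_risk(records, quasi_identifiers)
too_small = violating_classes(records, quasi_identifiers, k=11)

print(f"risk at k=11: {threshold_for_k11}")
print(f"worst-case risk in toy data: {worst_risk:.3f}")
print(f"classes below k=11: {too_small}")
```

In this toy data only the class of 3 records falls below k = 11. Under plain k-anonymity those records would need to be generalized or suppressed, which illustrates the data-quality cost the abstract describes.
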
Day 2 – 5 February Theme: Transparency and Accessibility in Clinical Research | ||
---|---|---|
Day 2 Recording | ||
Presentation Title | Speaker(s) | Abstract |
Clinical Trial Results at CTIS: Analysis of Compliance with Annex IV of EU CTR 536/2014 | Shalini Dwivedi, Krystelis | The Clinical Trials Information System (CTIS) is the central platform for sharing clinical trial data for studies conducted in the European Union, aligning with the principles of the EU CTR 536/2014 and the broader mission to enhance clinical trial transparency. Annex IV of the regulation guides sponsors on the information to disclose in clinical trial results. However, a corresponding results disclosure template has not been released by EMA. Our study aims to systematically analyse the availability, completeness, and quality of trial results currently available in CTIS. Examining trials published between January 2023 and December 2024, we assess studies with uploaded results, the timeliness of these submissions, and their compliance with Annex IV requirements. Key findings highlight the compliance of results from academia and industry-sponsored trials with the Annex IV requirements, variations in format and information, and factors influencing reporting timelines. Additionally, we identify common gaps in reported data and opportunities for further improvements. |
Collaborative AI Networks for Global Clinical Trial Transparency Operations | Woo Song, Xogene | This presentation explores how interconnected AI systems can work together to streamline clinical trial transparency activities across global requirements. We will examine how multiple specialized AI agents can collaborate to manage complex transparency workflows while maintaining human oversight. Topics include automated extraction and harmonization of trial information across multiple registries. Through examples and demonstrations, we'll illustrate how these systems can reduce manual effort while maintaining quality standards. We'll also address practical considerations around implementation, validation, and the evolving partnership between AI systems and human experts. |
Strategies for Enhancing the Accessibility of ClinicalTrials.gov | Samantha Toscano and Megan Larkin, Biogen | The current public information on ClinicalTrials.gov (CTgov) falls short in meeting the needs of patients and caregivers seeking treatments. Less than 1% of the ~470K trials on CTgov contain a 'brief summary' written at or below the average 8th grade reading level of Americans. In addition, the data, as registered by researchers and sponsors, is often inconsistent and non-standardized, with limited searchability. In May 2024, a Biogen Data Jam was held to explore how CTgov could better connect patients with clinical trials through improved data usage. In this presentation we will present the insights from the Biogen Data Jam as tangible changes that sponsors and the NIH could adopt to address a critical unmet need for patients and caregivers looking to make better informed decisions about their healthcare. At a high level, these recommendations include: • Standardizing Vocabulary: Implementing a consistent vocabulary for sponsors to enhance searchability. • Using patient-friendly plain language: Introducing readability checks and translating complex study information into more understandable language. • Improving Navigation and Functionality: Enhancing the site’s user interface to make it more navigable, inclusive, and functional. It is essential for the NIH, sponsors, and the industry to take accountability and act on these recommendations, which promote equitable health access and representation in clinical trials. |
Considerations for Using AI to Create Lay Summaries of Trial Results | Kimbra Edwards, CISCRP and Julie Holtzople, Holtzople Consulting | Lay summaries (LS) are essential for making clinical research results more transparent and accessible to non-scientists, addressing the traditional barrier of complex scientific language. Artificial Intelligence (AI) has the potential to streamline the drafting of LS, saving time and resources. However, overreliance on AI to generate these summaries without appropriate human oversight can lead to inaccuracies or misinterpretations. In this presentation, representatives of a multi-stakeholder working group that has drafted a good practice considerations document on the responsible use of AI to draft LS will discuss the background of the initiative, the purpose and goal of the document and the ongoing public comment period, and raise some of the key considerations. Technical aspects of AI development and management are not covered. |
Day 3 – 6 February Theme: Data Sharing, Compliance, and Synthetic Data | ||
---|---|---|
Day 3 Recording | ||
Presentation Title | Speaker(s) | Abstract |
Medical Imaging Data Sharing and De-Identification: The BioData Catalyst® Data Management Core Experience | Zixin Nie, RTI International | Preparing imaging data to share with the research community can be challenging because it may be stored in older and non-standard formats, have issues with de-identification, and research teams may not have the resources to mitigate these issues. In our role at the Data Management Core (DMC) for NHLBI BioData Catalyst® (BDC) – a cloud-based ecosystem that hosts data from clinical studies and offers researchers tools, applications, and workflows – we provide support to NHLBI researchers as they prepare their data for sharing and analysis on BDC. We have evaluated various proprietary and open-source solutions and created some tools using open-source packages in Python. We will demonstrate the capabilities of these tools, showing how one can use Python to process and de-identify medical imaging data, creating an automated pipeline that can be shared and deployed on-premises to overcome many issues researchers face with data sharing. |
Ensuring Privacy with Meaningful Synthetic Data for RAVE EDC | Jaskaran Singh Saini, GENINVO | The collection and utilization of Real World (RW) data faces many challenges, including patient privacy concerns, compliance with regulatory bodies, and restrictions on further data use. Synthetic Data emerges as a solution to these challenges. The expansion of AI/ML models in creating Synthetic Data that mimics the statistical properties of RW data offers a significant advantage in reducing the clinical trial timeline. However, traditional AI/ML models require RW data for training and validation, which can lead to privacy violations through potential re-identification of individuals. This presentation focuses on creating synthetic demographic data that has no linkage to RW attributes such as age, gender, ethnicity, and nationality. Furthermore, it delves into the creation of synthetic data for a Clinical trial within the RAVE EDC framework, establishing dependencies in Medical Conditions like Adverse Events, Medical History, Drugs Administered, Arms Linkage, Concomitant Medications, and Vital Signs. |
AI in Action: Transforming Data Privacy and Compliance in the Pharmaceutical Industry | Zach Weingarden, TrialAssure | Data privacy is critical to building trust and driving innovation in today’s global health landscape. With the increasing demand for transparency, the pharmaceutical industry faces unique challenges in safeguarding sensitive information while accelerating research and compliance processes. This session will delve into how AI-driven tools are revolutionising data privacy and compliance, focusing on practical solutions for complex challenges. Using real-world examples, we will discuss use cases, techniques and other strategic considerations when implementing language model-based solutions. Attendees will learn how these AI-powered solutions streamline workflows, uphold the highest standards of privacy, compliance, and transparency, and drive innovation. |
Content Analysis of Free-Text Patient Data in CSRs to Assess Automated Anonymisation Approaches | Nastazja Laskowski, University of Manchester in Collaboration With Roche | Introduction: De-identification of unstructured free-text data is important for sharing the increasing amount of healthcare data generated by electronic healthcare records, publications and clinical trials. To automate the process, information extraction (IE) and natural language processing (NLP) are necessary. The evaluation of the performance of NLP in the context of de-identification depends on which types of entities appear and how important they are for re-identification. Method: This study uses a mixed-method approach by calculating entity frequencies, characterising roadblocks in NLP and developing a framework for assessing the impact of different entity types. Sixteen patient narratives from clinical study reports (CSRs) from Roche were analysed. Results: We found the weighting of different entities for evaluating NLP performance in de-identification would depend on the background knowledge of the intruder; however, commonalities are found across all intruder types. Psychiatric clinical studies posed the most challenges for NLP due to the complex semantic characteristics of the free text. Conclusion: Current NLP evaluation metrics don’t translate well to the task of de-identification. Some therapeutic areas push the capabilities of NLP to its limits more than others (e.g. psychiatry). More research is needed to develop robust mathematical equations for the evaluation of NLP in the specific context of de-identification tasks. |
Sponsorship |
---|
Hosting the Data Transparency Event digitally means that no matter where you are in the world you can participate. It provides the industry with a broader opportunity to share knowledge on a global scale, connecting through the virtual event platform. The sponsor options offer a range of benefits with ample company exposure. See the prospectus for more detail. |
Data Transparency Working Group Leads |
---|