PHUSE CSS 2025
Now in its 14th year, the PHUSE Computational Science Symposium (CSS) will take place in Utrecht, the Netherlands, on 20-21 May. At the heart of the event is the PHUSE Working Groups’ mission to tackle unmet computational science needs in support of health product development and regulatory review, ultimately bringing safe and effective medical products to those who need them. The CSS agenda will feature a variety of activities, including expert-led plenary sessions, interactive workshops and dedicated Working Group Breakout Sessions that drive forward existing initiatives and explore new ones. Attendees will also have the chance to join the Poster Session, an opportunity to share knowledge and engage in discussions on a range of computational science topics. PHUSE is excited to welcome attendees from across pharma, CROs, academia, health authorities, technology vendors and SDOs for this collaborative and forward-thinking experience.
Registration is Open!
Registration for the event in Utrecht is open!
CSS Poster Session - 20 May
Poster Title | Poster Abstract | Company |
---|---|---|
The Importance of Information Extraction from Unstructured Clinical Data in Pharmacoepidemiology | The FDA Sentinel Innovation Center MOSAIC-NLP (Multi-source Observational Safety study for Advanced Information Classification using NLP) project analysed electronic health record (EHR) data linked to claims to study the association of montelukast with neuropsychiatric events in patients with asthma. The study analysed data from ~17 million notes from a cohort of 109,076 patients from 112 health systems in Oracle real-world data. Information extraction from unstructured clinical notes used medical language models from John Snow Labs to enhance outcome identification and confounder control. The study found that neuropsychiatric events may be undercounted when only structured EHR and claims data are used, as the number of observed suicidality/self-harm events doubled with the addition of unstructured EHR data. Furthermore, events such as irritability, agitation and memory problems were only detected in unstructured data. This study illustrates the importance of unstructured data, especially in relation to mental health outcomes. | Oracle Health Sciences |
SEND (Standard for Exchange of Nonclinical Data) Industry Feedback Survey: 2025 Results | As the scope of SEND continues to expand, sponsors and vendors must continuously evolve to keep pace with updated regulatory specifications. The 2025 Industry Feedback Survey will provide a snapshot of the SEND industry, focusing on three main topics: Implementation of New Standards, Virtual Control Groups and Dataset-JSON. The responses received will be analysed to identify trends across the industry. PHUSE Working Groups will use the summarised results of this survey to examine possible initiatives and discussions for effective SEND standard operations. | Inotiv & Labcorp |
The Recipe for Optimising Efficiency in Clinical Operations | Clinical operations face inefficiencies from fragmented data, subjective decision-making (e.g. selecting sites without considering performance data or collecting data unrelated to endpoints), suboptimal enrolment strategies, inconsistent protocol adherence, inadequate resource allocation, and deficient monitoring. These challenges lead to higher costs, delays, and increased risk of trial failure. Integrating knowledge graphs, simulation, optimisation and AI helps address these challenges by connecting and analysing relevant data to streamline decision-making and operations. Knowledge graphs integrate data about site performance, patient demographics, recruitment, past trials, regulations, protocols, and more to provide a comprehensive view that enhances resource allocation and trial processes. Simulation models of trial scenarios allow for the prediction of risks and outcomes. Optimisation techniques, coupled with AI, improve site selection, align data collection with study endpoints, and ensure more efficient trial execution. This data-driven, integrated approach reduces inefficiencies, accelerates trial timelines and lowers costs, improving overall operational efficiency in clinical development. | Altair |
GitLab vs GitHub: Battle of the Repositories | Explore the world of repository management with GitLab vs GitHub: Battle of the Repositories. When I first started in the industry, GitHub was widely used at Roche, but shifting to GitLab sparked interest. This poster will delve into the differences and capabilities of both tools. We will look at key distinctions in project management tools such as issue tracking, project boards, issue weights, burn-down charts and collaboration. We will provide a detailed comparison, showcasing which offers a more comprehensive solution for project management, appealing to those requiring granular detail. Additionally, we will evaluate CI/CD implementations and scalability. While both platforms efficiently support continuous integration pipelines, our analysis highlights which might be better suited to specific needs. The poster will also include some of the basic commands to get you started, as well as hints and tips! | Roche |
Pimp Your Data Validation with CDISC Open Rules | Delivering high-quality data, while ensuring compliance with all conformance rules and regulatory requirements, can be challenging. In addition, achieving consistency and accuracy across submissions requires the industry to adopt a unified approach. The CDISC Open Rules Project meets this requirement by delivering executable conformance rules for each foundational standard. Furthermore, CDISC has developed an open-source, industry-standard method for rule creation and data validation, a very promising development. Our team is implementing this method in house to create rules that enhance and extend the standard conformance rules. This poster will highlight how custom rules can be built and integrated into every company’s data validation process. Additionally, it will showcase the method’s potential for creating data cleaning rules and validating non-CDISC data. This uniform and transparent method will facilitate the transfer of custom rules between all stakeholders (CROs, pharma, CDISC, regulatory agencies, etc.), thus enhancing collaboration and standardisation across the industry. | SGS Health Science |
Mastering the Jungle of Real-World Data | Real-world data (RWD) is increasingly used in regulatory submissions in the healthcare and pharmaceutical industries. Although the industry recognises that this data can be hugely valuable for improving insights and shortening timelines in the development of new drugs, many attempts to use real-world data efficiently for regulatory submissions fail because of the complexity of this data and unfamiliarity with its features. This poster will present a number of issues related to the use of real-world data for regulatory purposes and show how we enable data lineage from the real-world data source to submission-ready datasets, as well as how we manage the information needed for traceability and trust in the data. | ClinLine |
Collaborative Industry Open-Source Solutions for Clinical Trial Quality Management | Recent ICH guidelines require a risk-based strategy for the quality management of clinical trials in both quality control and quality assurance. Risk management frameworks such as risk-based quality management (RBQM) provide guidance on how risk-based approaches can be implemented and require statistical oversight of operational and clinical trial data. The IMPALA (Inter Company Quality Analytics) Consortium and the PHUSE Open RBQM Working Group project are cross-industry collaborations that promote open-source development of statistical tools and frameworks for clinical trial quality management. The good statistical monitoring {gsm} R package provides a framework for data transformation, statistical analysis and reporting. {simaerep} and the gsm extension {gsm.simaerep} use a non-parametric approach for flagging site outliers for the reporting of subject-level clinical events (e.g. adverse events, protocol deviations). Statistical outliers can be detected using the clinical trial anomaly spotter {ctas} package. We aim to extend this suite of tools. | Roche |
Automating Clinical Test Data Generation and Entry with Generative AI and Innovative Technology | Entering test data into electronic data capture (EDC) systems, which is necessary for testing any study design and numerous downstream systems and programs, is a time-consuming and resource-intensive process when done manually. In addition to EDC data, acquiring test data from external vendors is often delayed, causing strain on study planning. To address these issues, Takeda has developed two innovative software solutions using open-source technology – specifically Python – and artificial intelligence. Firstly, the Test Data Entry Application is used for generating and entering complete test subject data into Takeda’s primary EDC platform. Secondly, the Synthetic Data Generator is a GenAI application designed to create test datasets for any external vendor who has provided data specifications. These solutions have significantly improved the efficiency of processing clinical trial data by reducing manual effort and accelerating study set-up timelines. | Takeda |
Anonymisation of Unstructured Clinical Data: Patient Narratives as a Case Study | Introduction: Anonymising unstructured clinical data combines natural language processing (NLP) with statistical disclosure control (SDC) to extract re-identifying or disclosive entities. Together, this ensures privacy while maintaining the utility of the data. This research evaluates barriers, facilitators and important considerations going forward. Methods: This project adopts a mixed-methods approach, integrating interviews to capture expert insights, content analysis to uncover patterns and richness in the data, and quantitative analysis to evaluate the frequency of disclosive entities in patient safety narratives from clinical study reports. Results: We identify several challenges, including poor applicability of metrics to the task of anonymising unstructured data, the complexity of the intruder’s background knowledge, limitations in NLP such as processing the semantics of text, and considerations regarding branching when using MedDRA for generalisation. | Roche |
AI-Driven Lab Data Management: Standardising and Digitising Clinical Site Data for Accuracy and Efficiency | Lab data management in clinical trials is complex, decentralised and unstandardised – often involving unstructured files with handwritten elements, and low-quality documents submitted via fax – which leads to error-prone data entry processes. To address this, we are developing a scalable solution to centralise lab data. We employ open-source AI for optical character recognition (OCR) to digitise these challenging documents, Generative AI for data processing, and large language models (LLMs) for semantic mapping of lab data to study data, to create a centralised database. This solution will streamline data digitisation and standardisation across 3,300+ global labs, significantly reducing manual effort from ExBPs and human error, while enhancing the efficiency of populating electronic data capture (EDC) systems. Our poster will showcase the project architecture, source code snippets and key methodologies used. Using AI and computer vision, we aim to revolutionise lab data management in clinical trials by improving efficiency, accuracy and data integrity. | Roche |
Rare Disease Data – Overcoming Barriers to Controlled Access Sharing | Sharing rare disease data with qualified researchers on controlled-access platforms brings hope for accelerated identification of the molecular mechanisms/traits underpinning rare diseases and the development of new therapies. Rare disease clinical trial data may be more difficult to anonymise while maintaining utility than other clinical trial data, and so is not routinely shared. Because the term ‘rare diseases’ covers a spectrum in terms of prevalence, sensitivity of the data associated with phenotypic manifestations, stigma, and complexity of clinical data sharing, it is difficult to develop a one-size-fits-all procedure for sharing rare disease data. The PHUSE Rare Disease Clinical Data Sharing Working Group project has developed a white paper to review potential barriers to data sharing, e.g. risk of re-identification and invasion of privacy, and to provide recommendations for encouraging the sharing of rare disease data with the research community. This poster will provide an update on the progress made and the current status of this initiative. | Roche & AstraZeneca |
CDISC USDM and ICH M11: Practical Electronic Protocols | In the next few months, industry will see the release of both the Unified Study Definitions Model (USDM) and ICH’s M11 Clinical electronic Structured Harmonised Protocol standard (CeSHarP). These two standards, when combined, will have a fundamental impact on the way sponsor companies operate, allowing sponsors to move away from a siloed way of working to a data-centric one focused on a single source of truth. This poster will illustrate how the USDM and M11 CeSHarP come together, and detail the practical work undertaken to demonstrate the combined power of these two standards, including: the ability to automate downstream processes, including SDTM and laboratory data; the work taking place in the HL7 Vulcan UDP project and the associated Connectathons for developing the FHIR M11 and SoA messages; and the work on the FDA PRISM pilot, sending eProtocols to the FDA’s precisionFDA cloud environment using the FHIR M11 message. | data4knowledge |
An eSubmission Jigsaw When the Answer is Not in a CDISC IG | The CDISC models and implementation guides (IGs) are invaluable resources when it comes to preparing data and associated documentation for regulatory submission. However, science advances, the complexity of studies increases, and a specific data modelling need for a current study may not be addressed in the published CDISC standards (yet). This poster will explore potential resources for guidance and show some of today’s common use cases. It will discuss strategies for integrating and documenting non-standard components so that the final SDTM and ADaM eSubmission data packages provide the full picture of the study, where all the puzzle pieces fit together nicely. | mainanalytics |
Scoring Real-World Data Reliability for Clinical Investigations | Real-world data (RWD) is often unstructured and inconsistent. Assessing its reliability is necessary before use in clinical investigations. Several methods measure RWD reliability: conformance checks determine whether variables match expected formats, lengths and values; completeness assessments quantify missing data and track accrual patterns over time; and data linkages detect orphan records and inconsistencies in relational data. Scores are assigned to RWD sources based on these measures of reliability. This poster will present an approach to applying these assessments to RWD datasets and producing structured outputs for decision-making. It will show how these procedures can produce audit trails, automate assessments and generate reports for comparing RWD sources and determining their suitability for regulatory submissions. This work is based on the Structured Process to Identify Fit-for-Purpose Study Design and Data (SPIFD2) framework and insights from FDA/Duke-Margolis workshops on real-world evidence. (An illustrative sketch of these checks appears after the table.) | EDA Clinical |
Metadata-Driven TFL Generation: A New Approach to Implementing Efficient and Automated TFL in Clinical Trials | In the evolving landscape of statistical analyses of clinical trial data, efficient and accurate reporting is essential. Generating tables, figures and listings (TFLs) has traditionally been a tedious task, from designing TFL shells in static Word documents to writing algorithms to generate those TFLs. This poster will introduce a metadata-driven solution to generate TFLs for clinical study reports, transitioning from static documents to a dynamic application. This innovative approach will enable users to design TFL shells and populate TFL metadata, modernising the production of statistical analyses. It will improve consistency, reinforce the use of standards, reduce manual error, facilitate content reuse and accelerate the reporting process. We shall introduce TFL automation and analysis results dataset (ARD) components to increase transparency around the generated statistics. This poster will detail the methodology, benefits and practical applications of the metadata-driven approach, as well as the challenges we face in the implementation. | Sanofi |
Generating Synthetic eCOA (Non-CRF) Data from External Vendors Using Python | In clinical research, integrating non-case report form (non-CRF) data from external vendors, such as electronic clinical outcome assessment (eCOA) data, presents significant challenges. These include ensuring data privacy, maintaining data integrity, handling missing data, non-standardised data transfer agreement (DTA) and data transfer specification (DTS) files from vendors, and achieving interoperability with clinical trial data. This poster will demonstrate creating synthetic eCOA non-CRF data, specifically in the QS (Questionnaires) domain, using Python packages (Pandas, NumPy, etc.). The process also incorporates EDC (electronic data capture) data such as DM (Demographics) and SV (Schedule of Visits) forms and creates a standard DTA template for generating meaningful synthetic data which mirrors the real-world data from external vendors. This process makes creating synthetic data for domains such as FA, RS and LB feasible, with multiple use cases. (A minimal Python sketch of this idea appears after the table.) | GENINVO |
Representing Virtual Control Animal Data in SEND | The PHUSE Nonclinical Topics Working Group project Supporting the Use of SEND for the Implementation of Virtual Control Groups proposes a method for including virtual control data as part of a study’s SEND dataset package. The proposal includes suggestions for incorporating trial design information for experimental designs that use virtual control groups (VCGs) and for including individual virtual control animal (VCA) data. VCGs can be incorporated into the trial design using trial sets; each trial set should contain only virtual animals or only concurrent animals. Trial set parameters should identify that the trial set includes VCAs and describe the selection criteria for them. Each VCA’s experimental conditions can be described using the Subject Characteristics domain. Individual VCA data can be included in interventions, findings and special purpose domains, and specific guidelines are included for these domains (e.g. how to represent dosing events in EX). | Merck |
The VICT3R DB: A Repository of Curated Control Animal Data To Assist in Establishing Virtual Controls | The virtual control group (VCG) concept aims to partially or fully replace concurrent control group animals in in vivo toxicity studies, using data from a well-curated database of standardised control data to reduce animal use in line with the 3Rs principles. This poster will focus on the data curation workflows applied and present use cases of the VCG approach. | Merck |
Automated Quality Control for Safety Aggregate Reporting | Safety aggregate reporting is crucial in pharmacovigilance for ensuring patient safety through annual reports such as PBRERs, DSURs and ACOs, which provide insights into drug safety and aid regulatory decisions. The accuracy of these reports is vital. Quality control (QC) ensures integrity by detecting errors early, thereby improving clinical trial safety analyses and regulatory compliance. Automation and digitalisation can transform QC by using a single macro to streamline the process, implement validation rules and detect discrepancies early. Our innovative automation enhances QC by programmatically comparing datasets with previous years’ reports, and by flagging any data particularities or discrepancies. It is standardised for all safety aggregate reporting, resulting in improved data analyses and enhanced programming efficiency in the clinical drug development and review process. The automation framework reduces manual checks and ensures scalability, consistency, high data quality and optimal resource use, advancing the effectiveness and compliance of pharmacovigilance operations. (A simplified sketch of the core comparison step appears after the table.) | Pfizer |
Map Old to New: Automating Conversion from Proprietary Data Format to CDISC Standards | Study data collected and analysed prior to the emergence of CDISC was not standardised across companies and involved differing formats. Programming tables for safety aggregate reports (DSURs, PBRERs, ACOs) submitted to regulatory agencies created the need to harmonise the heterogeneous data of newer and older studies – including those completed years ago, prior to the adoption of CDISC – while ensuring high quality is upheld during data pooling. Our team designed a program to automatically map older studies’ completed analysis datasets to CDISC (ADSL). Key variables for demography tables were mapped directly to ADSL, and the program accounted for parent/extension studies to avoid double-counting participants. This process allowed older study data, formerly in proprietary formats, to be aggregated with newer study data in CDISC using a standardised approach. Aggregated study data in safety reports is produced more efficiently, complies with current regulatory standards, and requires less programming time. (A simplified sketch of the mapping idea appears after the table.) | Pfizer |
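
As referenced in the EDA Clinical poster above, the reliability checks it describes (conformance, completeness and linkage) can be pictured in a few lines of code. The following is a minimal sketch only: the column names, date format and score weights are hypothetical illustrations, not taken from the SPIFD2 framework.

```python
"""Toy reliability scoring for a real-world data (RWD) extract.

A minimal sketch of the three check types named in the EDA Clinical
poster; the columns, formats and weights below are hypothetical.
"""
import pandas as pd


def conformance_score(df: pd.DataFrame, patterns: dict[str, str]) -> float:
    """Fraction of non-missing values matching the expected format per variable."""
    rates = []
    for col, pattern in patterns.items():
        values = df[col].dropna().astype(str)
        if len(values) == 0:
            continue
        rates.append(values.str.fullmatch(pattern).mean())
    return float(sum(rates) / len(rates)) if rates else 0.0


def completeness_score(df: pd.DataFrame) -> float:
    """Fraction of non-missing cells across the dataset."""
    return float(df.notna().mean().mean())


def linkage_score(child: pd.DataFrame, parent: pd.DataFrame, key: str) -> float:
    """Fraction of child records whose key resolves to a parent record."""
    return float(child[key].isin(parent[key]).mean())


# Hypothetical example data: a visits table linked to a patients table.
patients = pd.DataFrame({"patient_id": ["P001", "P002", "P003"]})
visits = pd.DataFrame({
    "patient_id": ["P001", "P002", "P999"],    # P999 is an orphan record
    "visit_date": ["2024-01-15", "2024-02-30x", None],
})

# Invented weights; a real framework would justify these choices.
score = (
    0.4 * conformance_score(visits, {"visit_date": r"\d{4}-\d{2}-\d{2}"})
    + 0.3 * completeness_score(visits)
    + 0.3 * linkage_score(visits, patients, "patient_id")
)
print(f"reliability score: {score:.2f}")
```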
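The GENINVO poster above describes generating synthetic QS records with Pandas and NumPy. The sketch below is a hedged illustration of that idea, not the poster's actual application: the study identifiers, questionnaire item code, score range and visit schedule are invented stand-ins for what a real DTA/DTS specification would supply.

```python
"""Generate toy synthetic QS (Questionnaires) records from DM subjects.

A minimal sketch; a real implementation would be driven by the vendor's
DTA/DTS specification rather than the invented values used here.
"""
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)

# Stand-in for EDC demographics (DM) data.
dm = pd.DataFrame({"USUBJID": ["STUDY01-001", "STUDY01-002", "STUDY01-003"]})
visits = ["BASELINE", "WEEK 4", "WEEK 8"]    # stand-in for the SV schedule

records = []
for usubjid in dm["USUBJID"]:
    for seq, visit in enumerate(visits, start=1):
        records.append({
            "STUDYID": "STUDY01",
            "DOMAIN": "QS",
            "USUBJID": usubjid,
            "QSSEQ": seq,
            "QSTESTCD": "QSALL01",                # hypothetical item code
            "QSTEST": "Total Score",
            "QSORRES": int(rng.integers(0, 28)),  # invented 0-27 score range
            "VISIT": visit,
        })

qs = pd.DataFrame(records)
print(qs.head())
```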
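The automated QC in the first Pfizer poster is described as a single standardised macro; as a language-neutral illustration of the underlying period-over-period comparison step, here is a small pandas analogue. The event terms and counts are invented.

```python
"""Flag period-over-period discrepancies between aggregate-report datasets.

Only an analogue of the comparison idea in the Pfizer QC poster,
with invented columns and counts.
"""
import pandas as pd

# Hypothetical summary counts from last year's and this year's reports.
previous = pd.DataFrame(
    {"event_count": [12, 40, 7]},
    index=pd.Index(["Headache", "Nausea", "Dizziness"], name="event_term"),
)
current = pd.DataFrame(
    {"event_count": [12, 44, 0]},
    index=pd.Index(["Headache", "Nausea", "Dizziness"], name="event_term"),
)

# DataFrame.compare keeps only the cells that differ (pandas >= 1.5
# for the result_names argument).
diff = previous.compare(current, result_names=("previous", "current"))
print(diff)

# Flag implausible drops for manual review: cumulative counts should
# not decrease between successive annual reports.
drops = current["event_count"] < previous["event_count"]
print("needs review:", list(current.index[drops]))
```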
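Finally, the legacy-to-ADSL mapping in the second Pfizer poster can be pictured as a variable-level rename plus subject-level deduplication. The sketch below assumes hypothetical legacy column names and a deliberately simplistic parent/extension rule; it illustrates the idea, not the team's actual program.

```python
"""Map a legacy analysis dataset to ADSL-style key variables.

A minimal sketch of the mapping idea; the legacy names and the
parent/extension handling below are hypothetical.
"""
import pandas as pd

# Hypothetical legacy dataset with pre-CDISC variable names.
legacy = pd.DataFrame({
    "PATNO":   ["001", "002", "002"],
    "BIRTHDT": ["1960-05-01", "1972-11-23", "1972-11-23"],
    "SEXCODE": ["M", "F", "F"],
    "STUDY":   ["OLD-01", "OLD-01", "OLD-01-EXT"],  # extension study
})

# Direct variable-level mapping to ADSL names.
rename_map = {
    "PATNO": "USUBJID",
    "BIRTHDT": "BRTHDTC",
    "SEXCODE": "SEX",
    "STUDY": "STUDYID",
}
adsl = legacy.rename(columns=rename_map)

# Avoid double-counting subjects enrolled in both a parent study and
# its extension: keep one record per subject, preferring the parent row.
adsl = (
    adsl.sort_values("STUDYID")           # "OLD-01" sorts before "OLD-01-EXT"
        .drop_duplicates(subset="USUBJID", keep="first")
        .reset_index(drop=True)
)
print(adsl)
```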
CSS Chairs