Tools
NextGen's ultimate goal is to create a small-scale European Health Data Space (EHDS) with full data integration, processing and sharing functions, including a governance framework. To support this goal, the Project will implement a Pathfinder that helps researchers connect efficiently and securely with datasets across the globe and execute specific analyses in a federated manner. Integrated into the Pathfinder, several real-world pilots will demonstrate the effectiveness of NextGen tools and help join the five collaborating clinical sites into a self-contained data ecosystem and comprehensive proof of concept.
The pilots will demonstrate the advanced integration and workflow tools in data use cases that show how technical and operational barriers can be removed. The pilots will then be integrated into the “NextGen Pathfinder”: a multi-site “mini-EHDS” network showcasing NextGen innovations in data management, data governance, cataloguing, compute, advanced data integration, genomic and interoperability capabilities.
There will be six pilot demonstrations; five sites included in the Pathfinder and at least one successful public Pathfinder demonstration; and regulatory, governance and data tooling demonstrated for seven countries (SE, UK, CH, FI, USA, DE, NL).
Data curation in the genomics space requires a complex, integrated processing pipeline. Genetic association studies may encompass millions of genomic variants, and stringent quality control is obligatory to establish reliable gene-disease relationships. Manual processes are time-consuming and do not scale. NextGen will develop extensible AI-guided genomic data curation pipelines, to complement other AI-mediated […]
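As an illustration of the kind of automated quality control described above, the following minimal sketch filters variants by call rate and minor allele frequency. It is a plain rule-based example, not the NextGen AI-guided pipeline; the function names and the thresholds (0.95 and 0.01) are assumptions chosen for illustration only.

```python
from typing import List, Optional

# Genotypes are encoded as alternate-allele counts (0, 1 or 2);
# None marks a missing call. This encoding is an assumption of the sketch.
Genotypes = List[Optional[int]]


def call_rate(genotypes: Genotypes) -> float:
    """Fraction of samples with a non-missing genotype call."""
    if not genotypes:
        return 0.0
    called = [g for g in genotypes if g is not None]
    return len(called) / len(genotypes)


def minor_allele_frequency(genotypes: Genotypes) -> float:
    """Frequency of the rarer allele among called (diploid) genotypes."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def passes_qc(genotypes: Genotypes,
              min_call_rate: float = 0.95,   # illustrative threshold
              min_maf: float = 0.01) -> bool:  # illustrative threshold
    """Keep a variant only if it is well genotyped and not (near-)monomorphic."""
    return (call_rate(genotypes) >= min_call_rate
            and minor_allele_frequency(genotypes) >= min_maf)


# Hypothetical variants with ten samples each.
variants = {
    "var_common": [0, 1, 1, 2, 0, 1, 0, 0, 1, 2],
    "var_missing": [0, None, None, 1, 0, 1, 0, 0, 1, 2],
    "var_monomorphic": [0] * 10,
}
kept = [name for name, g in variants.items() if passes_qc(g)]
```

Filters of this kind are cheap to apply across millions of variants, which is why they are usually automated rather than performed by hand.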
Genomic sequencing identifies variations in the genetic code. To develop diagnostic and treatment processes, variants need to be linked to diseases, and the “clinical validity” of a suggested gene-disease relationship determined (variant annotation). This evidence-based process classifies relationships based on the level and quality of evidence. Genomic analysis produces variant lists from which gene-disease relationships […]
The cost of whole exome and whole genome sequencing continues to fall, so the bottleneck in the clinical adoption of genomics-based precision medicine has shifted from data generation to data analysis. Genomic data analysis is a computationally intensive process with multiple processing steps. With the amount of genomic data growing “exponentially”, it becomes increasingly […]
Standardisation of data formats alone is insufficient for research portability: standardisation lags behind data generation (particularly for research using bespoke formats) and is not intrinsically multimodal. A given clinical research question requires a specified set of variables. Cross-border/federated portability requires that this multimodal dataset be present at each site and ingestible despite heterogeneity of the underlying data […]
NextGen semantic technology allows the true abstraction of meaning from format. NextGen cataloguing supports discoverability of complex multimodal datasets within data ecosystems supporting specific research questions and allowing integration into federated computational pipelines.
Federated machine learning is now an established technique; however, the distributed computation of genome-specific algorithms (e.g. differential gene expression, polygenic risk scores (PRS), genome-wide association studies (GWAS), etc.) is considerably less advanced. In NextGen we develop new federated implementations for genomic analysis where none currently exist and extend new and existing methods (e.g. PRS) to […]
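To make the idea of federated genomic computation concrete, the sketch below computes polygenic risk scores locally at each site and shares only three summary numbers (count, sum and sum of squares), from which a coordinator derives the pooled mean and standard deviation needed to standardise scores. This is a simplified illustration under stated assumptions, not the NextGen implementation: the variant IDs, effect weights, cohorts and function names are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical per-variant effect weights (illustrative values only).
WEIGHTS: Dict[str, float] = {"rs1": 0.2, "rs2": -0.1, "rs3": 0.5}

Summary = Tuple[int, float, float]  # (n, sum of scores, sum of squared scores)


def polygenic_score(dosages: Dict[str, int]) -> float:
    """Weighted sum of effect-allele dosages (0, 1 or 2) for one individual."""
    return sum(weight * dosages.get(variant, 0)
               for variant, weight in WEIGHTS.items())


def site_summary(cohort: List[Dict[str, int]]) -> Summary:
    """Runs inside a site: individual-level scores never leave this function."""
    scores = [polygenic_score(d) for d in cohort]
    return len(scores), sum(scores), sum(s * s for s in scores)


def pooled_mean_sd(summaries: List[Summary]) -> Tuple[float, float]:
    """Runs at the coordinator: combines per-site aggregates only."""
    n = sum(s[0] for s in summaries)
    total = sum(s[1] for s in summaries)
    total_sq = sum(s[2] for s in summaries)
    mean = total / n
    variance = total_sq / n - mean * mean  # population variance
    return mean, math.sqrt(max(variance, 0.0))


# Two hypothetical sites holding their own cohorts.
site_a = [{"rs1": 2, "rs2": 0, "rs3": 1}, {"rs1": 0, "rs2": 1, "rs3": 0}]
site_b = [{"rs1": 1, "rs2": 2, "rs3": 2}]

mean, sd = pooled_mean_sd([site_summary(site_a), site_summary(site_b)])
```

The pooled statistics match what a central analysis of the combined cohort would give, yet only aggregates cross site boundaries. A real deployment would also need to consider whether small cohorts make such aggregates disclosive.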
Federated learning is an effective strategy for learning from distributed data without aggregation at a central site. Current approaches in federated learning either create a specialized application for each algorithm or use a distributed environment to share and run code among parties. The first approach is not easily generalizable and limits the possibility of end-users […]
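The basic pattern of learning without central aggregation can be sketched with federated averaging: each site refines a shared model on its own data, and a coordinator averages the resulting parameters, weighted by site size. The example below is a minimal, generic illustration using a one-variable linear model in pure Python; it is not the NextGen framework, and the data, learning rate and number of rounds are assumptions chosen so the toy problem converges.

```python
from typing import List, Tuple

Site = List[Tuple[float, float]]  # (x, y) pairs held privately at one site


def local_update(w: float, b: float, data: Site,
                 lr: float = 0.05, epochs: int = 5) -> Tuple[float, float]:
    """Run a few gradient-descent steps on one site's data for y = w*x + b."""
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def federated_average(sites: List[Site], rounds: int = 300) -> Tuple[float, float]:
    """Coordinator loop: only model parameters travel, never the raw data."""
    w, b = 0.0, 0.0
    total = sum(len(s) for s in sites)
    for _ in range(rounds):
        # Each site starts from the current global model and trains locally.
        updates = [(local_update(w, b, s), len(s)) for s in sites]
        # Average the returned parameters, weighted by site sample size.
        w = sum(params[0] * n for params, n in updates) / total
        b = sum(params[1] * n for params, n in updates) / total
    return w, b


# Two hypothetical sites whose data follow y = 2x + 1.
sites = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
]
w, b = federated_average(sites)
```

On this toy data the averaged model approaches the slope and intercept that a pooled fit would find, even though neither site sees the other's records.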