Projects 2021

Discover the numerous projects of our doctoral researchers from 2021.

Imaging & Diagnostics / Surgery & Intervention 4.0

Evaluation and combination of disease forecasts

Doctoral researcher: Daniel Wolffram

Institution: KIT

Data science PI: Melanie Schienle

Life science PI: Johannes Bracher

During the ongoing COVID-19 pandemic, probabilistic forecasts of infectious disease spread have become a research priority because they allow for a nuanced assessment of an uncertain future and can help policymakers make informed decisions. Several major collaborative forecasting projects have been launched to systematically compare, evaluate, and aggregate forecasts from different models. To make forecasts comparable, they are stored in a standardized quantile format. This approach raises numerous statistical challenges, which we will address in our project.

Statistically sound evaluation is a prerequisite for model improvement. However, many existing methods are not suitable for quantile forecasts. We aim to develop novel visual tools and statistical tests to assess forecast calibration and to make score-based evaluations more interpretable.
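
As an illustration of score-based evaluation in the quantile format, the following minimal sketch computes the weighted interval score (WIS), a proper score widely used by collaborative COVID-19 forecast hubs, together with the empirical coverage of a central prediction interval as a simple calibration check. It assumes each forecast is a dict mapping quantile levels that are symmetric around the median (and include 0.5) to predicted values; the function names are hypothetical. It relies on the known identity that, for such levels, the WIS equals twice the mean pinball (quantile) loss.

```python
def pinball_loss(q: float, y: float, tau: float) -> float:
    """Quantile (pinball) loss of a predicted tau-quantile q for observation y."""
    return ((1.0 if y < q else 0.0) - tau) * (q - y)


def weighted_interval_score(quantiles: dict[float, float], y: float) -> float:
    """WIS for a forecast given as {quantile level: value}.

    Assumes 2K + 1 levels symmetric around 0.5 (K central intervals plus the
    median). Under that assumption the WIS equals twice the mean pinball loss.
    """
    losses = [pinball_loss(q, y, tau) for tau, q in quantiles.items()]
    return 2.0 * sum(losses) / len(losses)


def interval_coverage(forecasts: list[dict[float, float]],
                      observations: list[float], alpha: float) -> float:
    """Share of observations inside the central (1 - alpha) prediction interval."""
    # Round the looked-up levels so that, e.g., 1 - 0.05 matches the key 0.95.
    lo_level, hi_level = round(alpha / 2, 6), round(1 - alpha / 2, 6)
    hits = sum(
        f[lo_level] <= y <= f[hi_level] for f, y in zip(forecasts, observations)
    )
    return hits / len(observations)
```

For example, with quantiles {0.25: 3, 0.5: 4, 0.75: 6} and observation 5, the WIS is 2 × 1.25 / 3 ≈ 0.833. Comparing nominal and empirical coverage across several interval levels gives a first, coarse view of calibration.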

Combining forecasts into ensembles has shown promise for improving forecast performance and reliability in disease forecasting, and is already common practice in fields such as weather forecasting. We will study aggregation methods for quantile forecasts, including state-of-the-art quantile regression methods, non-parametric approaches such as isotonic distributional regression, and recent machine learning approaches. Particular focus will be placed on common characteristics of disease spread, such as geographical heterogeneity and regime shifts, which could potentially be leveraged by meta-learning approaches.
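
To make the aggregation step concrete, the sketch below shows quantile averaging (sometimes called Vincentization), one simple baseline for combining quantile forecasts: the ensemble's tau-quantile is a weighted average of the member models' tau-quantiles. The weights are hypothetical inputs here; estimating them (for example with the quantile regression methods mentioned above) is not shown. A level-wise median is included as a common, more robust alternative.

```python
from statistics import median
from typing import Optional


def quantile_average(forecasts: list[dict[float, float]],
                     weights: Optional[list[float]] = None) -> dict[float, float]:
    """Weighted level-wise mean of quantile forecasts ({level: value} dicts)."""
    if weights is None:
        weights = [1.0] * len(forecasts)
    total = sum(weights)
    return {
        tau: sum(w * f[tau] for w, f in zip(weights, forecasts)) / total
        for tau in forecasts[0]
    }


def quantile_median(forecasts: list[dict[float, float]]) -> dict[float, float]:
    """Level-wise median, which limits the influence of a single outlying model."""
    return {tau: median(f[tau] for f in forecasts) for tau in forecasts[0]}
```

Because a weighted average of non-decreasing sequences with non-negative weights is again non-decreasing, quantile averaging cannot produce crossing quantiles.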

Data science approaches to evaluate and correct bias

Data science approaches to evaluate and correct bias in prokaryotic single-cell transcriptomics

Doctoral researcher: Julia Münch

Institution: DKFZ

Data science PI: Benedikt Brors

Life science PI: Anne-Kristin Kaster

During the last decade, single-cell omics technologies such as single-cell transcriptomics (SCT) have rapidly emerged. As a complement to cultivation-based meta-transcriptomic approaches, SCT allows direct access to information on gene expression from individual cells rather than bulk samples, and thus tremendously improves our understanding of numerous biological processes.

While single-cell technologies are mainly applied to, and consequently far more advanced for, eukaryotes, adapting these methods to prokaryotic organisms has proven challenging for several reasons. First, the rigid yet highly diverse cell wall architecture of bacterial species hampers effective permeabilization and lysis. Second, the small amount of RNA in a single bacterial cell is insufficient for direct analysis with current sequencing technologies. Hence, most protocols include an essential amplification step before sequencing.

However, this step may introduce bias into the data set that obscures the genuine expression profiles, for example through incomplete or uneven amplification, chimera formation, and bias against templates with high GC content. Although various experimental techniques have been developed to deal with these issues, no systematic approach has been taken to evaluate this bias. At the same time, studying the microbial world at the single-cell level is of fundamental importance, particularly given the enormous potential of (as yet unknown) microbial species for biotechnological applications, but also in the context of infectious diseases and cancer progression and therapy.

Therefore, this project aims to address amplification bias in SCT. To this end, prokaryotic single-cell RNA-seq data will be generated and used to train suitable regression models. Using modern machine learning methods, technical noise will be systematically distinguished from biological variation in order to correct for the bias.
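
As a deliberately simplified illustration of regression-based bias correction, the sketch below assumes (hypothetically) that amplification bias acts as a linear trend of log-expression in per-gene GC content. It fits that trend by ordinary least squares and removes it while keeping the overall mean. The project's actual models, covariates, and noise model are not specified in this description; the sketch only shows the idea of separating the effect of a technical covariate from the remaining signal.

```python
def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least squares fit y ≈ intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def correct_gc_bias(log_expr: list[float], gc: list[float]) -> list[float]:
    """Remove a (hypothetical) linear GC-content trend from log-expression values."""
    intercept, slope = fit_line(gc, log_expr)
    mean_expr = sum(log_expr) / len(log_expr)
    # Residuals carry the variation not explained by GC content; re-add the mean
    # so corrected values stay on the original scale.
    return [y - (intercept + slope * g) + mean_expr for y, g in zip(log_expr, gc)]
```

In practice the bias is unlikely to be linear or driven by GC content alone, which is why the project proposes more flexible machine learning models.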

Inverse Radiotherapy Treatment Planning

Inverse Radiotherapy Treatment Planning using ML Outcome Prediction Models

Doctoral researcher: Tim Ortkamp

Institution: KIT

Data science PI: Martin Frank

Life science PI: Oliver Jäkel

About half of all cancer patients receive radiation therapy, which delivers high-energy ionizing radiation to cancerous tissue while sparing healthy tissue. The planning of such a radiotherapeutic intervention is highly personalized: it relies on medical imaging techniques such as computed tomography or magnetic resonance imaging, followed by simulation and optimization of the radiation dose.

In the classical approach to treatment planning, the underlying trade-off between tumor control probability (TCP) and normal tissue complication probability (NTCP) is usually taken into account only indirectly, through surrogate planning objectives based on empirical dose prescriptions and tolerances, rather than through a "biological optimization" based on these two quantities. Approaches that integrate TCP and NTCP directly into plan optimization are technically feasible and have been described in the current literature. However, they rely on simple analytical models with low-dimensional parameter spaces, e.g., the Lyman-Kutcher-Burman (LKB) model, and their accuracy and usability are therefore frequently questioned.
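
For reference, the LKB model in its standard form computes NTCP for an organ at risk from only three parameters (TD50, m, n): the dose distribution is first reduced to a generalized equivalent uniform dose (gEUD), which is then passed through a standard normal CDF. This illustrates the low-dimensional parameter space mentioned above. The sketch below evaluates it for a differential dose-volume histogram given as (fractional volume, dose) pairs; the parameter values in the usage example are placeholders, not clinical values.

```python
import math


def geud(dvh: list[tuple[float, float]], n: float) -> float:
    """Generalized equivalent uniform dose from (fractional volume, dose) bins.

    Fractional volumes are assumed to sum to 1; n is the volume-effect parameter.
    """
    return sum(v * d ** (1.0 / n) for v, d in dvh) ** n


def lkb_ntcp(dvh: list[tuple[float, float]], td50: float, m: float, n: float) -> float:
    """LKB NTCP: standard normal CDF of (gEUD - TD50) / (m * TD50)."""
    t = (geud(dvh, n) - td50) / (m * td50)
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


# Placeholder parameters: a uniform dose equal to TD50 gives NTCP = 0.5.
print(lkb_ntcp([(1.0, 50.0)], td50=50.0, m=0.2, n=0.5))
```

A small n weights hot spots heavily (serial organs), while n = 1 reduces the gEUD to the mean dose (parallel organs).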

The proposed project therefore aims to supersede these approaches by integrating state-of-the-art machine learning (ML) models for TCP and NTCP into the radiotherapy treatment plan optimization process, which poses multiple mathematical, computational, and clinical challenges. Specifically, the project pursues the following objectives:

  • Investigation and definition of ML model stability within the dose optimization environment, given atypical inputs during optimization iterations
  • Development of computationally efficient optimization objective and constraint functions based on ML models, especially with respect to differentiability and efficient gradient computation
  • Interpretation of generated treatment plans and definition of clinically acceptable optimization strategies
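
The differentiability requirement can be illustrated with a toy example. The sketch below assumes a linear dose model d = D w (a dose-influence matrix D and non-negative beamlet weights w) and, as a hypothetical stand-in for a trained ML model, a logistic NTCP surrogate on organ-at-risk voxel doses. The objective combines this NTCP with a quadratic target-coverage penalty; its gradient follows from the chain rule through the dose model, and the weights are updated by projected gradient descent. All matrices, coefficients, and the penalty weight are illustrative assumptions; a real ML model would typically provide its gradient via automatic differentiation.

```python
import math

Matrix = list[list[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def matvec(D: Matrix, w: list[float]) -> list[float]:
    """Dose per voxel: d = D w."""
    return [sum(dij * wj for dij, wj in zip(row, w)) for row in D]


def objective_and_grad(w, D_target, D_oar, prescription, beta0, beta, lam):
    """NTCP(d_oar) + lam * mean squared deviation from the prescription, and its gradient."""
    d_t = matvec(D_target, w)
    d_o = matvec(D_oar, w)
    # Hypothetical ML surrogate: logistic NTCP on organ-at-risk voxel doses.
    ntcp = sigmoid(beta0 + sum(b * d for b, d in zip(beta, d_o)))
    n_t = len(d_t)
    target = sum((d - prescription) ** 2 for d in d_t) / n_t
    obj = ntcp + lam * target

    # Chain rule through d = D w: dObj/dw_j = sum_i dObj/dd_i * D_ij.
    grad = [0.0] * len(w)
    dsig = ntcp * (1.0 - ntcp)
    for i, row in enumerate(D_oar):
        for j, dij in enumerate(row):
            grad[j] += dsig * beta[i] * dij
    for i, row in enumerate(D_target):
        for j, dij in enumerate(row):
            grad[j] += lam * 2.0 * (d_t[i] - prescription) / n_t * dij
    return obj, grad


def optimize(w0, problem, step=0.01, iters=500):
    """Projected gradient descent that keeps beamlet weights non-negative."""
    w = list(w0)
    for _ in range(iters):
        _, g = objective_and_grad(w, *problem)
        w = [max(0.0, wj - step * gj) for wj, gj in zip(w, g)]
    return w
```

Checking the analytic gradient against finite differences is a cheap safeguard when swapping in more complex models.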

An additional planned output is a (partly open-source) software module for follow-up research and pre-clinical testing.

Data-driven Gamification

Data-driven Gamification to Improve Quality in Medical Image Annotation Tasks (GaMeIT)

Doctoral researcher: Simon Warsinsky

Institution: KIT

Data science PI: Ali Sunyaev

Life science PI: Martin Wagner

Machine Learning (ML) models are increasingly being adopted in healthcare. In cognitive surgical robots, for example, ML models can help the robot understand the surgical context, follow the surgical procedure, and eventually generate safe trajectories to assist during surgery. Training ML models sophisticated enough to assist in surgery requires large amounts of annotated image data. Because interpreting medical images (e.g., surgical images, MRI images) often requires expertise, medical image annotation is often performed manually by healthcare professionals.

Manual annotation is prone to human error because it can be tedious, monotonous, and exhausting. Accordingly, poor label quality is a common problem. However, sufficient quality of annotated images is a decisive factor if surgical robots are to improve surgical procedures. If ML models for surgical robots are trained on poorly labeled image data, the robots cannot be used to their full potential, which may negatively affect patients' health.

In this project, we address the problem of poor label quality in surgical image annotation by augmenting the annotation process with gamification (i.e., the use of game design elements in non-game contexts). The overarching objective is to design, implement, and evaluate a data-driven, ML-based gamification concept that augments the annotation process, fosters annotators' engagement, and thereby ensures high-quality data labeling. By drawing on ML methods, the gamification concept can adapt to individual user preferences and overcome the weaknesses of one-size-fits-all gamification approaches.
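
One possible way (among several) for a gamification concept to adapt to individual annotators is to treat the choice of game design element as a multi-armed bandit problem. The sketch below is a hypothetical epsilon-greedy bandit kept per annotator: it tries each element once, then mostly shows the element with the highest average observed reward and occasionally explores the others. The element names, the reward (e.g., agreement of an annotator's labels with expert reference labels, scaled to [0, 1]), and epsilon are illustrative assumptions, not part of the project description.

```python
import random


class EpsilonGreedyGamifier:
    """Epsilon-greedy choice of a game design element for one annotator."""

    def __init__(self, elements: list[str], epsilon: float = 0.1, seed: int = 0):
        self.elements = list(elements)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {e: 0 for e in self.elements}
        self.means = {e: 0.0 for e in self.elements}

    def select(self) -> str:
        # Try every element once before exploiting.
        for e in self.elements:
            if self.counts[e] == 0:
                return e
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.elements)
        return max(self.elements, key=lambda e: self.means[e])

    def update(self, element: str, reward: float) -> None:
        # Incremental running mean of the rewards observed for this element.
        self.counts[element] += 1
        self.means[element] += (reward - self.means[element]) / self.counts[element]
```

In use, `select()` would be called before presenting an annotation task, and `update()` once the resulting labels have been assessed for quality.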

Models for Personalized Medicine

Leveraging large-scale single-cell datasets

Leveraging large-scale single-cell datasets for personalized cancer cell-of-origin inference

Doctoral researcher: Abdul Moeed

Institution: DKFZ

Data science PI: Oliver Stegle

Life science PI: Pavlo Lutsik

The goal is to develop machine learning strategies for deciphering cancer heterogeneity related to the cell of origin. Building on variational autoencoders and related dimensionality reduction methods, we will derive generative representations of population-scale single-cell data from thousands of individuals. Using these trained models, we will then interpret disease samples in a reference latent space, thereby enabling patient- and cancer-specific cell-of-origin identification at unprecedented resolution.
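
Once a reference model has been trained, interpreting a disease sample in the latent space can, in the simplest case, mean comparing the embeddings of its cells with those of annotated reference cells. The sketch below assumes the embeddings have already been computed by a trained encoder (hypothetical, not shown) and uses nearest-centroid assignment, a deliberately simple stand-in for the project's methods, to estimate for each sample the fraction of cells closest to each reference cell type. The cell type labels in the usage example are illustrative.

```python
import math
from collections import Counter, defaultdict


def centroids(ref_latent: list[list[float]], ref_labels: list[str]) -> dict[str, list[float]]:
    """Mean latent vector per annotated reference cell type."""
    groups = defaultdict(list)
    for z, label in zip(ref_latent, ref_labels):
        groups[label].append(z)
    return {
        label: [sum(col) / len(zs) for col in zip(*zs)]
        for label, zs in groups.items()
    }


def cell_of_origin_profile(sample_latent: list[list[float]],
                           cents: dict[str, list[float]]) -> dict[str, float]:
    """Fraction of sample cells whose nearest reference centroid is each type."""
    nearest = [
        min(cents, key=lambda label: math.dist(z, cents[label]))
        for z in sample_latent
    ]
    counts = Counter(nearest)
    return {label: counts[label] / len(sample_latent) for label in cents}


ref = [[0, 0], [0, 1], [10, 10], [10, 11]]
labels = ["HSC", "HSC", "B cell", "B cell"]
print(cell_of_origin_profile([[1, 1], [9, 9], [0, 0], [11, 10]], centroids(ref, labels)))
```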

We will apply these methods to blood cancers, specifically Acute Myeloid Leukemia (AML) and Chronic Lymphocytic Leukemia (CLL), for which we have access to pertinent data resources. The ultimate goal is to generate model-driven insights for clinical guidance, enabling early detection as well as patient-specific strategies for treatment and disease management.

ML to Predict Radical Transfer Mechanisms

Supervised Machine Learning to Predict Radical Transfer Mechanisms across Collagen Genetic Disorders (SMaRT)

Doctoral researcher: Marlen Neubert

Institution: KIT

Data science PI: Pascal Friederich

Life science PI: Frauke Gräter

Collagen is the most abundant protein in our body and performs a variety of functions, including strengthening and supporting skin, tendons, and bone tissue. Recent research by the Gräter group revealed that collagen produces highly reactive radicals when subjected to mechanical stress, which can damage surrounding tissue. At the same time, collagen has been shown to mitigate radical damage.

In light of these findings, and given that more than 1000 genetic collagen mutations are known, this project aims to investigate the relationship between collagen-related diseases and defenses against radical damage. How these mutations affect the function of collagen at the molecular level is not well understood. Since microscopes cannot be used to observe interactions on an atomic scale, computer simulations are better suited to understanding these dynamic molecular processes. Simulating how a molecular system changes over time requires an accurate description of interatomic energies and forces. Quantum mechanical energy calculations are very expensive for large systems such as collagen fibrils, since their computational cost scales roughly cubically with system size.
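
To make the cubic scaling concrete: if the cost C grows as the cube of the number of atoms N, then enlarging a system by a factor of 100 (an illustrative factor, not a specific collagen system) multiplies the cost by a factor of one million.

```latex
C(N) \propto N^{3}
\quad\Longrightarrow\quad
\frac{C(100\,N)}{C(N)} = \left(\frac{100\,N}{N}\right)^{3} = 10^{6}
```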

Machine-learned potentials are a data-driven energy calculation method in which a machine learning model maps the atomic coordinates to the potential energy of the system. These models are computationally less expensive than quantum mechanical calculations but can reach similar accuracy, and can therefore speed up large-scale molecular dynamics simulations.
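
The principle can be shown with a toy example: learn a map from atomic coordinates to energy from reference calculations, then obtain forces as the negative gradient of the learned energy. In the sketch below, a Lennard-Jones energy is a hypothetical stand-in for expensive quantum mechanical reference data, and the "model" is a linear fit on two pair-distance descriptors (sums of r^-12 and r^-6). This is far simpler than real machine-learned potentials, which use much richer descriptors and nonlinear models such as neural networks, but the training and force-evaluation steps follow the same pattern.

```python
import itertools
import math
import random


def pair_terms(coords):
    """Descriptors of a configuration: sums of r^-12 and r^-6 over atom pairs."""
    rs = [math.dist(a, b) for a, b in itertools.combinations(coords, 2)]
    return sum(r ** -12 for r in rs), sum(r ** -6 for r in rs)


def reference_energy(coords):
    """Hypothetical stand-in for a QM calculation: Lennard-Jones with eps = sigma = 1."""
    s12, s6 = pair_terms(coords)
    return 4.0 * (s12 - s6)


def fit_potential(configs, energies):
    """Least-squares fit of E ≈ a * S12 + b * S6 via the 2x2 normal equations."""
    X = [pair_terms(c) for c in configs]
    s11 = sum(x1 * x1 for x1, _ in X)
    s12 = sum(x1 * x2 for x1, x2 in X)
    s22 = sum(x2 * x2 for _, x2 in X)
    t1 = sum(x1 * e for (x1, _), e in zip(X, energies))
    t2 = sum(x2 * e for (_, x2), e in zip(X, energies))
    det = s11 * s22 - s12 * s12
    return (t1 * s22 - t2 * s12) / det, (s11 * t2 - s12 * t1) / det


def predict_energy(coords, a, b):
    s12, s6 = pair_terms(coords)
    return a * s12 + b * s6


def predict_forces(coords, a, b):
    """Forces F_k = -dE/dx_k of the fitted pair potential a r^-12 + b r^-6."""
    forces = [[0.0] * len(coords[0]) for _ in coords]
    for k, l in itertools.combinations(range(len(coords)), 2):
        diff = [xk - xl for xk, xl in zip(coords[k], coords[l])]
        r = math.sqrt(sum(c * c for c in diff))
        dE_dr = -12.0 * a * r ** -13 - 6.0 * b * r ** -7
        for dim, c in enumerate(diff):
            forces[k][dim] -= dE_dr * c / r
            forces[l][dim] += dE_dr * c / r
    return forces


def random_config(rng, n_atoms=4, box=2.5, min_dist=0.9):
    """Random atom positions with a minimum pair distance (rejection sampling)."""
    while True:
        coords = [[rng.uniform(0, box) for _ in range(3)] for _ in range(n_atoms)]
        if all(math.dist(p, q) >= min_dist for p, q in itertools.combinations(coords, 2)):
            return coords
```

Because the reference energy here is exactly linear in the two descriptors, the fit recovers it exactly; with real quantum mechanical data, model choice and training data coverage determine the accuracy.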

Using invertible neural networks to predict molecular interactions

Doctoral researcher: Marcel Meyer

Institution: UNI HD

Data science PI: Ullrich Köthe

Life science PI: Rebecca Wade

Protein-biomolecule interactions are ubiquitous in life, from DNA replication to the regulation of the heartbeat. The stability of biomolecular complexes depends heavily on the amino acid sequences of the proteins involved. Changes in these sequences and their effects on complex formation are of interest in many diseases, but also in protein design. However, predicting how two molecules will interact remains a challenge due to the many degrees of freedom both within and between the interaction partners.

In the recent past, deep neural networks have proven useful for predicting the structures of biomolecules. We investigate the suitability of invertible neural networks (INNs) for predicting the structures of molecular complexes, as well as the effects of sequence mutations. INNs are interesting candidates because they model the full probability distribution of molecular interactions. In addition, they do not suffer from the curse of dimensionality and can therefore be scaled to very large problems such as protein-protein interactions.
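
A common building block of INNs is the affine coupling layer. It is invertible by construction and has a triangular Jacobian, so the exact log-density of a sample follows from the change-of-variables formula log p(x) = log p_Z(f(x)) + log|det ∂f/∂x|, which is what lets such networks represent a full probability distribution. The toy sketch below shows this for 2-D inputs with a standard normal latent; the fixed `conditioner` function is a hypothetical stand-in for the trained subnetworks a real INN would use, and nothing here reflects the project's actual architecture or molecular representation.

```python
import math


def conditioner(a: float) -> tuple[float, float]:
    """Hypothetical stand-in for a small neural network: returns (log-scale, shift)."""
    return math.tanh(0.5 * a), 0.3 * a + 0.1


def coupling_forward(x, swap):
    """Affine coupling: keep one coordinate, scale and shift the other."""
    a, b = (x[1], x[0]) if swap else (x[0], x[1])
    s, t = conditioner(a)
    b = b * math.exp(s) + t
    # The Jacobian is triangular with diagonal (1, e^s), so log|det J| = s.
    return ((b, a) if swap else (a, b)), s


def coupling_inverse(y, swap):
    """Exact inverse: the conditioning coordinate passes through unchanged."""
    a, b = (y[1], y[0]) if swap else (y[0], y[1])
    s, t = conditioner(a)
    b = (b - t) * math.exp(-s)
    return (b, a) if swap else (a, b)


LAYERS = [False, True]  # alternate which coordinate is transformed


def flow_forward(x):
    log_det = 0.0
    for swap in LAYERS:
        x, ld = coupling_forward(x, swap)
        log_det += ld
    return x, log_det


def flow_inverse(z):
    for swap in reversed(LAYERS):
        z = coupling_inverse(z, swap)
    return z


def log_prob(x):
    """Exact log-density via change of variables with a standard normal latent."""
    z, log_det = flow_forward(x)
    log_pz = sum(-0.5 * zi * zi - 0.5 * math.log(2.0 * math.pi) for zi in z)
    return log_pz + log_det
```

Sampling works in the opposite direction: draw z from the standard normal and apply `flow_inverse`, which is exact because each layer can be inverted in closed form.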

The major tasks at hand are finding the right representation of the molecular structure and building suitable network architectures. Once we can reliably predict molecular interactions, we will integrate sequence information to learn how sequence alterations influence the structure of biomolecular complexes.