Discover the numerous projects of our doctoral researchers from 2019.
Imaging & Diagnostics

Generating Medical Imaging Reports
Generating Medical Imaging Reports from 3D Radiological CT Scans using Image Captioning Techniques
Doctoral researcher: Constantin Seibold
Institution: KIT
Data science PI: Rainer Stiefelhagen
Life science PI: Heinz-Peter Schlemmer
While Deep Learning approaches, in particular Convolutional Neural Networks (CNNs), are now also being widely adopted in medical image analysis, only very few publications deal with the integration of visual and textual features or with the generation of explanations for findings in images. So far, none of the state-of-the-art image-captioning approaches has been applied to 3D medical images while using the available medical reports as the sole training input.
The goal of this project is to investigate state-of-the-art machine learning methods in order to automatically generate clinically meaningful image descriptions (medical reports) for radiological CT images. The resulting model shall be able to describe radiological findings, e.g. lesions in an image, their location, and relevant descriptive attributes, as they are typically used by doctors in medical reports.
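For illustration only, here is a minimal sketch of an encoder-decoder captioning model adapted to 3D input: a 3D CNN encodes the CT volume, and a recurrent decoder emits the report token by token. The architecture, sizes, and vocabulary below are our assumptions, not the project's model.

    import torch
    import torch.nn as nn

    class CTCaptioner(nn.Module):
        def __init__(self, vocab_size=5000, embed_dim=256, hidden_dim=512):
            super().__init__()
            # 3D CNN encoder: maps a (1, D, H, W) CT volume to a feature vector
            self.encoder = nn.Sequential(
                nn.Conv3d(1, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
                nn.Conv3d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool3d(1), nn.Flatten(),
                nn.Linear(32, hidden_dim),
            )
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.decoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, vocab_size)

        def forward(self, volume, report_tokens):
            # Encode the scan and use it as the decoder's initial hidden state.
            h0 = self.encoder(volume).unsqueeze(0)        # (1, B, hidden)
            emb = self.embed(report_tokens)               # (B, T, embed)
            dec_out, _ = self.decoder(emb, h0)            # (B, T, hidden)
            return self.out(dec_out)                      # per-token logits

    # One training step: teacher forcing with next-token cross-entropy.
    model = CTCaptioner()
    volume = torch.randn(2, 1, 32, 64, 64)        # toy batch of CT volumes
    tokens = torch.randint(0, 5000, (2, 20))      # toy tokenized reports
    logits = model(volume, tokens[:, :-1])
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, 5000), tokens[:, 1:].reshape(-1))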

Camera-Invariant Spectral Image Analysis
Camera-Invariant Spectral Image Analysis in Interventional Healthcare
Doctoral researcher: Jan Sellner
Institution: DKFZ
Data science PI: Lena Maier-Hein
Life science PI: Felix Nickel
Death within 30 days after surgery has recently been found to be the third-leading cause of death worldwide. One of the major challenges faced by surgeons is the visual discrimination of tissues, for example to distinguish pathologies or critical structures from the remaining healthy tissue.
In this context, multispectral imaging has been proposed to overcome the arbitrary restriction of conventional camera systems (and the human eye) which capture only red, green and blue colors. Instead, the spectral dimension consists of multiple specific bands of light (e.g. 16 or 100). As different tissue types have different absorbance patterns, multispectral cameras provide additional information about the tissue and corresponding functional properties, such as oxygenation.
In our research, we leverage this additional information for machine learning-based classification of tissue in real-world surgery settings. Compared to non-medical machine learning applications, however, the data available for training our models is comparatively small (e.g. only hundreds of images instead of millions). Furthermore, due to the novelty of the technique, the available hardware undergoes rapid development as well as application-specific adaptations (e.g. open surgery vs. minimally invasive surgery). As a consequence, the available data sets feature large variations with respect to the number of bands, illumination conditions, and other factors.
The primary goal of this project is therefore to develop novel concepts for tissue classification that avoid losing the valuable knowledge encoded in the data of one camera when a different camera is applied. More specifically, we aim to develop a camera-invariant tissue classification framework that leverages all the spectral imaging data available, irrespective of the number of bands and the specific camera configuration.
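One conceivable ingredient of such camera invariance (stated here as an assumption for illustration, not as the project's chosen method) is to resample each camera's band measurements onto a shared wavelength grid before classification. A minimal numpy sketch with illustrative band centres:

    import numpy as np

    def to_common_grid(spectrum, band_centers_nm, common_grid_nm):
        """Linearly interpolate one measured spectrum onto a shared grid."""
        return np.interp(common_grid_nm, band_centers_nm, spectrum)

    common_grid = np.linspace(500, 1000, 50)       # shared 50-point grid

    # Camera A: 16 bands; camera B: 100 bands (toy reflectance values).
    bands_a = np.linspace(500, 1000, 16)
    bands_b = np.linspace(500, 1000, 100)
    spec_a = np.random.rand(16)
    spec_b = np.random.rand(100)

    x_a = to_common_grid(spec_a, bands_a, common_grid)
    x_b = to_common_grid(spec_b, bands_b, common_grid)
    # x_a and x_b now have identical shape and can feed one shared classifier.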

A Framework to Facilitate Biological Image Processing
An Automated Framework to Facilitate Biological Image Processing with Deep Learning Methods
Doctoral researcher: Hamideh Hajiabadi
Institution: KIT
Data science PI: Anne Koziolek
Life science PI: Lennart Hilbert
Fluorescence microscopy has several inherent limitations, which are dictated by basic optical and physical laws as well as compromises arising from the fact that biological materials are being imaged. Recent publications demonstrated how machine learning approaches can substitute image information in situations where it is otherwise difficult or even fundamentally impossible to obtain directly (e.g., a neural network trained on separate recordings of samples prepared with different fluorescence markers that predicts “all” the markers for new recorded images). Another broader challenge that affects not just the microscopy community concerns the broad exploitation of cutting-edge algorithms.
However, which structural, organizational, technical, and design decisions result in effective exploitation of expert-developed microscopy software by non-expert users is so far only understood at an anecdotal level. From the software engineering perspective, too, situations where non-experts build new machine learning applications from existing algorithms are not yet well understood or supported.
The goals of the project are to provide a machine learning-based image analysis tool that monitors more nuclear components than is possible with current fluorescence microscopes, and to provide a software engineering method that supports non-expert developers in transferring neural network-based solutions from one image analysis setting to another.
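As a toy illustration of the marker-prediction idea mentioned above (our sketch, not the project's tool), a small convolutional network can map one recorded fluorescence channel to several predicted marker channels via a pixel-wise regression loss; shapes and channel counts are made up:

    import torch
    import torch.nn as nn

    predictor = nn.Sequential(
        nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 3, 3, padding=1),   # predict 3 marker channels
    )

    recorded = torch.randn(4, 1, 128, 128)   # toy recorded input channel
    markers = torch.randn(4, 3, 128, 128)    # toy ground-truth marker images
    loss = nn.functional.mse_loss(predictor(recorded), markers)
    loss.backward()                          # one training step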

Scalable 3D+t Tracking by Graph-based and Deep Learning Methods
Scalable 3D+t Tracking by Graph-based and Deep Learning Methods
Doctoral researcher: Katharina Löffler
Institution: KIT
Data science PI: Ralf Mikut
Life science PI: Uwe Strähle
Many biomedical imaging devices are able to collect time series of high-resolution 3D data (3D+t). A typical task in these datasets is the tracking of objects over time, e.g. cells or sub-cellular structures. An example of such an imaging task is the analysis of light-sheet microscopy data of fluorescently labeled cells.
Our initial focus will be on the embryonic development of zebrafish (Danio rerio) as a preferred model animal for vertebrates. Our investigations aim at quantitative descriptions of cell death, divisions, movements, and interactions.
Standard methods for basic tasks like identifying cell nuclei in the images have accuracies around 95 %. This sounds good by the standards of conventional image processing. However, with more than 10 000 nuclei per image, it implies that the system loses sight of several hundred nuclei in every single frame, so that reconstructing the overall cellular lineage (nuclei movements and cell divisions) becomes an intractable jigsaw puzzle even for the simplest task of cell nucleus tracking.
The aim of the project is a significant improvement of 3D+t tracking algorithms for the case of dense objects and limited imaging quality. Our initial goal in light-sheet microscopy is to follow the movement of more than 10 000 cells and then scale the solution to hundreds of experiments.
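The most basic building block of such a tracker is frame-to-frame linking of detections. Below is a minimal sketch (a classical baseline, not the project's method) that matches nuclei centroids between consecutive frames by solving an assignment problem; it assumes every nucleus is detected in both frames, which is exactly what fails in the hard cases the project targets:

    import numpy as np
    from scipy.optimize import linear_sum_assignment
    from scipy.spatial.distance import cdist

    def link_frames(centroids_t, centroids_t1):
        """Match nuclei of frame t to frame t+1, minimizing total motion."""
        cost = cdist(centroids_t, centroids_t1)    # pairwise 3D distances
        rows, cols = linear_sum_assignment(cost)   # Hungarian algorithm
        return list(zip(rows, cols))

    frame_t = np.random.rand(100, 3) * 100           # toy 3D centroids
    frame_t1 = frame_t + np.random.randn(100, 3)     # small random motion
    links = link_frames(frame_t, frame_t1)           # (i, j) index pairs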

Design Principles Underlying Fault Tolerance
Design Principles Underlying Fault Tolerance in Control Systems of Genetically Varying Model Organisms
Doctoral researcher: Vojtech Kumpost
Institution: KIT
Data science PI: Ralf Mikut
Compared to man-made technical control systems, biological control systems exhibit a remarkable fault tolerance over wide ranges of environmental cues as well as in the face of variability at the level of the system-encoding genes. One of the striking features of biological control systems is a high level of regulatory redundancy; however, it is still unclear how such features specifically contribute to the wide-ranging fault tolerance of biological control systems.
This project aims to investigate whether this fault tolerance of biological control systems is rooted in design principles that differ from man-made control systems. In particular, we investigate how far and what kind of redundancy contributes to the wide range of genetic and environmental variability that biological control systems can cope with.
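As a toy illustration of the redundancy question (our made-up model, not one of the project's systems), consider a product x produced at a constant rate and degraded by two redundant regulators: knocking out one regulator degrades control gracefully, while knocking out both loses control entirely.

    import numpy as np
    from scipy.integrate import solve_ivp

    def system(t, x, k1, k2):
        # production at constant rate 1, degradation by two redundant regulators
        return 1.0 - (k1 + k2) * x[0]

    for k1, k2, label in [(0.5, 0.5, "both regulators"),
                          (0.5, 0.0, "one knocked out"),
                          (0.0, 0.0, "both knocked out")]:
        sol = solve_ivp(system, (0, 50), [0.0], args=(k1, k2))
        print(label, "-> final x =", sol.y[0, -1])
    # With both regulators x settles near 1.0; with one, near 2.0 (degraded
    # but still controlled); with none, x grows without bound.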

Image-Guided Adaptive Radiation Therapy (IGART)
Image-Guided Adaptive Radiation Therapy (IGART) Based on Massive Parallelism and Real-Time Scheduling
Doctoral researcher: Vahdaneh Kiani
Institution: UNI HD
Data science PI: Holger Fröning
Life science PI: Oliver Jäkel
The figure illustrates the central challenge of the project: the first part shows a volume rendering of a lung with a tumour inside; the arrows and the red area indicate the motion of the tumour with respiration. In radiation therapy (RT), we need to track this motion in order to adapt the irradiation.
The second image shows the three components of the image registration process: a deformable transformation (the grid), the similarity measure (the histogram), and the optimizer strategy (the 2D lines, sampling a path through the objective function). Even when everything runs on a single (blue) GPU, the registration is not real-time. The aim of the project is to find ways to redistribute the computational load to multiple (green) GPUs.
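A heavily simplified sketch of these three components, using a rigid 2D translation in place of the deformable transform and mean squared error as the similarity measure (both simplifications are ours, not the project's setup):

    import numpy as np
    from scipy.ndimage import shift
    from scipy.optimize import minimize

    fixed = np.random.rand(64, 64)
    moving = shift(fixed, (3.0, -2.0))   # toy image displaced by a known offset

    def objective(params):
        warped = shift(moving, params)          # transformation component
        return np.mean((fixed - warped) ** 2)   # similarity component

    # Optimizer component: derivative-free search over translation parameters.
    result = minimize(objective, x0=[0.0, 0.0], method="Nelder-Mead")
    print("recovered translation:", result.x)   # approx. (-3.0, 2.0)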

Anomaly Detection Using Unsupervised Learning
Anomaly Detection Using Unsupervised Learning for Medical Images
Doctoral researcher: David Zimmerer
Institution: DKFZ
Data science PI: Klaus Maier-Hein
Life science PI: Heinz-Peter Schlemmer
An assumption-free automatic check of medical images for potentially overlooked anomalies would be a valuable assistance for radiologists. Deep learning, and especially generative models such as Variational Auto-Encoders (VAEs), has shown great potential in the unsupervised learning of data distributions.
By decoupling abnormality detection from reference annotations, these approaches are completely independent of human input and can therefore be applied to any medical condition or image modality. In principle, this allows for an abnormality check and even the localization of the parts of an image that are most suspicious.
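A minimal sketch of this idea, with a fully-connected VAE and made-up image sizes (the real models operate on full medical images): the model is trained only on images of healthy anatomy, and at test time a high reconstruction error marks an image, or region, as suspicious.

    import torch
    import torch.nn as nn

    class VAE(nn.Module):
        def __init__(self, dim=32 * 32, latent=16):
            super().__init__()
            self.enc = nn.Linear(dim, 128)
            self.mu, self.logvar = nn.Linear(128, latent), nn.Linear(128, latent)
            self.dec = nn.Sequential(nn.Linear(latent, 128), nn.ReLU(),
                                     nn.Linear(128, dim), nn.Sigmoid())

        def forward(self, x):
            h = torch.relu(self.enc(x))
            mu, logvar = self.mu(h), self.logvar(h)
            z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
            return self.dec(z), mu, logvar

    def loss_fn(x, recon, mu, logvar):
        rec = nn.functional.mse_loss(recon, x, reduction="sum")
        kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return rec + kld     # reconstruction + KL regularization

    vae = VAE()
    healthy = torch.rand(64, 32 * 32)                  # toy "healthy" images
    recon, mu, logvar = vae(healthy)
    loss_fn(healthy, recon, mu, logvar).backward()     # one training step

    # Anomaly score at test time: per-image reconstruction error.
    test = torch.rand(8, 32 * 32)
    recon, _, _ = vae(test)
    score = ((test - recon) ** 2).mean(dim=1)          # higher = more suspicious
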
Surgery & Intervention 4.0

Robust Tissue Classification with Multispectral Imaging
Robust Tissue Classification with Multispectral Imaging
Doctoral researcher: Silvia Seidlitz
Institution: DKFZ
Data science PI: Lena Maier-Hein
Life science PI: Hannes Kenngott
Machine learning-based decision support can potentially improve the quality of healthcare by providing physicians with the right information at the right time. An important prerequisite for context-awareness and autonomous robotics in surgery is the fully-automated real-time analysis of medical images. Relevant tasks include semantic segmentation for surgical scene understanding as well as the automated image-based classification of medical conditions, such as sepsis.
Relying on conventional RGB imaging, however, is problematic, as important morphological and functional information is invisible in such widely acquired surgical images. Spectral imaging, which, unlike conventional RGB imaging and the human eye, captures multiple specific bands of light instead of being restricted to the red, green, and blue bands, has the potential to improve tissue classification. Due to the high dimensionality of spectral data, however, interpretation by humans is infeasible. While machine learning has the potential to address this bottleneck, several open research questions remain.
Spectral data representation: It has so far not been determined what constitutes an adequate representation of spectral data for machine learning-based, fully automated tissue classification, especially with respect to the spatial granularity of the data (pixels vs. superpixels vs. patches vs. full images) and the processing of the spectral dimension. A primary goal of this thesis is therefore to close this gap.
Confounders in spectral image classification: While machine learning-based classification has generated several clinical success stories over the past years, recent research has also revealed a large number of algorithms whose performance is overestimated due to biases. To date, however, this important issue has been completely ignored in the context of medical spectral imaging. The second major goal of this thesis is therefore to automatically identify confounders in medical spectral data and to achieve a bias-free representation of spectral data for machine learning-based tissue classification.
Based on the methodological contributions related to the handling of spectral data in neural networks and to bias awareness, the project will determine whether automated sepsis diagnosis is feasible with medical spectral data, and whether spectral data offers a benefit over other modalities (e.g. RGB imaging) for automated organ segmentation.
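For illustration, here is a toy sketch of the simplest of the representations named above: single-pixel spectra fed to a classical classifier. The data is synthetic, and which representation is actually adequate is precisely what the thesis investigates.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    # Toy hyperspectral cube: 64x64 pixels, 100 spectral bands, 3 tissue classes.
    cube = np.random.rand(64, 64, 100)
    labels = np.random.randint(0, 3, size=(64, 64))

    # Pixel representation: each pixel's spectrum is one independent sample.
    X = cube.reshape(-1, 100)
    y = labels.reshape(-1)

    clf = RandomForestClassifier(n_estimators=50).fit(X, y)
    tissue_map = clf.predict(X).reshape(64, 64)   # per-pixel tissue prediction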

Uncertainty Quantification in Radiation Therapy
Uncertainty Quantification in Radiation Therapy
Doctoral researcher: Pia Stammer
Institution: KIT
Data science PI: Martin Frank
Life science PI: Oliver Jäkel
Radiation therapy is one of the cornerstones of modern cancer treatment, being applied in 50 % of all patients. It is a highly personalized intervention relying on a computational patient model (provided by patient-specific computed tomography and/or magnetic resonance imaging) and, based on this image data, a quantitative, spatial simulation of the radiation dose delivered into the patient’s body.
This digitalization enables a computer-based optimization of every individual radiation treatment before the onset of therapy. In clinical practice, however, treatment optimization only considers one unique “best guess” treatment scenario, neglecting a number of sources of uncertainty along the workflow.
Sources of uncertainty in radiation therapy are myriad: physical limitations to patient immobilization cause misalignment relative to the treatment beam; changes in the patient’s anatomy, both intrafractional (i.e. during the treatment session, for example due to breathing) and interfractional (between treatment sessions, for example due to weight loss), compromise the validity of the original CT; and measurement errors during CT acquisition cause uncertainty about radiological depths.
We want to build an efficient computational pipeline for uncertainty management in ion beam therapy based on Monte Carlo dose calculation algorithms. To this end, we want to reduce the computational load of the involved sampling processes by orders of magnitude by leveraging state-of-the-art methodology from the mathematical uncertainty quantification community (a toy sampling sketch follows the objectives below).
Specifically, we pursue the following two objectives:
- We want to enable efficient and highly accurate dosimetric uncertainty quantification considering range, setup, and motion uncertainties during forward dose calculation.
- We want to use information about treatment uncertainties to systematically minimize their potential negative impact on treatment outcome.
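The sketch below illustrates the sampling idea under strong simplifications of our own: setup errors are drawn from an assumed Gaussian and applied as rigid shifts of a precomputed dose cube, whereas a real pipeline would re-run the Monte Carlo dose calculation per scenario, which is exactly the cost this project attacks.

    import numpy as np
    from scipy.ndimage import shift

    nominal_dose = np.zeros((40, 40, 40))
    nominal_dose[15:25, 15:25, 15:25] = 2.0      # toy target dose in Gy

    rng = np.random.default_rng(0)
    samples = [shift(nominal_dose, rng.normal(0, 1.5, size=3))  # setup error, mm
               for _ in range(100)]

    dose_mean = np.mean(samples, axis=0)
    dose_std = np.std(samples, axis=0)           # voxel-wise dose uncertainty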

Lifelong Machine Learning in Surgical Data Science
Lifelong Machine Learning in Surgical Data Science
Doctoral researcher: Patrick Godau
Institution: DKFZ
Data science PI: Lena Maier-Hein
Current Machine Learning algorithms, especially Deep Neural Networks (DNNs), have proven to be successful tools in areas particularly relevant to life science, for example image analysis. However, they have some drawbacks that prevent the transition from research to real-world applications.
First and foremost, these algorithms rely heavily on huge amounts of training data, which are often not available in the medical context because of data protection and privacy policies. Without this data, the full potential of a DNN cannot be realized, and the results of the same DNN architecture appear poor in comparison to those achieved on non-medical databases.
Second, and closely connected to the lack of data, is the missing ability to generalize learned concepts. Algorithms trained on the outputs of a specific medical device in a fixed location often suffer a significant performance loss when a device from another vendor is used or when the hospital, surgeon, or procedure changes. Well-known techniques to compensate for missing training data come from Transfer Learning: one implementation starts from a pre-trained DNN and focuses mainly on slight adjustments, removing the final layers and retraining them on the specific medical data. Although this method provides noticeable improvements in some cases, there is often no clear methodology other than trial and error for choosing the optimal pre-training data, the correct DNN architecture, or suitable re-training hyper-parameters.
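A minimal sketch of the head-retraining variant just described; the backbone choice, class count, and data are placeholders: load a network pre-trained on natural images, freeze its feature extractor, and retrain only a new final layer on the (small) medical dataset.

    import torch
    import torch.nn as nn
    from torchvision import models

    model = models.resnet18(weights="IMAGENET1K_V1")   # pre-trained backbone
    for param in model.parameters():
        param.requires_grad = False                    # freeze learned features

    model.fc = nn.Linear(model.fc.in_features, 5)      # new head: 5 medical classes
    optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)

    # One toy re-training step on stand-in medical images.
    images = torch.randn(8, 3, 224, 224)
    targets = torch.randint(0, 5, (8,))
    loss = nn.functional.cross_entropy(model(images), targets)
    loss.backward()
    optimizer.step()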
To better address these shortcomings, we propose to use ideas from Meta Learning and implement a lifelong learning algorithm in an expandable framework of available data sets. The objective of the project is a meta-algorithm that, for a given medical task, chooses the optimal learning conditions to generate a model solving the problem (we will focus on minimally invasive surgery as the primary use case). This includes the choice of relevant training data, a compatible model architecture, and desirable hyper-parameters. Because the meta-algorithm itself should be adaptive and learn to better supervise the model production at run time, we explicitly address the problem of scalability.
We aim to identify the relevant parameters for predicting such learning conditions, which may include the similarity of data, dependencies between tasks, or the functionality of network sub-components. Moreover, the goal is to overcome the limitation of solving a single task in isolation and instead to reuse existing knowledge and models to improve on new problems.

Quantitative Photoacoustic Imaging with a Learning-to-simulate Approach
Quantitative Photoacoustic Imaging with a Learning-to-simulate Approach
Doctoral researcher: Melanie Schellenberg
Institution: DKFZ
Data science PI: Lena Maier-Hein
Life science PI: Hannes Kenngott
Photoacoustic imaging (PAI) is an emerging modality that has the potential to provide tomographic images of blood oxygenation - an important biomarker for the diagnosis and staging of a variety of diseases. The generation of photoacoustic images follows a well-known forward process: following the illumination of the tissue by multi-wavelength laser pulses, photons are absorbed by various molecules according to their characteristic absorption spectra. The absorption is followed by a thermo-elastic expansion of the tissue, which leads to a local pressure rise that generates an acoustic wave. The propagated acoustic wave can be measured as the PAI signal at the tissue surface.
A core unsolved problem in PAI is the accurate quantification of the optical properties based on the acquired measurements. In this project, we tackle this ill-posed inverse problem with neural networks. Due to the lack of annotated reference data, the core methodological challenge is the generation of adequate training data. In this context, we go beyond the already established concept of learning from simulations with a novel approach to training data generation that we refer to as learning to simulate. Specifically, we investigate a machine learning-based approach to continuously improve the quality of in silico PAI data based on acquired real-world data.
Figure: The principle of photoacoustic imaging (PAI). (a) Based on the photoacoustic (PA) effect, the pressure time series can be measured as (b) raw PA signals and reconstructed into a (c) PA image or further processed, e.g. into (d) semantic PA image annotations.
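To make the quantification problem concrete, here is a simplified sketch of linear spectral unmixing into oxy- and deoxyhemoglobin. The endmember spectra are made-up placeholders, and in real PAI the measured signal is additionally distorted by the unknown light fluence, which is what makes the inverse problem ill-posed and motivates the learning-based approach above.

    import numpy as np

    wavelengths = np.linspace(700, 900, 16)               # nm, toy grid
    hbo2 = 0.5 + 0.5 * (wavelengths - 700) / 200          # fake HbO2 spectrum
    hb = 1.0 - 0.5 * (wavelengths - 700) / 200            # fake Hb spectrum
    A = np.stack([hbo2, hb], axis=1)                      # endmember matrix

    true_conc = np.array([0.7, 0.3])                      # 70 % oxygenation
    measured = A @ true_conc + 0.01 * np.random.randn(16)

    conc, *_ = np.linalg.lstsq(A, measured, rcond=None)
    so2 = conc[0] / conc.sum()                            # estimated oxygenation
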
Models for Personalized Medicine

Distributed Ledger Technology for Life Science
Distributed Ledger Technology for Life Science
Doctoral researcher: Mikael Beyene
Institution: KIT
Data science PI: Ali Sunyaev, Benedikt Brors
Modern life sciences, with their highly sensitive omics data sets, face several challenges with regard to data storage and sharing. On the one hand, data must be protected in order to preserve the privacy of the individuals who contributed their data to research. On the other hand, the true value of omics data can only be realized if it is shared with as many researchers as possible. In an ideal world, data subjects (i.e., patients) would be able to control access to their data directly.
However, granting and revoking access to data is a slow and tedious process within the current life sciences research paradigm, where most data is either stored in central controlled-access data repositories or kept locally within the respective research groups. In addition, centralized data repositories create single points of failure in terms of data availability and integrity. Distributed ledger technology (DLT; e.g., blockchain) enables immutable transactions between mutually distrusting parties, which are kept in a consistent state through automated, algorithm-based consensus-building mechanisms, thus eliminating the need for third-party trust enforcement.
The objective of this doctoral thesis is to develop a secure and privacy-preserving, DLT-based data sharing infrastructure for the life sciences. Such a data sharing infrastructure with DLT-based access control will a) give patients flexible and individual control over the flow of their personal information and b) ease comprehensive data sharing among researchers.
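As a toy illustration of the core DLT mechanism (our sketch, not the planned infrastructure), access-control events can be chained via cryptographic hashes so that any later tampering with a grant or revocation becomes detectable; consensus between nodes is omitted entirely here.

    import hashlib
    import json

    def append_block(chain, event):
        prev_hash = chain[-1]["hash"] if chain else "0" * 64
        block = {"event": event, "prev": prev_hash}
        block["hash"] = hashlib.sha256(
            json.dumps(block, sort_keys=True).encode()).hexdigest()
        chain.append(block)

    ledger = []
    append_block(ledger, {"patient": "P1", "grant": "lab-A", "scope": "genome"})
    append_block(ledger, {"patient": "P1", "revoke": "lab-A"})

    def verify(chain):
        # Recompute every hash and check the chain linkage.
        prev = "0" * 64
        for block in chain:
            body = {"event": block["event"], "prev": block["prev"]}
            digest = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()).hexdigest()
            if block["prev"] != prev or digest != block["hash"]:
                return False
            prev = block["hash"]
        return True

    print(verify(ledger))   # True; any edit to a past event makes this False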

Augmenting Physician Workflow
Augmenting Physician Workflow to Personalize Care Decisions by Predicting Next Steps and Informational Needs in (Precision) Oncology
Doctoral researcher: Paula Breitling
Institution: KIT
Data science PI: Michael Beigl
Life science PI: Frank Ückert
With the possibility of whole-genome sequencing for oncologic patients, many processes in their treatment have to be adapted. Physicians in oncology have significantly more data to consider in order to tailor diagnostic or therapeutic approaches to the needs of a specific patient.
The National Center for Tumor Diseases (NCT) in Heidelberg has established workflows for physicians to utilize genomic and clinical data in assessing second- or third-line therapy options for patients. Within these workflows, a lot of external knowledge from databases is needed. Hence, for each database (or database portal), many queries have to be formulated and executed in order to obtain the needed information. The steps involved in researching information for a single patient are known for the majority of cases. However, the steps can vary in order or only be necessary depending on the results of previous steps. Hence, precise detection and discrimination of the current step is necessary to infer plausible next steps.
The overall goal is to support physicians in researching diagnostic or therapeutic options for patients based on genomic and clinical information by preemptively supplying the information needed at the particular workflow step. To achieve this, we aim to precisely identify workflow steps based on usage signals (screen video, click events, time events, …) and possible triggers for further informational needs. A model will be trained to identify the step currently being worked on and to predict possible next steps (ideally multiple alternatives with likelihoods). The model will be evaluated for accuracy and (expected) utility in the process at hand.
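As a deliberately simple baseline for next-step prediction (our sketch with invented step names, not the project's model), a first-order Markov model can count observed step transitions and return the most likely successors with probabilities:

    from collections import Counter, defaultdict

    logs = [
        ["open_record", "query_variant_db", "check_trials", "write_report"],
        ["open_record", "query_variant_db", "query_drug_db", "write_report"],
        ["open_record", "check_trials", "write_report"],
    ]

    transitions = defaultdict(Counter)
    for session in logs:
        for current, nxt in zip(session, session[1:]):
            transitions[current][nxt] += 1

    def predict_next(step, k=2):
        counts = transitions[step]
        total = sum(counts.values())
        return [(s, c / total) for s, c in counts.most_common(k)]

    print(predict_next("open_record"))
    # e.g. [('query_variant_db', 0.67), ('check_trials', 0.33)]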


Integrative Analysis of Patient Multi-omics Data
Integrative Analysis of Patient Multi-omics Data
Doctoral researcher: Rene Snajder
Institution: DKFZ
Data science PI: Oliver Stegle
Life science PI: Matthias Schlesner
Multi-omics, the generation of omics profiles with multiple assays on the same set of biological samples, is a fundamental experimental design pattern in biomedical research and basic biology. In biomedical applications, objectives comprise biomarker discovery and disease stratification; in developmental and basic biology, they include understanding cell states, differentiation, and cell-type evolution, now increasingly at single-cell resolution. However, there is a lack of computational methods for the analysis of such data. We will develop methods and software to address this need.
In this project, we seek to deliver new computational methods and applications of supervised and unsupervised multi-omics analysis. Our approach builds on Multi-Omics Factor Analysis (MOFA), an unsupervised factor analysis method based on a sound and versatile mathematical foundation. MOFA is already a widely used model in the field and is being applied in cancer genomics, single-cell biology, and other data integration tasks.
The goal of this project is to extend and build on MOFA to deliver new functionality and to enable novel applications. A particular interest is the integration of imaging data modalities (e.g. fMRI, cellular imaging) with other omics data types, thereby exploiting the large volumes of data available in Heidelberg.
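For illustration, here is a simplified stand-in for MOFA-style integration (not MOFA's actual implementation): standardize each omics view, concatenate the features, and fit a linear factor model that captures shared variation across views. MOFA itself additionally learns view- and factor-wise sparsity; the data below is synthetic.

    import numpy as np
    from sklearn.decomposition import FactorAnalysis
    from sklearn.preprocessing import StandardScaler

    n_samples = 50
    views = {
        "methylation": np.random.rand(n_samples, 200),
        "expression": np.random.rand(n_samples, 300),
        "imaging": np.random.rand(n_samples, 100),  # e.g. imaging-derived features
    }

    X = np.hstack([StandardScaler().fit_transform(v) for v in views.values()])
    fa = FactorAnalysis(n_components=10).fit(X)
    factors = fa.transform(X)      # (50, 10) latent factors shared across views
    loadings = fa.components_      # how each feature contributes to each factor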