2023 AI Hackathon Pitch Event

The Faculty of Medicine and IceLab organized an AI Hackathon Pitch event for the second time.

February 2, 2023

Seven researchers pitched ideas during the IceLab – Medical Faculty Hackathon Pitch Event. They are seeking project partners to complete their teams and submit a project together for the Hackathon in Lövångergården. Find out more about their pitches and reach out to them if they interest you!

AI Hackathon Pitch Event Schedule

Martin Rosvall opens the AI Hackathon Pitch Event

Committed and curious researchers from various departments and faculties at Umeå University, as well as external guests from Uminova and Försäkringskassan, gathered in the MIT building’s Ljusgård to listen to each other’s pitches and make new contacts. The purpose of the pitch event is to allow researchers who were previously unaware of each other’s skills to meet, discuss, create teams and then explore the solutions further in a Hackathon to be held at Lövångergården later in the month. Life-science researchers with exciting data in need of analysis methods and researchers in computer science and statistics in need of new data rarely meet, and these pitch events are a unique opportunity for such encounters to occur. Martin Rosvall, director at IceLab, welcomed all the participants:

– Together we intend to organize a Hackathon that is at least as successful as last time, when all three participating teams wrote successful research applications.

Senior research engineer Elin Thysell, Department of Medical Biosciences and professor Oliver Billker from MIMS presented pitches that contained exciting and innovative data. Postdoctoral researcher Barbara Forró from Maria Fällman’s research group and MIMS, Max Hellström, doctoral student from the Department of Radiation Sciences, Laura Carroll, computational biologist/bioinformatician at UCMR, Johan Henriksson, researcher at MIMS and Priyantha Wijayatunga, university lecturer at the School of Economics, presented method pitches.

Martin concluded the meeting by inviting all participants to register a team for the Hackathon in Lövångergården, which takes place February 22-24.

– Participation in a Hackathon is a unique opportunity to develop, learn and create new opportunities. All questions are welcome! Researchers not present at the pitch event are also invited to submit a team application to go to the Hackathon at Lövångergården.

Interested individuals can find the participants’ abstracts below. The deadline to apply to join the Hackathon is February 9th.

Apply to the Hackathon at Lövångergården

Barbara Forró, Postdoctoral Fellow at the Department of Molecular Biology

Can one use a single-cell analysis method for bulk RNA transcriptomes of Salmonella? Why not try it! (Barbara Forró, Sarah Narrowe Danielsson)

Abstract: One of my aims is to find experts on autoencoders, and another is to find collaborators who would like me to explore their zero-inflated datasets. The aim of the project is to identify the molecular mechanisms by which Salmonella establishes and maintains infection. Our model organism is Salmonella enterica serovar Typhimurium (STm), a facultative intracellular pathogen that is used to study gastroenteritis and systemic typhoid fever. To learn more about the critical mechanisms of invasion and maintenance of STm infection, we applied in vivo transcriptional profiling of bacteria isolated from infected mice, both during early colonization and at later, more chronic stages. Analysing in vivo transcriptomes is challenging because the low number of bacteria that can be recovered from a mouse experiment results in incomplete, zero-inflated datasets. The resulting datasets resemble single-cell outputs, so I tried scvi-tools, an autoencoder-based tool developed for single-cell data analysis, to extract information about the bacterial mechanisms from the data. Running the algorithm required a lot of input, so I also included all our available in vivo and in vitro data. This allowed us to gain biologically relevant information from the transcriptomes. I now need to validate that scvi-tools is reliable for any kind of zero-inflated dataset; if anyone has such a dataset and is interested, I would happily run it on my setup. I would also like to seize the opportunity to find an autoencoder expert to discuss adapting scvi-tools, or any other suggested method, for further use on zero-inflated bulk RNA datasets.
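For readers wondering whether their own count data qualify as zero-inflated, one rough check is to compare the observed fraction of zeros per gene with the fraction a simple count model would predict. The sketch below is not part of the pitch. It assumes a Poisson baseline, under which the probability of a zero count is exp(-mean), and it uses a made-up toy matrix. The function name and the data are hypothetical. RNA-seq counts are usually overdispersed, so a negative binomial baseline would be a more realistic comparison.

```python
import math


def zero_inflation_summary(counts):
    """Compare observed zero fractions per gene with a Poisson expectation.

    `counts` is a list of samples, each a list of non-negative integer
    read counts (one entry per gene). Returns one tuple per gene:
    (observed zero fraction, expected zero fraction, excess).
    """
    n_samples = len(counts)
    n_genes = len(counts[0])
    summary = []
    for g in range(n_genes):
        column = [row[g] for row in counts]
        mean = sum(column) / n_samples
        observed = sum(1 for c in column if c == 0) / n_samples
        # Assumption: Poisson baseline, where P(count == 0) = exp(-mean).
        expected = math.exp(-mean)
        summary.append((observed, expected, observed - expected))
    return summary


# Hypothetical toy matrix: 4 samples x 3 genes (illustrative only, not real data).
toy_counts = [
    [0, 5, 0],
    [0, 4, 0],
    [12, 6, 0],
    [0, 5, 8],
]

toy_summary = zero_inflation_summary(toy_counts)
for gene_index, (obs, exp, excess) in enumerate(toy_summary):
    print(f"gene {gene_index}: observed zeros={obs:.2f}, "
          f"Poisson expected={exp:.2f}, excess={excess:.2f}")
```

A large positive excess for many genes suggests more zeros than the baseline explains, which is the situation described in the abstract above.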

Max Hellström, Doctoral student at the Department of Radiation Sciences

Abstract: Medical images obtained using e.g. magnetic resonance imaging (MRI) or computed tomography are often degraded by noise. When these images are subsequently used to estimate quantitative tissue properties, such as blood flow in a tumor, the noise results in uncertainties in the estimated values. We will present a new method for removing much of this uncertainty and also estimating its size. The method is based on a convolutional neural network (CNN) whose structure is used as a prior during the estimation of the properties. This is a very unconventional approach to using CNNs, since it does not require any training data. Our initial results, where the method has been applied to MRI data, show that it can clean up most of the noise in estimates of relaxation time and diffusion rate. Compared to other denoising methods, it typically performed better while simultaneously being able to produce estimates of the remaining uncertainty. We are now looking for collaborators who could use the method to improve the quality of their medical images. So far, we have investigated the method in terms of accuracy and precision. Now we are particularly interested in collaborations where we can evaluate the impact of applying the method.

Oliver Billker, Professor at the Department of Molecular Biology and Director of MIMS (Molecular Infection Medicine Sweden)

Abstract: Malaria parasites are divergent eukaryotes whose poorly annotated genomes need to be better understood if we are to develop new drugs and vaccines. To this end we have developed genome-scale screening methods that use pools of barcoded mutants to systematically assign gene functions. Functional data are now available from ten screens, but unfortunately our methods only cover two thirds of the parasite genome. Two questions arise that we want to approach using machine learning: 1. Why can we only construct successful gene targeting vectors for two thirds of genes? The answer is independent of gene function and must have something to do with the chromosomal or sequence context that makes some Plasmodium DNA toxic in E. coli. Can we identify sequence or design features that are predictive of success? 2. Can we predict gene functions for the genes that are currently not tractable by integrating functional data from the available mutants with the large quantity of transcriptomic, proteomic, phosphoproteomic and protein interaction data we (and others) have generated over the years? The aim is to generate testable hypotheses, examine these experimentally and thereby assess the confidence with which we can predict gene functions from the available data. We also want to know which features in the genomic datasets have the greatest predictive value.

Barbara Forró pitches

Max Hellström pitches

Oliver Billker pitches

Elin Thysell, Senior Research Engineer at the Department of Medical Biosciences

Abstract: There is a need for diagnostic tools that can safely determine treatment options for metastatic prostate cancer. The lack of biomarkers for individual treatment choices is a problem, as patients respond differently to treatment options. If treatment is chosen based on a patient’s own tumor characteristics, it can increase the survival time of patients with metastatic prostate cancer. Our solution is an automated image analysis application that evaluates two biomarkers in digitized images of tissue samples from prostate biopsies. The innovation is the use of new markers and AI-based image analysis. The ratio of these markers has been shown to provide value in treatment choices and to optimize additional treatment. By choosing treatment based on tumor biology, the chance of treatment response is increased. We are currently seeking collaborations to develop a prototype for this solution, which will be used in retrospective and prospective studies to enable clinical implementation through care programs.

Laura Carroll, Associate Professor at the Department of Clinical Microbiology

Abstract: Biosynthetic gene clusters (BGCs) are enticing targets for (meta)genomic mining efforts, as they may be responsible for the production of novel, specialized metabolites with potential uses in medicine and industry (e.g., novel antimicrobials, anticancer agents) or roles in human health (e.g., novel toxins, carcinogens, pathogen virulence factors). Here, I describe GECCO (GEne Cluster prediction with COnditional random fields; https://gecco.embl.de), a high-precision, scalable method for identifying novel BGCs in (meta)genomic data using conditional random fields (CRFs). Based on an extensive evaluation of de novo BGC prediction, GECCO is both more accurate and faster than other state-of-the-art machine learning approaches (domain-level AUPR=0.89). When applied to (i) a set of >300k genomes and metagenomes derived from human gut-associated microbes, and (ii) all publicly available, high-quality prokaryotic isolate genomes (>1 million genomes), GECCO identifies over 600k and nearly 3 million BGCs, respectively, including BGCs enriched in microbiome-mediated disease states (e.g., colorectal cancer). Overall, GECCO provides unprecedented insight into microbial biosynthetic potential; however, future experiments are needed to validate the most promising candidate BGCs (e.g., putative novel antimicrobials, toxins, carcinogens) in vitro and/or in vivo.

Elin Thysell pitches

Laura Carroll pitches

Johan Henriksson, Research Fellow at the Department of Molecular Biology

Abstract: How can we make sense of high-dimensional datasets? In this pitch I will invite others to join our efforts in using explainable variational autoencoders (VAEs). We are developing them to analyze single-cell data, which we want to reduce to easily interpretable factors. This can be achieved by shaping the neural network appropriately, and giving suitable Bayesian priors.

Paolo Soda, Visiting Professor at the Department of Radiation Sciences

Abstract: We are witnessing a widespread adoption of artificial intelligence in healthcare. However, most of the advances in deep learning (DL) in this area consider only unimodal data, neglecting other modalities, even though a multimodal interpretation is necessary to support diagnosis, prognosis and treatment decisions. We are therefore looking for new, interesting data challenges that use multimodal data sources and in which the decisions taken need to be explained in order to be trusted.

Priyantha Wijayatunga, Associate Professor at Umeå School of Business, Economics and Statistics

Abstract: Time series data are very common in many application domains, such as medicine. Often we are interested in predictions, e.g., the health condition of an ICU patient over the next few hours. Here we present a prediction model that can be applied to, e.g., predicting a medically important condition from past data on many clinical attributes. Our proposal is a causal model that can take many time series into account in order to predict the future of a desired time series. In fact, it can be generalized to predict many series. It can be argued that the proposed model is useful for clinical Big Data, as it can be based on many different time series and can also include non-time-series data. Since it is a probabilistic model, it can be used even without any subject domain knowledge, i.e., it is data driven. It can also include domain expert knowledge through Bayesian parameterization. In this sense, it is attractive for clinical Big Data analysis.

Johan Henriksson pitches

Priyantha Wijayatunga pitches

Jenny Persson, Professor at the Department of Molecular Biology

Abstract: There is an urgent need to develop new biomarker tools that accurately predict the treatment response of breast cancer, especially the deadly triple-negative breast cancer. We aimed to develop gene-mutation-based machine learning algorithms as biomarker classifiers to predict the response to first-line chemotherapy with high precision. Methods: Random Forest machine learning (ML) was applied to screen algorithms based on various combinations of gene mutation profiles of primary tumors at diagnosis, using the TCGA Cohort (n = 399) as a training set and the MSK Cohort (n = 807) for validation. Subtypes of breast cancer, including triple-negative and luminal A (ER+, PR+ and HER2-), were assessed. The performance of the candidate algorithms as classifiers and predictive biomarkers was further assessed using logistic regression, progression-free survival (PFS) with up to 220 months of follow-up, and univariate/multivariate Cox proportional hazards analyses. Results: A novel algorithm, termed the 12-Gene Algorithm, based on mutation profiles of KRAS, PIK3CA, MAP3K1, MAP2K4, PTEN, TP53, CDH1, GATA3, KMT2C, ARID1A, RUNX1, and ESR1, was identified. In the TCGA Cohort, the algorithm distinguished non-progressed (responder) from progressed (non-responder) patients with an AUC of 0.96 (95% CI 0.94-0.98). It predicted PFS with a hazard ratio (HR) of 21.6 (95% CI 11.3-41.5) (p<0.001) in all patients, and with an HR of 19.3 in the triple-negative subgroup (n=42, p<0.001). The 12-Gene Algorithm was validated in the MSK Cohort. Similar to the TCGA Cohort, it distinguished responders from non-responders with an AUC of 0.97 (95% CI 0.96-0.98) and predicted PFS in the triple-negative subgroup with an HR of 18.6 (n=75, p<0.001).
Conclusions: The novel 12-Gene Algorithm, based on multiple gene-mutation profiles identified through ML, has great potential to predict treatment response in subgroups of breast cancer patients, which may assist personalized therapies and reduce mortality.
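As background on the AUC figures quoted in this abstract: the AUC can be read as the probability that a randomly chosen positive case receives a higher classifier score than a randomly chosen negative case. The sketch below computes it by direct pairwise comparison. It is an illustration only, not the authors' pipeline. The function name, the label encoding (1 = progressed, 0 = non-progressed) and the toy scores are all hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data (not from the study): 1 = progressed, 0 = non-progressed.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]

toy_auc = roc_auc(toy_labels, toy_scores)
print(f"AUC = {toy_auc:.3f}")  # 8 of 9 pairs are ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop is quadratic in the number of cases, which is fine for a sketch but not for large cohorts.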

Jian-Feng Mao, Associate Professor at the Department of Plant Physiology, UPSC

Abstract: Generally speaking, we work on plant genomics, always looking for data-driven questions and implementing and using data-intensive tools. We are now trying to predict specific sequence units from whole-genome DNA sequence, to identify specific functional units along the linear chromosomal DNA by integrating multiple layers of functional information, and to learn whether features of DNA sequences contribute to differential gene expression, which features are the main players, and how. We are trying hard to use various AI modeling approaches; however, it is challenging, and we often find ourselves confused and lost.