Advanced machine-learning approaches for the analysis of microbiome data Workshop

The analysis of the human microbiome has recently attracted the attention of several research communities because of its potential diagnostic, prognostic, and therapeutic role in several diseases, including diabetes, liver cirrhosis, and some types of cancer (e.g., colorectal cancer), as well as in disorders such as autism spectrum disorder. Statistical and machine learning approaches appear very promising for elucidating existing relationships (or identifying novel ones) between microbiome conditions and diseases, and for building descriptive and predictive models that can improve existing therapeutic procedures. The workshop “Advanced machine learning approaches for the analysis of microbiome data” will focus on advanced machine learning approaches and their (potential or actual) application to microbiome data, including semi-supervised learning methods, multi-view learning methods, and transfer learning approaches.

Programme - 21 June 2023

[09:30 - 09:50] Welcome and workshop introduction (Michelangelo Ceci, Gianvito Pio, Domenica D’Elia)

[09:50 - 10:30] Silvio Tosatto (KEYNOTE LECTURE)
DOME: recommendations for supervised machine learning validation in biology

[10:30 - 11:00] Coffee break

[11:00 - 11:30] Andrea Simeon (INVITED TALK)
Multi-class boosting with adversarial multi-arm bandits on incomplete microbiome views

[11:30 - 11:50] Adriano Zaghi
Machine learning models for the detection of antimicrobial resistance using synthetic data

[11:50 - 12:10] Meritxell Pujolassos
coda4microbiome: compositional data analysis for microbiome cross-sectional and longitudinal studies

[12:10 - 12:30] Donato Romano
Explainable Artificial Intelligence (XAI) for Microbiome Data Analysis in Autistic Spectrum Disorder

Invited Speakers

Silvio Tosatto, Full Professor in Bioinformatics, Chair of Biochemistry, has been PI of the BioComputing UP lab at the Department of Biomedical Sciences (University of Padova) since 2002. Previously, he earned his MSc and PhD from the University of Mannheim (Germany). His work in protein bioinformatics focuses on the structural and mechanistic aspects of complex systems (e.g. cancer) as well as the provision of services and databases for the scientific community. Prof. Tosatto is heavily involved in ELIXIR, the European infrastructure for biological data, where he is Deputy Head of Node for the Italian node, ExCo (co-lead) of the Data Platform and co-lead of the Machine Learning (ML) focus group. His lab is part of the Gene Ontology, InterPro, Pfam and PDBe-KB consortia, where it contributes data on intrinsically disordered and repetitive proteins from the MobiDB, DisProt, PED and RepeatsDB databases hosted in Padova. His on-going work in evaluating ML methods has prompted the DOME recommendations for ML publications and the CAID experiment for assessing predictors of intrinsically disordered proteins.

Talk title: DOME: recommendations for supervised machine learning validation in Biology.

Abstract: As large amounts of biological data are being generated and made accessible to researchers, machine learning (ML) has come into the spotlight as a very useful approach for understanding complex biological problems. While ML methods should ideally be validated experimentally, this happens only in a fraction of the publications, and most ML methods still rely on computational validation. In addition, the complexity of many ML methods makes it difficult to understand their true performance. DOME is a set of community-wide recommendations for reporting supervised machine learning-based analyses applied to biological studies. The structured DOME description is based on four components, which make up the acronym: Data, Optimisation, Model and Evaluation. Each corresponds to a section which should be reported as meta-data for an ML method. DOME has been developed by a group of over 30 experts from various European countries within the ELIXIR ML Focus Group. The current focus is on developing the DOME registry for depositing DOME-compliant meta-data. Broad adoption of the DOME recommendations will help improve machine learning assessment and reproducibility.
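
As an illustration of the four-section structure described in the abstract, the following minimal Python sketch stores one DOME-style description and reports which sections are still empty. The class name, the helper method and all keys inside each section are hypothetical and chosen only for this example; they are not the schema of the DOME registry.

```python
from dataclasses import dataclass, field

# The four DOME components named in the abstract.
DOME_SECTIONS = ("data", "optimisation", "model", "evaluation")


@dataclass
class DomeRecord:
    """Hypothetical container for one DOME-style description of an ML method."""

    data: dict = field(default_factory=dict)
    optimisation: dict = field(default_factory=dict)
    model: dict = field(default_factory=dict)
    evaluation: dict = field(default_factory=dict)

    def missing_sections(self):
        """Return the names of the sections that have no entries yet."""
        return [name for name in DOME_SECTIONS if not getattr(self, name)]


# Illustrative entries only; the keys and values are assumptions.
record = DomeRecord(
    data={"source": "hypothetical 16S cohort", "split": "80/20 train/test"},
    optimisation={"algorithm": "random forest"},
    model={"interpretable": False},
)
print(record.missing_sections())  # ['evaluation']
```

A check of this kind could be run before depositing a description, so that a method is never reported without, for example, its evaluation section.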

Andrea Simeon is a mathematician and data science researcher at the BioSense Institute, Novi Sad, Serbia. Her work focuses on applying machine learning and deep learning techniques in microbiome studies and on exploring different preprocessing pipelines for analysing amplicon and shotgun sequence data. For the research carried out during her ML4Microbiome COST Action STSM, Andrea won the Aleksandar “Saša” Popović Award For Best Student Paper – 2022 from the Faculty of Sciences, University of Novi Sad, Serbia.

Talk title: Multi-class boosting with adversarial multi-arm bandits on incomplete microbiome views

Abstract: The microbiome has been widely associated with different diseases and disorders. To identify individual microorganisms and their abundances across samples, different sampling, sequencing and preprocessing techniques can be considered. This leads to different input feature sets (views) from which predictive models can be learned through machine learning (ML) approaches. ML models aid in finding associations between the microbiome and disease. Standard (single-view) ML models cannot deal with multiple views at once, and thus they have been extended to fit multi-view datasets (e.g. Adaboost and Multi-view Adaboost). Moreover, microbiome data comes from various sources, and incompleteness is often inevitable. Existing classifiers, even multi-view ones, cannot be used directly because they cannot work with incomplete views and in multi-class settings. To our knowledge, there is no multi-view boosting algorithm for multi-class classification with incomplete views. The proposed algorithm extends an existing multi-view boosting algorithm based on multi-arm bandits so that it can work in a multi-class setting and with incomplete views (views in which some samples have no representation). At each iteration, it selects one view as the winner using adversarial multi-arm bandits and uses that view's predictive information to update the final model weights and prediction in a boosting process. Three data sets were created from several microbiome studies and used to examine the performance of the proposed algorithm. One of the experiments showed a 7% increase in F1 score compared to a single-view classifier, while another showed a 54% increase. The application domain is not restricted to microbiome data. Further work will involve examinations in other domains.
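
To make the view-selection step more concrete, the sketch below shows a generic EXP3 adversarial bandit in which each arm stands for one view. It is not the algorithm presented in the talk: the multi-class boosting update and the handling of incomplete views are omitted, and the class name, the `simulate` helper and the per-view "quality" values are assumptions made only for illustration. The reward is a simulated stand-in for how well a weak learner trained on the selected view performed in that round.

```python
import math
import random


class Exp3:
    """Generic EXP3 adversarial bandit; each arm stands for one data view."""

    def __init__(self, n_arms, gamma=0.1, seed=0):
        self.n_arms = n_arms
        self.gamma = gamma  # exploration rate in (0, 1]
        self.weights = [1.0] * n_arms
        self.rng = random.Random(seed)

    def probabilities(self):
        """Mix the normalised weights with uniform exploration."""
        total = sum(self.weights)
        return [
            (1 - self.gamma) * w / total + self.gamma / self.n_arms
            for w in self.weights
        ]

    def select(self):
        """Draw one arm (view) according to the current probabilities."""
        probs = self.probabilities()
        return self.rng.choices(range(self.n_arms), weights=probs, k=1)[0]

    def update(self, arm, reward):
        """Update the chosen arm with a reward in [0, 1]."""
        # Dividing by the selection probability gives an unbiased
        # estimate of the reward, since only one arm is observed.
        estimate = reward / self.probabilities()[arm]
        self.weights[arm] *= math.exp(self.gamma * estimate / self.n_arms)
        # Rescale to avoid overflow; the probabilities are unchanged.
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]


def simulate(view_quality, rounds=2000, gamma=0.1, seed=0):
    """Toy loop: view i yields reward 1 with probability view_quality[i]."""
    bandit = Exp3(len(view_quality), gamma=gamma, seed=seed)
    rng = random.Random(seed + 1)
    counts = [0] * len(view_quality)
    for _ in range(rounds):
        view = bandit.select()
        reward = 1.0 if rng.random() < view_quality[view] else 0.0
        bandit.update(view, reward)
        counts[view] += 1
    return counts, bandit.probabilities()


counts, probs = simulate([0.1, 0.3, 0.9])
print(counts, probs)
```

In this toy setting the bandit is expected to concentrate its selections on the view with the highest simulated quality, while the exploration term keeps every view selectable in later rounds.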

Submission

Submissions to the workshop are made via the BITS2023 conference submission form. Please choose the session "Machine Learning applications for microbiome data analysis (Workshop)".
The deadline and notification dates are the same as the key dates of the abstract submission to the main conference.