Internship opportunity for students in the bioinformatics course
Location
Oncodesign HQ – Dijon
Referent
Dr. Raieli Salvatore
Contract Duration
3 to 6 months

Oncodesign is a French pharmaceutical company based in Dijon whose goal is to develop new treatments for cancer and in other therapeutic areas.

The Artificial Intelligence business unit (AI BU) of Oncodesign uses a range of bioinformatics, machine learning, and artificial intelligence algorithms to identify new therapeutic targets. To this end, the AI BU has established collaborations with several hospitals to sequence multi-omics data from patients. The AI BU also collaborates with the artificial intelligence center of the University of Burgundy (CIAD), several research groups in France, and other pharmaceutical companies. In addition, the company has its own infrastructure: a large GPU cluster established in collaboration with NVIDIA.

This year, the AI BU is offering three internships focused on bioinformatics and machine learning. We are looking for motivated students interested in internship projects centered on the search for new therapeutic targets and the use of machine learning and artificial intelligence. This year's internships focus on applying large language models to DNA sequences, natural language analysis (NLP on a large database of scientific articles, clinical trials, and patents), drug candidates, and protein interactions. The company offers a dynamic environment, expense reimbursement, and a real possibility of employment for the best students.

Topic: AI-Driven Drug Discovery in Oncology: Harnessing Large Language Models for Molecular Insights

Missions & activities of the internship:
Under the co-supervision of two Senior Data Scientists holding PhDs and with interdisciplinary backgrounds in artificial intelligence, immunology, mathematics, genetics, genomics, and bioinformatics, you will be expected to deliver:

  • Evaluation of State-of-the-Art LLMs in Molecular Biology: Conduct an extensive review of the latest developments in LLMs, assessing their performance and identifying the most promising models that can be leveraged for the discovery of new therapeutic targets.

  • Model Implementation: Develop the technical proficiency to apply LLMs. This involves understanding the practical aspects of model implementation and fine-tuning.

  • Data Preprocessing and Analysis: Extract meaningful insights from public and/or proprietary datasets.

  • Git code repositories with well-documented scripts in Python and notebooks with any conducted analysis.

  • A report summarizing your findings and contributions.

Topic: Computational identification of off-target proteins for drug candidates

Missions & activities of the internship:

Under the co-supervision of a Senior Data Scientist and a Medicinal Chemist holding PhDs and with interdisciplinary backgrounds in artificial intelligence, medicinal chemistry, and bioinformatics, your duties will be the following:

  • Build an algorithm based on the AlphaFold2 source code to generate embedding representations of protein kinase active sites.

  • Test other candidate algorithms, such as RoseTTAFold.

  • Identify potential off-target candidates in a case study.

  • Model the active site and its interactions with small molecules.

Topic: Large Language Models for Drug Discovery: Your AI Biologist Assistant

Missions & activities of the internship:

Under the supervision of a Senior Data Scientist holding a PhD and with an interdisciplinary background in artificial intelligence, immunology, mathematics, genetics, genomics, and bioinformatics, your duties will be the following:

  • Evaluation of the state of the art in language models and LLM fine-tuning. Starting from the resulting baseline, we want to refine the LLM by implementing the latest techniques. We will explore strategies for tailoring a model focused on drug discovery, target, and disease knowledge.

  • Deployment of the model. Establish a pipeline for deploying and monitoring the model (running on a server or an alternative).

  • Biology database integration. Integrate the model with a wide range of biological databases (mutations, pathways, patient data, and so on), so that it can query them, extract information from them, and thereby perform complex tasks.

  • Git code repositories with well-documented scripts in Python and notebooks with any conducted analysis.

  • A report summarizing your findings and contributions.

Expected student background/knowledge

M2 student or final-year engineering school student with an educational background in a relevant field (computational biology, bioinformatics, artificial intelligence, or related).
Essential skills include programming, machine learning, and an understanding of key concepts in molecular biology. Familiarity with NLP and LLMs is a significant advantage. Candidates should be fluent in French and English.

How to apply?

Contact: Thierry Billoué – Chief Human Resources Officer – Oncodesign Precision Medicine
Send your application (resume & cover letter) under ref “LLM4Molins” to tbilloue@oncodesign.com
Candidates will be ranked based on their CV and cover letter.
The best candidates will be invited to an interview on site or via Teams.