Parisian Master of Research in Computer Science
Master Parisien de Recherche en Informatique (MPRI)

Algorithmics and bioinformatics (26h, 3 ECTS)

Coordination: Philippe Gambette (LIGM, Univ. Paris-Est Marne-la-Vallée).

Lecturers

The lecturers are researchers in the Algorithmics for Bioinformatics group at LIGM (Univ. Gustave Eiffel at Marne-la-Vallée).

For year 2021-2022, the lecturers will be Gregory Kutcherov and Philippe Gambette.

For year 2020-2021, the lecturers were Laurent Bulteau and Mathias Weller.

Practical information

Language

Lectures will be given in English.

Location

The lectures will take place at ENS Paris-Saclay.

Motivations and main objectives

The objective of this course is to study the algorithmic approaches used in bioinformatics. The topics covered in 2020-2021 include scaffolding, reversal and transposition distances, scaffold filling and phylogenetics.

Short description

A total of 26 hours of lectures and exercise sessions, from September 21st to November 9th (to be confirmed):

  • Gregory Kucherov: sequence analysis and data structures for bioinformatics)
  • Philippe Gambette (algorithmics for phylogenetics) :
    • phylogenetic reconstruction: distance methods, maximum parsimony, maximum likelihood ; rooting
    • tree combinatorics: compatible characters, clusters, splits
    • tree comparison: maximum agreement subtree, tree distances (Robinson-Foulds, quartet and triplet distance, NNI, TBR, SPR)
    • reticulate evolution: phylogenetic network, hybridization number, tree containment problem

Lecture notes

Evaluation / exams

The evaluation of the students will consist in a written exam (coefficient 1), as well as a coding project (optional, coefficient 1).

Written exam

The written exam will be on November 9th16th. The exercises will be given in English, and you can write your answers in English or in French. You'll be authorized to have your lecture notes with you (but no computer, cellphone, etc.)

Coding project

Students will also be in charge of programming a solution to a problem occurring in bioinformatic algorithms, in Python. More precisely, the coding project consists in programming an efficient algorithm to solve multiple lowest common ancestor queries on a rooted tree.

The project will be coded in Python, in only one Python file, using only the following libraries: glob, numpy, os, re, sys and time. If you would like to use other libraries, or for any question about the coding project, please contact Philippe Gambette (philippe.gambette@univ-eiffel.fr) to ask whether an extra library can be added to this list. If you use already existing code, please respect its licence and mention it clearly in the beginning of your file. Commenting your code will be appreciated, for example with approximately one comment every 5 lines.

This Python script will take as input a text file in UTF8 encoding, called input.txt, located in the same folder, containing a rooted tree with labeled nodes coded in the Newick format on the first line, followed by a sequence of lines containing a first leaf label followed by a comma followed by a second leaf label (see this this example) and will output a text file, called output.txt, contained in the same folder, containing on line i the label of the common ancestor of the pairs of nodes of line i+1 in the file input.txt (see this example).

The project will be evaluated taking into account the correctness of your code (both on binary and non binary trees), its speed, its readability (quality of comments in particular) as well as the quantity of code written personally and the respect of intellectual property.

Further indications will be given here on how to send the source code. The deadline for this coding project task will be November 14th21st, at 23:59. The deadline may be extended only by request before the initial deadline at philippe.gambette@univ-eiffel.fr.

Schedule

  • Gregory Kucherov (stringology for bioinformatics) :
    • 2021-09-21, 8:30-12h45
    • 2021-09-28, 8:30-12h45
    • 2021-10-05, 8:30-12h45
    • 2021-10-12, 8:30-12h45
  • Philippe Gambette (algorithmics for phylogenetics) :
    • 2021-10-19, 8:30-12h45
    • 2021-10-26, 8:30-12h45
    • 2021-11-09, 8:30-12h45
    • 2021-11-16, 8:30-10h30: lecture, 10h45-12h45: exam

Prerequisites

  • basic algorithmics and computational complexity

Related courses

Bibliography

 
Universités partenaires Université Paris-Diderot
Université Paris-Saclay
ENS Cachan École polytechnique Télécom ParisTech
ENS
Établissements associés Université Pierre-et-Marie-Curie CNRS INRIA CEA