Parisian Master of Research in Computer Science
Master Parisien de Recherche en Informatique (MPRI)

Graph Mining (24h, 3 ECTS, period 2)

Course director: Mauro Sozio

Teachers for 2018-2019

Mauro Sozio (MC1 Telecom Paristech, LTCI) sozio at telecom dot paristech dot fr
Laurent Viennot (DR INRIA, IRIF, Paris Diderot)

Scientific and pedagogical content

Graphs provide a powerful abstraction for representing a wide variety of real-world information, such as social networks, knowledge and information networks, biological networks, etc. The main objective of the course is to present the models and algorithms for discovering structures in large graphs, while covering the main theoretical and practical aspects of graph mining. In particular, algorithms with theoretical guarantees as well as algorithms that are proven to work well in practice will be discussed. We will cover the following main topics:

  • finding cliques and dense subgraphs (sequential, parallel/distributed, dynamic algorithms)
  • graph generation models (Erdös-Rényi, preferential attachment, small world phenomenon)
  • community and event detection (Louvain algorithm, event detection via densest subgraph computation, impossibility theorem for clustering)
  • graph decompositions (k-core, modular, local decompositions)
  • diffusion and influence maximization
  • ranking (PageRank, HITS)
  • Mining road networks (pre-computation for shortest path queries, distance labeling and highway dimension, radius and diameter heuristics)
  • Graph decomposition methods and algorithms. Survey of exact algorithms for modular decomposition and some other polynomial graph decompositions and approximation methods for NP-hard decompositions such as treewidth or treelength.

Content and schedule

Exercises and Research Questions

A list of exercises follows. Similar or the very same exercises might be asked at the exam. Exercises

Default Project: Streaming k-Center Clustering with Outliers

The text of the project can be found here Text of the Project

The dataset we are going to use consists of a collection of 1M tweets where only the timestamp and the geographic location have been retained: Dataset (read the readme file which contains some information about the file format and other valuable info)

The deadline for the project is February 15th 2019.

News and Announcements

16/12 A list of exercises has been posted

14/12 the text of the project has been posted

12/12 all the slides so far have been posted

23/01 all the slides so far have been posted

07/02 all the slides so far have been posted

07/02 added 2 exercises (on fastest shortest path computation)

Teaching language

6/8 of classes will be in English, the rest English or French.

Evaluation

During the course a few exercises related to the content of the course will be given. A subset of those exercises (or similar ones) will be asked during the final exam. Material presented during the lectures might also be asked at the final exam.

There is an optional practical project which might give additionally up to 20% of the maximum grade. The final exam contributes to 80% of the maximum grade. The project is optional, in the sense that it is not required to pass the course. It typically consists of implementing efficiently a graph algorithm or a data analysis task. Students should preferably use C++, Java, or Python, however, any programming language can be used.

Students are not allowed to bring any notes at the final exam.

Prerequisites

Good knowledge of algorithms and complexity theory. Good knowledge of Java, C++ or Python are preferable but not necessary.

Related Books

Algorithm Design par J. Kleinberg et E. Tardos, Addison Welsey 2005.

D. Easley, J. Kleinberg. Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge University Press, 2010.

Related courses

 
Universités partenaires Université Paris-Diderot
Université Paris-Saclay
ENS Cachan École polytechnique Télécom ParisTech
ENS
Établissements associés Université Pierre-et-Marie-Curie CNRS INRIA CEA