Parisian Master of Research in Computer Science
Master Parisien de Recherche en Informatique (MPRI)

Logical and Computational Structures for Linguistic Modeling

Structures Informatiques et Logiques pour la Modélisation Linguistique (24h, 3 ECTS). Taught in rotation by Benoît Crabbé (Paris Diderot), Philippe de Groote (INRIA Lorraine), and Sylvain Schmitz (ENS Paris-Saclay).

Year 2019–2020

Teaching Staff

This year, the first half of the course is taught by Benoît Crabbé (Paris Diderot) and the second half by Philippe de Groote (INRIA Lorraine).

Schedule

The class takes place on Thursdays from 12:45 to 15:45 in room 1014.

Language

To be decided with the students. The course material is in English. Students will be allowed to take the exam in French or in English.

Description

Computational linguistics employs mathematical models to represent morphological, syntactic, and semantic structures in natural languages. The course introduces several such models while insisting on their underlying logical structure and algorithmics. Quite often these models will be related to mathematical objects studied in other MPRI courses, for which this course provides an original set of applications and problems.

The course is not a substitute for a full curriculum in computational linguistics; it rather aims at providing students with a rigorous formal background in the spirit of MPRI. Most of the emphasis is put on the symbolic treatment of words, sentences, and discourse. Several fields within computational linguistics are not covered, prominently speech processing and pragmatics. Machine learning techniques are only sparsely treated: for instance, we focus on the mathematical objects obtained through statistical and corpus-based methods (i.e. weighted automata and grammars) and the associated algorithms, rather than on automated learning techniques (the subject of course 1.30).

Tentative Outline

We sketch here the planned contents for 2019–2020. The course is structured around three important subdomains of linguistics (morphology, syntax, and semantics), presenting in each case some of the related models and the corresponding algorithmic issues. The exact dates and contents might change.

  1. September 12th, 2019
    • General Introduction: Language has structure. Language and inference. The importance of ambiguity. Language and the world.
    • Linguistics basics for computational linguistics: statistical properties of words, constituent and dependency analyses, computing semantic denotations and semantic similarities.
    • Machine learning basics for computational linguistics: coding discrete symbols as vectors (word embeddings), optimisation reminder.
  2. September 19th, 2019
    • Modelling sequences: presentation of typical problems involving sequence modelling.
    • Generative models: language models, hidden Markov models, PCFGs.
    • Discriminative models: conditional random fields.
    • Algorithms: Viterbi and approximate methods.
    • Deep-learning-based methods.
  3. September 26th, 2019: Modelling syntax
    • Phrase structure grammar
    • Tree Adjoining Grammar
    • Dependency syntax
    • Categorial grammar
  4. October 3rd, 2019: Parsing algorithms for natural language
    • CKY and Earley: introduction to weighted CKY and Earley parsing
    • Shift-reduce and Eisner's algorithm for dependency syntax
    • CKY for Tree Adjoining Grammar
  5. October 17th, 2019
    • Semantic Representations: modal logics, higher-order logics
  6. October 24th, 2019
    • Syntax/Semantics Interface: compositionality, higher-order syntax, abstract categorial grammars
  7. November 7th, 2019
    • Montague Semantics: model-theoretic semantics, intensionality
  8. November 14th, 2019
    • Discourse Analysis: discourse representation theory, anaphora resolution, type-theoretic dynamic logic
  9. November 21st, 2019
    • Exam
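Lecture 2 lists the Viterbi algorithm among the decoding methods for sequence models. As a minimal sketch of how it recovers the most probable hidden state sequence of an HMM by dynamic programming (the tagset and all probabilities below are invented for illustration, not taken from the course):

```python
# Viterbi decoding for a toy HMM part-of-speech tagger.
# All model parameters below are invented for illustration only.

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for the observations `obs`."""
    # best[t][s]: probability of the best path ending in state s at time t
    best = [{s: start_p[s] * emit_p[s].get(obs[0], 0.0) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prob, prev = max(
                (best[t - 1][r] * trans_p[r][s] * emit_p[s].get(obs[t], 0.0), r)
                for r in states
            )
            best[t][s] = prob
            back[t][s] = prev
    # Recover the best final state, then follow the back-pointers.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

states = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
trans_p = {
    "DET":  {"DET": 0.0, "NOUN": 0.9, "VERB": 0.1},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.3, "VERB": 0.2},
}
emit_p = {
    "DET":  {"the": 0.9, "a": 0.1},
    "NOUN": {"dog": 0.5, "walk": 0.2, "cat": 0.3},
    "VERB": {"walks": 0.6, "barks": 0.4},
}
print(viterbi(["the", "dog", "barks"], states, start_p, trans_p, emit_p))
# → ['DET', 'NOUN', 'VERB']
```

The table `best` realises the usual recurrence over path probabilities; the weighted grammars of lectures 2 and 4 generalise the same scheme to trees.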
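Lecture 4 introduces CKY parsing. The following sketch shows the recognition variant for a context-free grammar in Chomsky normal form, filling a chart of spans bottom-up (the toy grammar and lexicon are invented for illustration):

```python
# CKY recognition for a CFG in Chomsky normal form.
# The toy grammar and lexicon below are invented for illustration.
from collections import defaultdict

# Binary rules A -> B C, and lexical rules A -> word.
binary = [("S", "NP", "VP"), ("NP", "DET", "N"), ("VP", "V", "NP")]
lexical = {"the": {"DET"}, "dog": {"N"}, "cat": {"N"}, "chases": {"V"}}

def cky(words):
    n = len(words)
    # chart[(i, j)]: set of non-terminals deriving words[i:j]
    chart = defaultdict(set)
    for i, w in enumerate(words):
        chart[(i, i + 1)] |= lexical.get(w, set())
    for width in range(2, n + 1):          # widest spans last
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):      # split point
                for a, b, c in binary:
                    if b in chart[(i, k)] and c in chart[(k, j)]:
                        chart[(i, j)].add(a)
    return "S" in chart[(0, n)]

print(cky("the dog chases the cat".split()))  # → True
print(cky("dog the".split()))                 # → False
```

Replacing the sets of non-terminals by maps from non-terminals to weights yields the weighted CKY variant mentioned in the outline.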

Course Material

2019–2020
  • Readings: choose two blocks out of three:
  • Distributional and vector semantics:
    1. Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean (2013), Efficient Estimation of Word Representations in Vector Space, ICLR 2013 workshop track (link)
    2. Richard Socher, Alex Perelygin, Jean Y. Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng and Christopher Potts (2013), Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank, EMNLP 2013 (link)
  • Mildly context-sensitive languages:
    1. Aravind Joshi (1985), How much context sensitivity is required to provide reasonable structural descriptions? (link)
    2. Alexander Clark (2015), An introduction to multiple context free grammars for linguists (link)
  • Semantic parsing with distant supervision:
    1. Luke S. Zettlemoyer and Michael Collins (2009), Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial Grammars (link)
    2. Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang (2013), Semantic Parsing on Freebase from Question-Answer Pairs, EMNLP 2013 (link)
Material from previous years: 2018–2019, 2017–2018, 2016–2017, 2015–2016, 2014–2015, 2013–2014, 2012–2013, 2011–2012, and older.

To Know More

Requisites

  • Basics in formal language theory (regular word languages, sequential functions, context-free languages, regular tree languages)
  • Elementary notions in logics
  • Some fluency in lambda calculus
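The lambda-calculus prerequisite is put to work in the second half of the course, where meanings are functions and syntactic combination is function application. A toy sketch of this idea using Python closures (the two-entity model and lexicon below are invented for illustration):

```python
# Montague-style composition with Python closures: word meanings are
# functions over a model, and combining constituents is function application.
# The two-entity domain and lexicon are invented for illustration.
domain = {"john", "mary"}

john = "john"                                   # [[John]]     : e
sleeps = lambda x: x == "john"                  # [[sleeps]]   : e -> t
everyone = lambda p: all(p(x) for x in domain)  # [[everyone]] : (e -> t) -> t

print(sleeps(john))      # [[John sleeps]]    → True
print(everyone(sleeps))  # [[Everyone sleeps]] → False (mary does not sleep)
```

The typed lambda terms used in lectures 5–7 play exactly this role, with higher-order types such as (e → t) → t for quantified noun phrases.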

Related Courses

  • 1.18 Tree Automata and Applications: regular tree languages, monadic second-order logic on trees, potentially pushdown tree languages
  • 1.30 Machine Learning: as already mentioned, this course does not cover learning techniques
  • 1.24 Probabilistic Aspects of Computer Science: Markov chains
  • 2.16 Modélisation par automates finis (finite-state modelling): rational relations and rational series

References

  • Jean Berstel. Transductions and Context-Free Languages, Teubner Studienbücher: Informatik, Teubner, 1979. webpage
  • Jacques Sakarovitch. Elements of Automata Theory, Cambridge University Press, 2009. Translated from Éléments de théorie des automates, Vuibert Informatique, 2003.
  • Hubert Comon, Max Dauchet, Rémi Gilleron, Christof Löding, Florent Jacquemard, Denis Lugiez, Sophie Tison, and Marc Tommasi. Tree Automata Techniques and Applications, 2007. webpage
  • Daniel Jurafsky and James H. Martin. Speech and Language Processing, Prentice Hall Series in Artificial Intelligence, Prentice Hall, second edition, 2009.
  • Ruslan Mitkov, ed. The Oxford Handbook of Computational Linguistics, Oxford University Press, 2003.
  • Ray Jackendoff. Foundations of Language: Brain, Meaning, Grammar, Evolution, Oxford University Press, 2002.
  • Bob Carpenter. Type-Logical Semantics, MIT Press. 1998.
  • Johan van Benthem and Alice ter Meulen, eds. Handbook of Logic and Language, Elsevier Science, 1997.
  • Patrick Blackburn and Johan Bos. Representation and Inference for Natural Language, A First Course in Computational Semantics, CSLI, 2005.
  • Christian Retoré. The Logic of Categorial Grammars: Lecture Notes. webpage
  • Shuly Wintner and Nissim Francez. Unification Grammars, Cambridge University Press, 2012.
 