Sequential learning (2024

Course description

In some applications, the environment may be too complex to be described by a simple stochastic model and analyzed with classical statistical theory. A classic example is spam detection, which can be seen as a game between spammers and spam filters, each trying to fool the other. A robust approach is then needed: learning as one goes along, from experience, as more aspects of the problem are observed. This is the goal of online learning.

In online learning, data are acquired and processed on the fly; feedback is received and algorithms are updated on the fly. This field has received a lot of attention recently because of its possible applications on the internet. They include choosing which ads to display, repeated auctions, spam detection, expert/algorithm aggregation (and boosting), etc. The objective of the course is to introduce and study the main concepts of online learning and to design algorithms with theoretical analysis.
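As an illustration of the "on the fly" protocol for expert aggregation, here is a minimal sketch of the exponentially weighted average forecaster. It is not taken from the course material: the function name, the toy losses, and the choice of learning rate are assumptions made for this example only.

```python
import math


def exponential_weights(loss_rounds, eta):
    """Aggregate experts online with exponential weights.

    loss_rounds: list of rounds; each round is a list with one loss in [0, 1]
                 per expert, revealed only after the learner has committed.
    eta: learning rate (> 0).
    Returns (learner's cumulative loss, list of experts' cumulative losses).
    """
    n_experts = len(loss_rounds[0])
    cum_losses = [0.0] * n_experts
    learner_loss = 0.0
    for losses in loss_rounds:
        # Weights depend on past losses only (shifted by the min for stability).
        best = min(cum_losses)
        raw = [math.exp(-eta * (c - best)) for c in cum_losses]
        total = sum(raw)
        weights = [r / total for r in raw]
        # The learner suffers the weighted average of the experts' losses.
        learner_loss += sum(w * l for w, l in zip(weights, losses))
        # Feedback arrives and the state is updated on the fly.
        cum_losses = [c + l for c, l in zip(cum_losses, losses)]
    return learner_loss, cum_losses


# Toy data (hypothetical): expert 0 alternates, experts 1 and 2 are constant.
T = 100
rounds = [[float(t % 2), 0.3, 0.8] for t in range(T)]
eta = math.sqrt(8 * math.log(3) / T)  # assumed tuning for N = 3 experts
learner, experts = exponential_weights(rounds, eta)
regret = learner - min(experts)
print(f"regret against the best expert: {regret:.3f}")
```

With losses in [0, 1] and this choice of eta, the standard analysis bounds the regret against the best expert by sqrt(T ln(N) / 2), where N is the number of experts.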

Prerequisite: probability theory (notion of random variables, convergence of random variables, conditional expectation).

Evaluation

This class is part of the Master 2 MVA. It will last 18 hours (6 lectures of 3 hours) + 2h for the exam.
Final grade: approximately 70% final exam, 30% homework (implementing some of the algorithms seen in class).

Exam

The exam will be held on March 21st, 10h-12h. A single two-sided sheet of handwritten notes (with any content) will be allowed. The exam will be divided into two separate parts (online convex optimization and stochastic bandits), each of which should be returned on a separate sheet. Please bring your own sheets of paper for the exam.

Homework

The homework is due by Friday, March 21, 2025. It is to be uploaded using the form here as a single Jupyter notebook file (Part 1), together with a single PDF report (Part 2). If the upload does not succeed (for some reason), send it by email to and but only after you have tried the upload.

It can be done alone or in groups of two students. The report can be written in English or in French.

Reading material

Prediction, Learning, and Games. N. Cesa-Bianchi and G. Lugosi, 2006.
Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. S. Bubeck and N. Cesa-Bianchi, 2012.
Bandit Algorithms. T. Lattimore and C. Szepesvári, 2021.
Introduction to Online Convex Optimization. E. Hazan, 2016.
A Modern Introduction to Online Learning. F. Orabona, 2019.

Schedule

Friday mornings from 9h00 to 12h00 at ENS Paris-Saclay. Information will be provided on the website before each class.

#	Date	Teacher	Where	Title
1	17/01/2025	PG	1Z28	Introduction. Online convex optimization (slides 1-57). Slides Lecture notes
2	24/01/2025	PG	1Z28	Online convex optimization (slides 58-79). Slides Lecture notes
3	31/01/2025	RD	1Z28	Stochastic Bandits 1. Basic algorithms: Explore-Then-Commit, Upper Confidence Bound, ε-greedy. Slides Lecture notes
4	07/02/2025	RD	1Z28	Stochastic Bandits 2. Linear and continuous bandits. Slides Lecture notes
5	14/02/2025	PG	1Z28	Adversarial Bandits (slides 80-108). Slides Lecture notes
6	21/02/2025	RD	1Z28	Stochastic Bandits 3. Lower bounds. Best arm identification. Slides Lecture notes
	21/03/2025		1Z28	Exam (from 10:00 to 12:00). In person at ENS Paris-Saclay. Previous exams: 2020, 2021, 2022 (corr), 2023, 2024 (corr), 2025 (corr)

Instructors

Pierre Gaillard

Rémy Degenne