In some applications, the environment may be so complex that it is infeasible to choose a simple stochastic model and apply classical statistical theory. A classic example is spam detection, which can be seen as a game between spammers and spam filters, each trying to fool the other. A robust approach is then needed: learning as one goes along from experience, as more aspects of the problem are observed. This is the goal of online learning.

In online learning, data are acquired and processed on the fly; feedback is received and algorithms are updated on the fly. The field has received a lot of attention recently because of applications coming from the internet, including choosing which ads to display, repeated auctions, spam detection, expert/algorithm aggregation (and boosting), etc. The objective of the course is to introduce and study the main concepts of online learning and to design algorithms with theoretical analysis.
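As a small taste of the "learning from expert advice" topic mentioned above, here is a minimal sketch of the exponentially weighted average forecaster (Hedge). This is an illustration only, not course material: the function name, learning rate, and toy losses are my own choices.

```python
import numpy as np

def exponential_weights(expert_losses, eta):
    """Exponentially weighted average forecaster (illustrative sketch).

    expert_losses: (T, K) array of per-round expert losses in [0, 1].
    eta: learning rate.
    Returns the learner's expected loss at each round.
    """
    T, K = expert_losses.shape
    weights = np.ones(K)          # start with uniform weights
    learner_losses = []
    for t in range(T):
        probs = weights / weights.sum()          # play the weighted average
        learner_losses.append(probs @ expert_losses[t])
        weights *= np.exp(-eta * expert_losses[t])  # downweight bad experts
    return np.array(learner_losses)

# Toy data: expert 0 is systematically better than the other two.
rng = np.random.default_rng(0)
losses = rng.uniform(size=(1000, 3))
losses[:, 0] *= 0.5
learner = exponential_weights(losses, eta=0.1)
regret = learner.sum() - losses.sum(axis=0).min()  # regret vs. best expert
```

With losses in [0, 1], the classical bound gives regret at most ln(K)/eta + eta*T/8, which is about 23.5 for these parameters; the simulation stays well within it.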

**Prerequisite:** probability theory (notion of random variables, convergence of random variables, conditional expectation).

This class is part of the Master 2 MVA. It lasts 18 hours (6 lectures of 3 hours), plus a 2-hour exam.

#### Evaluation

Final grade: approximately 70% final exam, 30% homework (implementing some of the algorithms seen in class). A single two-sided sheet of handwritten notes (with any content) is allowed for the exam.

#### Homework

The homework is due by **Friday, March 17, 2023**. It is to be uploaded using the form here as a single PDF file of maximum size 40 MB. If the upload does not succeed (for some reason), send it by email to and , but only after you have tried the upload.

It can be done alone or in groups of two students. The code can be written in any language (Python, R, MATLAB, ...). The code file (e.g., a Python or R notebook) does not need to be submitted, but the results and the figures must be included in the PDF report. The report can be written in English or in French.

#### Reading material

- Prediction, Learning, and Games. N. Cesa-Bianchi and G. Lugosi, 2006.
- Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. S. Bubeck and N. Cesa-Bianchi, 2012.
- Bandit Algorithms. T. Lattimore and C. Szepesvári, 2020.
- Introduction to Online Convex Optimization. E. Hazan, 2016.

#### Schedule

Friday mornings from 9:00 to 12:00 at ENS Paris-Saclay. Information will be provided on the website before each class. Lecture notes will be updated here on the fly.

| # | Date | Teacher | Where | Title |
|---|------|---------|-------|-------|
| 1 | 13/01/2023 | PG | 1B18 | Introduction. Sequential learning with individual sequences. Learning from expert advice. |
| 2 | 20/01/2023 | PG | Visio | Online convex optimization. Exponentiated Gradient. Online gradient descent. |
| 3 | 27/01/2023 | PG | 1B18 | Adversarial bandits. |
| 4 | 03/02/2023 | RD | 1B18 | Stochastic bandits 1. Basic algorithms: Explore-Then-Commit, Upper Confidence Bound, ε-greedy. |
| 5 | 10/02/2023 | RD | 1B18 | Stochastic bandits 2. Linear and continuous bandits. |
| 6 | 17/02/2023 | RD | 1B18 | Stochastic bandits 3. Lower bounds. Best arm identification. |
|   | 17/03/2023 | | 1B18 | Exam (from 10:00 to 12:00), in person at ENS Paris-Saclay. Previous exams: 2020, 2021, 2022. |
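As a hedged illustration of the stochastic-bandit material (lectures 4 to 6), the sketch below simulates the UCB1 index policy on Bernoulli arms. The arm means, horizon, and function name are illustrative choices of mine, not taken from the course.

```python
import numpy as np

def ucb1(means, T, rng):
    """Simulate UCB1 on Bernoulli arms with the given means (sketch).

    Returns the number of pulls of each arm after T rounds.
    """
    K = len(means)
    counts = np.zeros(K)   # pulls per arm
    sums = np.zeros(K)     # cumulated reward per arm
    for t in range(T):
        if t < K:
            arm = t        # initialization: pull each arm once
        else:
            # empirical mean plus exploration bonus
            ucb = sums / counts + np.sqrt(2 * np.log(t + 1) / counts)
            arm = int(np.argmax(ucb))
        reward = rng.random() < means[arm]   # Bernoulli draw
        counts[arm] += 1
        sums[arm] += reward
    return counts

rng = np.random.default_rng(1)
counts = ucb1([0.2, 0.5, 0.8], T=5000, rng=rng)
```

Over 5000 rounds, the best arm (mean 0.8) ends up pulled far more often than the others: the exploration bonus shrinks like sqrt(log t / n), so suboptimal arms are sampled only logarithmically often.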