LSML 23: Large Scale Machine Learning
PSL week Spring Course 2023
MINES28 Large-Scale Machine Learning
March 6th-10th, 2023
Mines Paris, 60 boulevard Saint-Michel, 75006 Paris, Room L.316
This course is co-organized by Fabien Moutarde (Center for Robotics, MINES Paris) and Chloé-Agathe Azencott (MINES Paris & Institut Curie).
Outline
Machine learning is a fast-growing field at the interface of mathematics, computer science and engineering. It gives computers the ability to learn without being explicitly programmed, so that they can make predictions or take rational actions. Many fields, from cancer research to finance, natural language processing, marketing and self-driving cars, are now being transformed by recent progress in machine learning algorithms, which benefit from our ability to collect huge amounts of data and "learn" from them.
The goal of this intensive 5-day advanced course is to present the theoretical foundations and practical algorithms to implement and solve large-scale machine learning and data mining problems, and to expose the students to current applications and challenges of "big data" in science and industry.
Prerequisites:
- Numerical Python (i.e., familiarity with programming in Python and with the numpy, scipy and matplotlib libraries).
- Basics of machine learning (such as the content of the Apprentissage Artificiel course for MINES ParisTech students).
Schedule
Practical sessions are open only to officially enrolled PSL students taking the course for credit, and to registered PhD students.
Monday, March 6th, 2023
- 09:00 – 12:15 Lecture: Introduction to large-scale ML & optimization (Adeline FERMANIAN)
- 13:45 – 17:00 Practical session: ML on large data with scikit-learn; this session will also contain an introduction to scikit-learn for those who have not used the library before.
Tuesday, March 7th, 2023
- 09:00 – 12:15 Lecture: Deep Unsupervised Learning, and generative models (Bruno SAUVALLE, Centre de Robotique, MinesParis)
- 13:45 – 17:00 Practical session: Deep Learning, AutoEncoders and GANs with Python
Wednesday, March 8th, 2023
- 09:00 – 12:15 Lecture: Deep reinforcement learning (Fabien MOUTARDE, Centre de Robotique, MinesParis)
- 13:45 – 17:00 Practical session: Deep reinforcement learning with Python
Thursday, March 9th, 2023
- 09:00 – 12:15 Lecture: High Performance Artificial Intelligence (Claude TADONKI + Fabien COELHO, CRI, MinesParis)
- 13:45 – 17:00 Practical session: Stochastic Gradient Descent (unrelated to the morning lecture)
Friday, March 10th, 2023
- 09:00 – 12:15 Lecture: Natural Language Processing (NLP) with Recurrent Neural Networks and Transformers (Adeline FERMANIAN)
- 13:45 – 17:00 Practical session on NLP: embeddings and RNN
Registration
PSL students must enroll officially through their institutions.
Mines Paris students and staff are welcome to attend the lectures remotely by connecting to the Zoom of room L.316.
PhD students who want to participate may email Fabien Moutarde to register and receive a certificate of attendance. These students will also be allowed to attend the practical sessions.
Most course materials will be in English, but some lectures will be given in French.
Grading
If you are taking this class for ECTS credits, you will be asked to turn in (on the Moodle) the notebooks of ALL your practical sessions.
Total credits: 2 ECTS.
Practical sessions
Practical sessions will take the form of Jupyter notebooks on the course GitHub repository.
Please follow the instructions there to install Python3 and all the relevant packages. An alternative (sometimes preferable for deep learning notebooks) is to use Google Colab, for which you will need a Google account.
For students who need to be graded for credits, the results of your work during practical sessions (in the form of your final notebook) MUST be uploaded to the corresponding "assignments" on the course Moodle at https://moodle.psl.eu/course/view.php?id=17976.
Teaching Assistants (supervising practical sessions): Simon de Moreau (Monday+Tuesday+Thursday+Friday), Angelika Ando (Monday+Tuesday?+Friday), Jesus Bujalance Martin (DRL session on Wednesday) [All of them are PhD students at the Center for Robotics of MinesParis]
Textbook and slides
Slides of the course will be made available on the Moodle of the course and here.
There is no single textbook for this course, but the following resources are relevant:
- Mining of massive datasets by Leskovec, Rajaraman and Ullman;
- Deep learning by Goodfellow, Bengio and Courville;
- Large-Scale Optimization: Beyond Stochastic Gradient Descent and Convexity by Sra and Bach.
This course is not an introductory course to machine learning! If you want to learn the basics, or need a refresher, we recommend:
- In French, the lectures of the Parcours Data Scientist on OpenClassrooms (videos and texts freely available);
- In French, Introduction au Machine Learning. Chloé-Agathe Azencott, Collection InfoSup, Dunod, 2022;
- In French, Apprentissage statistique supervisé by Fabien Moutarde in Techniques de l'Ingénieur;
- In English, Machine learning by Andrew Ng on Coursera;
- In English, The elements of statistical learning by Hastie, Tibshirani and Friedman;
- In English, Pattern recognition and machine learning by Bishop.