Topic: "Tabular Learning: skrub and Foundation Models"
Speaker: Gaël Varoquaux, PhD / Research Director | scikit-learn Author | Co-Founder / Inria | Probabl
Varoquaux's research covers fundamentals of artificial intelligence, statistical learning, natural language processing, and causal inference, as well as applications to health, with a current focus on public health and epidemiology. He also creates technology: he co-founded scikit-learn, one of the reference machine-learning toolboxes, and helped build various central tools for data analysis in Python.
Varoquaux has worked at UC Berkeley, McGill, and the University of Florence. He did a PhD in quantum physics supervised by Alain Aspect and is a graduate of École Normale Supérieure, Paris.
Abstract:
While tabular data is central to all organizations, it seems left out of the AI discussion, which has focused on images, text, and sound. Indeed, in data science, most of the excitement is in machine learning, but most of the work happens before the learning step. Tables often require extensive manual transformation, or "data wrangling".
Gaël will discuss how they progressively rethought this process, building machine-learning tools that require less wrangling. We are building a new library, skrub (https://skrub-data.org), that facilitates complex tabular-learning pipelines by expressing as much of the wrangling as possible as high-level operations and automating them. A few lines of skrub can spare you dozens of lines of wrangling code!
But can more bleeding-edge research bring the foundation-model revolution to tables? Our latest breakthrough, the CARTE model, shows how pretrained models can bring value to downstream table analytics without manually transforming these tables to please the model; this may be the beginning of a large tabular model revolution.