Deep learning theory through the lens of diagonal linear networks
Scott Pesme (INRIA Grenoble)
November 28, 2025 — 11:00 — new L2S location (IBM building), Room Hopper, third floor (and on Teams)
Abstract
Surprisingly, many optimisation phenomena observed in complex neural networks also appear in so-called two-layer diagonal linear networks. This rudimentary architecture—a two-layer feedforward linear network with a diagonal inner weight matrix—has the advantage of revealing key training characteristics while keeping the theoretical analysis clean and insightful. In this talk, I’ll provide an overview of various theoretical results for this architecture, while drawing connections to experimental observations from practical neural networks. Specifically, we’ll examine how hyperparameters such as the initialisation scale, step size, and batch size impact the optimisation trajectory and influence the generalisation performance of the recovered solution.
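For concreteness, the parameterisation typically studied in this line of work (the notation used in the talk may of course differ) writes the network's prediction on an input $x \in \mathbb{R}^d$ as

$$
f_{u,v}(x) = \langle u \odot v,\, x \rangle, \qquad u, v \in \mathbb{R}^d,
$$

so that the effective linear predictor is $w = u \odot v$, with $\odot$ the entrywise product: the two layers are the diagonal matrix $\mathrm{diag}(u)$ and the output weights $v$. Training $(u, v)$ with gradient-based methods, rather than training $w$ directly, is what produces the implicit regularisation effects alluded to in the abstract—for instance, a bias of the recovered $w$ towards sparse solutions when the initialisation scale is small.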
Bio
I am a postdoctoral researcher at Inria Grenoble, working with Julien Mairal. I obtained my PhD in 2024 at EPFL under the supervision of Nicolas Flammarion. My work focuses on understanding why neural networks are so good at what they do. How is it that we can train them using gradient-based methods? Why do these trained networks generalise so well to new data? How, and why, does a neural network make a certain decision? And so on.