Making the Most of Sparse Data: Machine Learning and Data Assimilation with Applications in Air Quality and Geoscience
Sibo Cheng (CEREA, École Nationale des Ponts et Chaussées (ENPC), Institut Polytechnique de Paris)
February 21, 2025, 10:00, new L2S location (IBM building), 3rd floor, Salle G. Hopper (and on Teams)
Abstract
This talk explores the integration of data assimilation (DA) and machine learning techniques to predict and reconstruct multivariate, high-dimensional environmental fields, such as air quality and geophysical data, from sparse and movable sensor observations. Traditional DA methods, while effective, face challenges such as high computational cost and the difficulty of estimating error covariances. AI-driven latent data assimilation addresses these challenges by compressing high-dimensional data into a latent space using autoencoders, where corrections can be computed efficiently thanks to the automatic differentiation of neural networks. Supervised learning with advanced CNNs or masked autoencoders complements this approach by directly reconstructing fields from sparse, unstructured data. Applications include correcting NO₂ levels in Île-de-France using sparse sensor networks, and experiments on the multivariate WeatherBench dataset with randomly placed observation points.
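As a rough illustration of the latent assimilation idea described in the abstract, the sketch below corrects a latent background state using a handful of sparse observations of the decoded field. It is a minimal toy under stated assumptions, not the speaker's method: the "decoder" is a random linear map standing in for a trained autoencoder, the error covariances are scalar multiples of the identity, and all names and sizes (`D`, `z_b`, `sigma_b`, `sigma_o`, `n_state`, and so on) are hypothetical. Because the decoder is linear here, the minimiser of the cost function is obtained by a linear solve; with a nonlinear neural decoder, the gradient of the same cost would instead come from automatic differentiation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: full field, latent space, number of sparse sensors.
n_state, n_latent, n_obs = 200, 5, 20

# Stand-in for a trained decoder (assumption): a linear map with
# orthonormal columns taking a latent vector to the full field.
D, _ = np.linalg.qr(rng.standard_normal((n_state, n_latent)))

# Synthetic "truth" in latent space and its decoded field.
z_true = rng.standard_normal(n_latent)
x_true = D @ z_true

# Sparse sensors: observe the field at a few random grid points, with noise.
obs_idx = rng.choice(n_state, size=n_obs, replace=False)
sigma_o = 0.05  # assumed observation error standard deviation
y = x_true[obs_idx] + sigma_o * rng.standard_normal(n_obs)

# Background (prior) latent state, e.g. from a surrogate forecast.
sigma_b = 1.0  # assumed background error standard deviation
z_b = z_true + sigma_b * rng.standard_normal(n_latent)

# Observation operator composed with the decoder: latent -> sensor values.
H = D[obs_idx, :]


def cost(z):
    """Latent variational cost: background misfit plus observation misfit."""
    db = z - z_b
    do = y - H @ z
    return 0.5 * (db @ db) / sigma_b**2 + 0.5 * (do @ do) / sigma_o**2


def grad(z):
    """Analytic gradient of the cost (autodiff would supply this for a neural decoder)."""
    return (z - z_b) / sigma_b**2 - H.T @ (y - H @ z) / sigma_o**2


# The cost is quadratic, so the analysis solves the normal equations.
A = np.eye(n_latent) / sigma_b**2 + H.T @ H / sigma_o**2
b = z_b / sigma_b**2 + H.T @ y / sigma_o**2
z_a = np.linalg.solve(A, b)

# Decode the corrected latent state back to the full field.
x_a = D @ z_a

print("cost at background:", cost(z_b))
print("cost at analysis:  ", cost(z_a))
print("background field error:", np.linalg.norm(D @ z_b - x_true))
print("analysis field error:  ", np.linalg.norm(x_a - x_true))
```

The point of the design is that the optimisation runs over the 5 latent variables rather than the 200 grid values, which is where the computational saving of a latent approach would come from in this toy setting.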
References
Cheng, S., Quilodran-Casas, C., Ouala, S., Farchi, A., Liu, C., Tandeo, P., Fablet, R., Lucor, D., Iooss, B., Brajard, J., Xiao, D., Janjic, T., Ding, W., Guo, Y., Carrassi, A., Bocquet, M. and Arcucci, R. (2023). Machine learning with data assimilation and uncertainty quantification for dynamical systems: a review. IEEE/CAA Journal of Automatica Sinica.
Cheng, S., Liu, C., Guo, Y. and Arcucci, R. (2024). Efficient deep data assimilation with sparse observations and time-varying sensors. Journal of Computational Physics.
Bio
Sibo Cheng is currently a Junior Professor (Chaire de Professeur Junior) at CEREA, École Nationale des Ponts et Chaussées (ENPC), Institut Polytechnique de Paris, France. His work focuses on machine learning for dynamical systems, reduced-order surrogate models (digital twins), and inverse modeling (parameter calibration and data assimilation) for environmental science and physics, with a wide range of applications including geosciences (wildfires and air pollution) and fluid dynamics. He completed his Ph.D. at LISN, Université Paris-Saclay, France, in 2020. From 2020 to 2024, he was a research associate at the Data Science Institute, Imperial College London.