Big Data in the Social Sciences: Statistical methods for multi-source high-dimensional data

Katrijn Van Deun (Tilburg University, the Netherlands)
October 06, 2017 — 10:00 — "Salle du conseil du L2S"

Abstract

Research in the behavioural and social sciences has entered the era of big data: many detailed measurements are taken and multiple sources of information are used to unravel complex multivariate relations. For example, in studying obesity as the outcome of environmental and genetic influences, researchers increasingly collect survey, dietary, biomarker and genetic data from the same individuals. Although linked multi-source data with more variables than samples (so-called high-dimensional data) form an extremely rich resource for research, extracting meaningful and integrated information from them is challenging and not appropriately addressed by current statistical methods. A first problem is that the relevant information is hidden among a bulk of irrelevant variables, with a high risk of finding incidental associations. Second, the sources are often very heterogeneous, which may obscure the links between the shared mechanisms. In this presentation we will discuss the challenges associated with the analysis of large-scale multi-source data and present state-of-the-art statistical approaches to address these challenges.