Opening

Martin Wainwright Berkeley, USA

Plenary speakers:

John Aston Cambridge, UK

Gerda Claeskens Leuven, Belgium

Alison Etheridge Oxford, UK

Hannu Oja Turku, Finland

Forum

Mark Girolami Warwick, UK

Closing

Yann LeCun New York, USA

Plenary Lectures

  • Professor Mark Girolami, Imperial College London:

    Diffusions and dynamics on statistical manifolds for statistical inference.

    Abstract. The use of Differential Geometry in Statistical Science dates back to the early work of C. R. Rao in the 1940s, when he sought to assess the natural distance between population distributions. The Fisher-Rao metric tensor defined the Riemannian manifold structure of probability measures, and from this local manifold structure, geodesic distances between probability measures could be properly defined. This early work was then taken up by many authors within the statistical sciences, with an emphasis on the study of the efficiency of statistical estimators. The area of Information Geometry has developed substantially and has had major impact in areas of applied statistics such as Machine Learning and Statistical Signal Processing. A different perspective on the Riemannian structure of statistical manifolds can be taken to make breakthroughs in contemporary statistical modelling problems. Langevin diffusions and Hamiltonian dynamics on the manifold of probability measures are defined to obtain Markov transition kernels for Monte Carlo based inference.
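
    To fix ideas, here is a minimal numpy sketch of one step of a simplified manifold Metropolis-adjusted Langevin algorithm: the Langevin proposal is preconditioned by a user-supplied metric tensor G(θ), but the curvature correction terms of the full method are dropped, so this is an illustration of the idea rather than the exact algorithm of the talk.

    ```python
    import numpy as np

    def simplified_mmala_step(theta, log_post, grad_log_post, metric, eps, rng):
        """One simplified manifold-MALA step (curvature terms omitted).

        Proposal: theta' ~ N(theta + (eps^2 / 2) G^{-1} grad, eps^2 G^{-1}),
        followed by a Metropolis-Hastings accept/reject correction.
        """
        def mean_and_Ginv(x):
            G = metric(x)
            Ginv = np.linalg.inv(G)
            return x + 0.5 * eps**2 * Ginv @ grad_log_post(x), G, Ginv

        mean_fwd, G_fwd, Ginv_fwd = mean_and_Ginv(theta)
        noise = rng.standard_normal(theta.size)
        prop = mean_fwd + eps * np.linalg.cholesky(Ginv_fwd) @ noise

        def log_q(x_to, mean, G):
            # log N(x_to; mean, eps^2 G^{-1}), up to constants that cancel
            d = x_to - mean
            return 0.5 * np.linalg.slogdet(G)[1] - d @ G @ d / (2 * eps**2)

        mean_rev, G_rev, _ = mean_and_Ginv(prop)
        log_alpha = (log_post(prop) - log_post(theta)
                     + log_q(theta, mean_rev, G_rev) - log_q(prop, mean_fwd, G_fwd))
        return prop if np.log(rng.uniform()) < log_alpha else theta
    ```

    With G the identity this reduces to ordinary MALA; taking G to be the Fisher information makes the proposal move along the statistical manifold described in the abstract.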

  • Professor Gerda Claeskens, KU Leuven:

    Effects of model selection and weight choice on inference.

    Abstract. Weights may be introduced in the estimation process in several ways: estimators may be weighted by zero/one weights in a model selection procedure such that only a ‘selected’ estimator is kept for further consideration; weighted estimators may employ more general weights, which can be optimised in some fashion; or weights can be introduced during the estimation stage, resulting in so-called composite estimators which minimise a weighted loss function. Several such estimation strategies are discussed and compared. In general, the randomness of the weights makes inference challenging. For some special cases, including random 0/1 weights from selection by Akaike’s information criterion, it is possible to construct asymptotic confidence regions which are uniformly valid and which incorporate the selection uncertainty.
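
    As one concrete weight choice, the sketch below computes Akaike weights, w_m ∝ exp(−AIC_m / 2), and the corresponding weighted ("smoothed AIC") estimator; the candidate estimates and AIC scores are hypothetical. Naive confidence intervals built from such an estimate treat the weights as fixed, which is exactly the randomness the abstract warns about.

    ```python
    import numpy as np

    def akaike_weights(aic_values):
        """Akaike weights: w_m proportional to exp(-AIC_m / 2)."""
        aic = np.asarray(aic_values, dtype=float)
        rel = np.exp(-0.5 * (aic - aic.min()))  # shift by the minimum for stability
        return rel / rel.sum()

    # Hypothetical focus-parameter estimates and AIC scores for three models.
    estimates = np.array([1.8, 2.1, 2.4])
    aics = np.array([102.3, 100.1, 104.9])
    w = akaike_weights(aics)
    print(w.round(3), float(w @ estimates))     # weights and averaged estimate
    ```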

  • Professor Alexander Holevo, Steklov Mathematical Institute:

    Quantum Shannon Theory.

    Abstract. The notions of channel and information capacity are central to classical Shannon theory. Quantum Shannon theory is a mathematical discipline which uses operator and matrix analysis and various asymptotic techniques to study the laws of information processing in systems obeying the rules of quantum physics. From the mathematical point of view, quantum channels are normalized completely positive maps of operator algebras, the analogue of Markov maps in noncommutative probability theory, playing the role of morphisms in the category of quantum systems. This talk presents basic coding theorems providing analytical expressions for the capacities of quantum channels in terms of various entropic quantities. The remarkable role of specific quantum correlations, namely entanglement, as a novel communication resource is stressed. We report on the solution of exciting mathematical problems, such as “Gaussian optimizers”, concerning the computation of these entropic quantities for the theoretically and practically important class of bosonic Gaussian channels.
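
    One of the entropic quantities in question is the Holevo quantity, χ({p_i, ρ_i}) = S(Σ_i p_i ρ_i) − Σ_i p_i S(ρ_i), where S(ρ) = −Tr ρ log₂ ρ is the von Neumann entropy; it bounds the classical information extractable from an ensemble of quantum states. A small numpy sketch for finite-dimensional states:

    ```python
    import numpy as np

    def von_neumann_entropy(rho):
        """S(rho) = -Tr(rho log2 rho), via the eigenvalues of rho."""
        evals = np.linalg.eigvalsh(rho)
        evals = evals[evals > 1e-12]            # convention: 0 log 0 = 0
        return float(-np.sum(evals * np.log2(evals)))

    def holevo_quantity(probs, states):
        """chi = S(sum_i p_i rho_i) - sum_i p_i S(rho_i)."""
        avg = sum(p * rho for p, rho in zip(probs, states))
        return von_neumann_entropy(avg) - sum(
            p * von_neumann_entropy(rho) for p, rho in zip(probs, states))

    # Example: the pure qubit states |0> and |+>, each used with probability 1/2.
    ket0 = np.array([1.0, 0.0])
    ketp = np.array([1.0, 1.0]) / np.sqrt(2)
    print(holevo_quantity([0.5, 0.5], [np.outer(ket0, ket0), np.outer(ketp, ketp)]))
    # ~0.60 bits: less than 1 bit because the two states are not orthogonal
    ```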

  • Professor Yann LeCun, Facebook AI Research & New York University:

    Deep learning: A statistical puzzle.

    Abstract. Deep learning is at the root of revolutionary progress in visual and auditory perception by computers, and is pushing the state of the art in natural language understanding, dialog systems and language translation. Deep learning systems are deployed everywhere, from self-driving cars to social-network content filtering, search-engine ranking and medical image analysis. A deep learning system is typically an “almost” differentiable function, composed of multiple highly non-linear steps, parametrized by a numerical vector with 10^7 to 10^9 dimensions, and whose evaluation on one sample requires 10^9 to 10^10 numerical operations. Training such a system consists of optimizing a highly non-convex objective, averaged over millions of training samples, using a stochastic gradient optimization procedure. How can that possibly work? The fact that it does work very well is one of the theoretical puzzles of deep learning.
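
    A drastically scaled-down sketch of that training loop: a tiny two-layer network fitted to toy data by mini-batch stochastic gradient descent. Real systems are many orders of magnitude larger, but the loop has the same shape.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((1000, 10))                     # toy inputs
    y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(1000)   # toy targets

    # Tiny two-layer network: a stand-in for the 10^7-10^9 parameter models above.
    W1 = 0.1 * rng.standard_normal((10, 32)); b1 = np.zeros(32)
    w2 = 0.1 * rng.standard_normal(32);       b2 = 0.0
    lr, batch = 0.05, 32

    for step in range(2001):
        idx = rng.integers(0, len(X), batch)                # random mini-batch
        h = np.tanh(X[idx] @ W1 + b1)                       # hidden layer
        err = (h @ w2 + b2) - y[idx]                        # residuals
        # Backpropagation (chain rule) for the mean squared error.
        gw2 = h.T @ err / batch
        gh = np.outer(err, w2) * (1 - h**2)                 # tanh derivative
        W1 -= lr * (X[idx].T @ gh / batch); b1 -= lr * gh.mean(axis=0)
        w2 -= lr * gw2;                     b2 -= lr * err.mean()
        if step % 500 == 0:
            print(step, float(0.5 * (err**2).mean()))       # loss keeps dropping
    ```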

  • Professor Martin Wainwright, University of California, Berkeley:

    Pairwise ranking and crowd-sourcing: Statistical models and computational challenges (with Nihar Shah, Sivaraman Balakrishnan and Aditya Guntuboyina).

    Abstract. Many modern data sets take the form of pairwise comparisons, in which binary judgements are made about pairs of items. Some examples include the outcomes of matches between tennis players, ratings of the relevance of search queries, and the outputs of crowd-sourcing engines. We discuss some statistical models for data of this type, along with the computational challenges that arise in performing estimation and rank aggregation with such models.
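
    One classical parametric model for such data is the Bradley-Terry model, in which item i beats item j with probability w_i / (w_i + w_j). A sketch of the standard minorize-maximize (Zermelo) iteration for its maximum likelihood fit, on a hypothetical head-to-head record:

    ```python
    import numpy as np

    def bradley_terry_mle(wins, n_iter=500):
        """MM (Zermelo) iteration for Bradley-Terry strengths.

        wins[i, j] = number of times item i beat item j.  The update
        w_i <- (#wins of i) / sum_j n_ij / (w_i + w_j) increases the likelihood.
        """
        n = wins.shape[0]
        w = np.ones(n)
        games = wins + wins.T                    # matches played between pairs
        for _ in range(n_iter):
            denom = (games / (w[:, None] + w[None, :])).sum(axis=1)
            w = wins.sum(axis=1) / denom
            w /= w.sum()                         # fix the scale (identifiability)
        return w

    # Hypothetical results among four tennis players.
    wins = np.array([[0, 3, 2, 4],
                     [1, 0, 3, 2],
                     [2, 1, 0, 3],
                     [0, 2, 1, 0]])
    w = bradley_terry_mle(wins)
    print(w.round(3), np.argsort(-w))            # strengths and ranking
    ```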

  • Professor Alison Etheridge, University of Oxford:

    Modelling evolution in a spatial continuum.

    Abstract. Since the pioneering work of Fisher, Haldane and Wright at the beginning of the 20th Century, mathematics has played a central role in theoretical population genetics. In turn, population genetics has provided the motivation both for important classes of probabilistic models, such as coalescent processes, and for deterministic models, such as the celebrated Fisher-KPP equation. Whereas coalescent models capture ‘relatedness’ between genes, the Fisher-KPP equation captures something of the interaction between natural selection and spatial structure. What has proved to be remarkably difficult is to combine the two, at least in the biologically relevant setting of a two-dimensional spatial continuum. In this talk we describe some of the challenges of modelling evolution in a spatial continuum, present a model that addresses those challenges, and, as time permits, describe some applications.
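
    For reference, the Fisher-KPP equation reads ∂u/∂t = D ∂²u/∂x² + s u(1 − u), with u the local frequency of an advantageous allele; a minimal explicit finite-difference simulation (illustrative parameter values) shows the travelling wave the abstract alludes to:

    ```python
    import numpy as np

    # Explicit scheme for du/dt = D u_xx + s u (1 - u) on [0, 100] with
    # no-flux boundaries; the allele is initially fixed on the left edge.
    D, s = 1.0, 1.0
    nx = 501
    dx = 100.0 / (nx - 1)
    dt = 0.4 * dx**2 / D                  # within the explicit stability bound
    u = np.zeros(nx)
    u[:25] = 1.0

    for _ in range(2000):                 # total time ~32; wave speed 2*sqrt(D*s)
        lap = np.empty_like(u)
        lap[1:-1] = (u[2:] - 2.0 * u[1:-1] + u[:-2]) / dx**2
        lap[0] = 2.0 * (u[1] - u[0]) / dx**2       # ghost-point no-flux boundary
        lap[-1] = 2.0 * (u[-2] - u[-1]) / dx**2
        u = u + dt * (D * lap + s * u * (1.0 - u))

    print(u[::50].round(2))               # the front has advanced to the right
    ```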

  • Professor Hannu Oja, University of Turku:

    Scatter matrices and linear dimension reduction (with Klaus Nordhausen, David E. Tyler and Joni Virta).

    Abstract. Most linear dimension reduction methods proposed in the literature can be formulated using a relevant pair of scatter matrices, see e.g. Tyler et al. (2009), Bura and Yang (2011) and Liski et al. (2014). The eigenvalues and eigenvectors of one scatter matrix with respect to another one can be used to determine the dimension of the signal subspace as well as the projection to this subspace. In this talk, three classical dimension reduction methods, namely principal component analysis (PCA), fourth order blind identification (FOBI) and sliced inverse regression (SIR), are considered in detail. The first two moments of subsets of the eigenvalues are used to test for the dimension of the signal space. The limiting null distributions of the test statistics are given and bootstrap strategies are suggested for small sample sizes. The theory is illustrated with simulations and real data examples. The talk is in part based on Nordhausen et al. (2017).
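
    A sketch of the two-scatter construction, taking the covariance matrix and the FOBI fourth-moment scatter as the pair (one choice among those surveyed in the talk); generalized eigenvalues far from 1 flag non-Gaussian signal directions. The simulated data are illustrative.

    ```python
    import numpy as np
    from scipy.linalg import eigh

    def two_scatter_directions(X):
        """Generalized eigendecomposition of one scatter matrix w.r.t. another.

        S1 is the covariance; S2 is the FOBI fourth-moment scatter
        E[r^2 (x - mu)(x - mu)^T] / (p + 2), with r the Mahalanobis norm.  For
        Gaussian directions the eigenvalues equal 1; departures flag the signal.
        """
        Xc = X - X.mean(axis=0)
        n, p = Xc.shape
        S1 = np.cov(Xc, rowvar=False)
        r2 = np.einsum('ij,jk,ik->i', Xc, np.linalg.inv(S1), Xc)  # Mahalanobis^2
        S2 = (Xc * r2[:, None]).T @ Xc / (n * (p + 2))
        evals, evecs = eigh(S2, S1)                 # solves S2 v = lambda S1 v
        order = np.argsort(np.abs(evals - 1.0))[::-1]   # most non-Gaussian first
        return evals[order], evecs[:, order]

    # Two non-Gaussian signal coordinates mixed into 5-dimensional Gaussian noise.
    rng = np.random.default_rng(1)
    signals = np.column_stack([rng.exponential(size=2000) - 1.0,
                               rng.uniform(-1.0, 1.0, size=2000)])
    noise = rng.standard_normal((2000, 3))
    X = np.column_stack([signals, noise]) @ rng.standard_normal((5, 5))
    evals, _ = two_scatter_directions(X)
    print(evals.round(2))        # two eigenvalues clearly away from 1
    ```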

  • Professor John Aston, University of Cambridge:

    Functional object data analysis.

    Abstract. Functional Data Analysis has grown into a mature area of statistics over the last 20 years or so, but it is still predominantly based on the notion that the data are one-dimensional i.i.d. curves belonging to some smooth Euclidean-like space. However, there have been many recent examples, arising from the explosion of data being recorded in science and medicine, that do not conform to these notions. Functional Object Data Analysis looks at the situation where the objects are functional-like, in that they are well represented in infinite-dimensional spaces, but where there are other considerations such as geometry or higher dimensionality. We will examine cases where the data are multidimensional, where they no longer live in a Euclidean space, and where the objects are related, for example in space or time. Including the data’s intrinsic constraints can profoundly enhance the analysis. Examples from Linguistics, Image Analysis and Forensics will help illustrate the ideas.
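
    As a baseline for the classical setting the abstract starts from, a minimal functional PCA on discretised curves, computed with the SVD (simulated curves for illustration):

    ```python
    import numpy as np

    # Each row of Y is one curve observed on a common grid: the classical
    # "one-dimensional i.i.d. curves" setting.  Simulated data for illustration.
    rng = np.random.default_rng(2)
    t = np.linspace(0.0, 1.0, 101)
    scores = rng.standard_normal((200, 2)) * np.array([2.0, 0.7])
    Y = (scores[:, :1] * np.sin(2 * np.pi * t)
         + scores[:, 1:] * np.cos(2 * np.pi * t)
         + 0.1 * rng.standard_normal((200, t.size)))

    Yc = Y - Y.mean(axis=0)                 # centre around the mean curve
    U, sing, Vt = np.linalg.svd(Yc, full_matrices=False)
    print((sing**2 / np.sum(sing**2))[:4].round(3))   # two dominant modes
    # Rows of Vt are the discretised principal component functions; the talk's
    # "object" settings replace this Euclidean geometry with richer structure.
    ```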