Courses

Information for students starting in October 2024

Year 1

Students are required to select four of the courses listed below.

The courses are based at the University of Oxford Mathematical Institute, Department of Computer Science and Department of Statistics.

Oxford Mathematical Institute


General Prerequisites:

A good command of Part A Integration, Probability, and Differential Equations 1 is essential; the main concepts which will be used are the convergence theorems and the theorems of Fubini and Tonelli, and the notions of measurable functions, integrable functions, null sets and Lp spaces. The Cauchy-Lipschitz theory and the proof of Picard's theorem will be used. Basic knowledge of random variables, laws, expectations, and independence is needed. A good working knowledge of Part A Core Analysis (metric spaces) is expected. Knowledge of B8.1 Probability, Measure and Martingales will certainly help but is not essential.

Course Term: Hilary

Course Lecture Information: 16 lectures

Course Level: M

Course Overview: 

This course will serve as an introduction to optimal transportation theory, its application in the analysis of PDE, and its connections to the macroscopic description of interacting particle systems.

Learning Outcomes: 

Getting familiar with the Monge-Kantorovich problem and transport distances. Derivation of macroscopic models via the mean-field limit and their analysis based on contractivity of transport distances. Dynamic interpretation and geodesic convexity. A brief introduction to gradient flows and examples.
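For orientation, the Monge-Kantorovich problem and transport distances mentioned above can be stated as follows (standard definitions, given here only as a reminder): for probability measures μ, ν and a cost c(x, y), the Kantorovich problem is

    \[
      \mathcal{T}_c(\mu,\nu) \;=\; \inf_{\pi \in \Pi(\mu,\nu)} \int c(x,y)\,\mathrm{d}\pi(x,y),
    \]

where \Pi(\mu,\nu) denotes the set of couplings with marginals μ and ν; taking c(x, y) = |x − y|^p gives the Wasserstein distance

    \[
      W_p(\mu,\nu) \;=\; \Big( \inf_{\pi \in \Pi(\mu,\nu)} \int |x-y|^p \,\mathrm{d}\pi(x,y) \Big)^{1/p}.
    \]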

Course Synopsis: 
  1. Interacting Particle Systems & PDE (2 hours)
    • Granular Flow Models and McKean-Vlasov Equations.
    • Nonlinear Diffusion and Aggregation-Diffusion Equations.
  2. Optimal Transportation: The metric side (4 hours)
    • Functional Analysis tools: weak convergence of measures. Prokhorov’s Theorem. Direct Method of Calculus of Variations. (1 hour)
    • Monge Problem. Kantorovich Duality. (1.5 hours)
    • Transport distances between measures: properties. The real line. Probabilistic Interpretation: couplings. (1.5 hours)
  3. Mean Field Limit & Couplings (4 hours)
    • Dobrushin approach: derivation of the Aggregation Equation. (1.5 hours)
    • Sznitman Coupling Method for the McKean-Vlasov equation. (1.5 hours)
    • Boltzmann Equation for Maxwellian molecules: Tanaka Theorem. (1 hour)
  4. Gradient Flows: Aggregation-Diffusion Equations (6 hours)
    • Brenier's Theorem and Dynamic Interpretation of optimal transport. Otto's calculus. (2 hours)
    • McCann’s Displacement Convexity: Internal, Interaction and Confinement Energies. (2 hours)
  5. Gradient Flow approach: Minimizing movements for the (McKean)-Vlasov equation. Properties of the variational scheme. Connection to mean-field limits. (2 hours)
Reading List: 
  1. F. Golse, On the Dynamics of Large Particle Systems in the Mean Field Limit, Lecture Notes in Applied Mathematics and Mechanics 3. Springer, 2016.
  2. L. C. Evans, Weak convergence methods for nonlinear partial differential equations. CBMS Regional Conference Series in Mathematics 74, AMS, 1990.
  3. F. Santambrogio, Optimal Transport for Applied Mathematicians: Calculus of Variations, PDEs, and Modeling, Progress in Nonlinear Differential Equations and Their Applications, Birkhauser 2015.
  4. C. Villani, Topics in Optimal Transportation, AMS Graduate Studies in Mathematics, 2003

Please note that e-book versions of many books in the reading lists can be found on SOLO

Further Reading: 
  1. L. Ambrosio, G. Savare, Handbook of Differential Equations: Evolutionary Equations, Volume 3-1, 2007.
  2. C. Villani, Optimal Transport: Old and New, Springer 2009

More details: Course: C4.9 Optimal Transport & Partial Differential Equations (2023-24) | Mathematical Institute (ox.ac.uk)

 

General Prerequisites: Basic linear algebra (such as eigenvalues and eigenvectors of real matrices), multivariate real analysis (such as norms, inner products, multivariate linear and quadratic functions, basis) and multivariable calculus (such as Taylor expansions, multivariate differentiation, gradients).

Course Term: Hilary

Course Lecture Information: 16 lectures

Course Level: M

Course Overview:

The solution of optimal decision-making and engineering design problems in which the objective and constraints are nonlinear functions of potentially (very) many variables is required on an everyday basis in the commercial and academic worlds. A closely related subject is the solution of nonlinear systems of equations, as well as least-squares or data-fitting problems, which occur in almost every instance where observations or measurements are available for modelling a continuous process or phenomenon, such as in weather forecasting. The mathematical analysis of such optimization problems and of classical and modern methods for their solution is fundamental for understanding existing software and for developing new techniques for practical optimization problems at hand.

Course Synopsis:

Part 1: Unconstrained Optimization
Optimality conditions, steepest descent method, Newton and quasi-Newton methods, General line search methods, Trust region methods, Least squares problems and methods.

Part 2: Constrained Optimization
Optimality/KKT conditions, penalty and augmented Lagrangian for equality-constrained optimization, interior-point/barrier methods for inequality-constrained optimization, SQP methods.
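As a flavour of the unconstrained part of the course, here is a minimal sketch of steepest descent with an Armijo backtracking line search (function and parameter names are illustrative, not taken from the course materials):

    import numpy as np

    def steepest_descent(f, grad, x0, alpha0=1.0, rho=0.5, c=1e-4,
                         tol=1e-6, max_iter=10000):
        """Steepest descent with an Armijo (sufficient-decrease) backtracking line search."""
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            g = grad(x)
            if np.linalg.norm(g) < tol:        # first-order optimality reached
                break
            alpha = alpha0
            while f(x - alpha * g) > f(x) - c * alpha * g @ g:
                alpha *= rho                   # shrink step until the Armijo condition holds
            x = x - alpha * g
        return x

    # Example: the Rosenbrock function, a standard unconstrained test problem
    f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
    grad = lambda x: np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
                               200 * (x[1] - x[0]**2)])
    x_star = steepest_descent(f, grad, [-1.2, 1.0])

Newton and quasi-Newton methods, covered in the course, replace the steepest-descent direction with one built from (approximate) second-order information.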

More details: Course: C6.2 Continuous Optimisation (2023-24) | Mathematical Institute (ox.ac.uk)

General Prerequisites:

Only elementary linear algebra and probability are assumed in this course; knowledge from the following Prelims courses is also helpful: linear algebra, probability, analysis, constructive mathematics, and statistics and data analysis. It is recommended that students have familiarity with some of: more advanced statistics, optimisation (B6.3, C6.2), networks (C5.4), and numerical linear algebra (C6.1), though none of these courses is required, as the material is self-contained.

Course Term: Michaelmas

Course Lecture Information: 16 lectures

Course Weight: 1

Course Level: M

Course Overview:

A course on theories of deep learning.

Learning Outcomes:

Students will become familiar with the variety of architectures for deep nets, including the scattering transform and ingredients such as types of nonlinear transforms, pooling, convolutional structure, and how nets are trained. Students will focus their attention on learning a variety of theoretical perspectives on why deep networks perform as observed, with examples such as: dictionary learning and transferability of early layers, energy decay with depth, Lipschitz continuity of the net, how depth overcomes the curse of dimensionality, constructing adversarial examples, geometry of nets viewed through random matrix theory, and learning of invariance.

Course Synopsis:

Deep learning is the dominant method for machines to perform classification tasks at reliability rates exceeding those of humans, as well as for outperforming world champions in games such as Go. Alongside the proliferating application of these techniques, practitioners have developed a good understanding of the properties that make these deep nets effective, such as initial layers learning weights similar to those in dictionary learning, while deeper layers instantiate invariance to transforms such as dilation, rotation, and modest diffeomorphisms. There are now a number of mathematical theories being developed to accompany these observations; this course will explore these varying perspectives.

More details: Course: C6.5 Theories of Deep Learning (2023-24) | Mathematical Institute (ox.ac.uk)

General Prerequisites:

There are no formal prerequisites, but familiarity with basic concepts and results from linear algebra and probability will be assumed, at the level of A0 (Linear Algebra) and A8 (Probability).

Course Term: Hilary

Course Lecture Information: 16 lectures

Course Level: M

Course Overview:

Random Matrix Theory provides generic tools to analyse random linear systems. It plays a central role in a broad range of disciplines and application areas, including complex networks, data science, finance, machine learning, number theory, population dynamics, and quantum physics. Within Mathematics, it connects with asymptotic analysis, combinatorics, integrable systems, numerical analysis, probability, and stochastic analysis. This course aims to provide an introduction to this highly active, interdisciplinary field of research, covering the foundational concepts, methods, questions, and results.

Learning Outcomes:

Students will learn how some of the various different ensembles of random matrices are defined. They will encounter examples of the applications these have in Data Science, modelling Complex Quantum Systems, Mathematical Finance, Network Models, Numerical Linear Algebra, and Population Dynamics. They will learn how to analyse eigenvalue statistics, and see connections with other areas of mathematics and physics, including combinatorics, number theory, and statistical mechanics.

Course Synopsis: 

Introduction to matrix ensembles, including Wigner and Wishart random matrices, and the Gaussian and Circular Ensembles. Overview of connections with Data Science, Complex Quantum Systems, Mathematical Finance, Network Models, Numerical Linear Algebra, and Population Dynamics. (1 lecture)

Statement and proof of Wigner’s Semicircle Law; statement of Girko’s Circular Law; applications to Population Dynamics (May’s model). (3 lectures)
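A quick numerical illustration of the semicircle law (a sketch assuming NumPy; the matrix size and seed are arbitrary choices):

    import numpy as np

    n = 2000
    rng = np.random.default_rng(0)
    X = rng.normal(size=(n, n))
    M = (X + X.T) / np.sqrt(2)                 # GOE-type Wigner matrix (off-diagonal variance 1)
    eigs = np.linalg.eigvalsh(M / np.sqrt(n))  # rescaled eigenvalues

    # Wigner's semicircle density on [-2, 2]: rho(x) = sqrt(4 - x^2) / (2*pi)
    hist, edges = np.histogram(eigs, bins=50, density=True)
    centres = (edges[:-1] + edges[1:]) / 2
    semicircle = np.sqrt(np.clip(4 - centres**2, 0, None)) / (2 * np.pi)
    print(np.max(np.abs(hist - semicircle)))   # small for large n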

Statement and proof of the Marchenko-Pastur Law using the Stieltjes and R-transforms; applications to Data Science and Mathematical Finance. (3 lectures)

Derivation of the Joint Eigenvalue Probability Density for the Gaussian and Circular Ensembles; method of orthogonal polynomials; applications to eigenvalue statistics in the large-matrix limit; behaviour in the bulk and at the edge of the spectrum; universality; applications to Numerical Linear Algebra and Complex Quantum Systems. (5 lectures)

Dyson Brownian Motion (2 lectures)

Connections to other problems in mathematics, including the longest increasing subsequence problem; distribution of zeros of the Riemann zeta-function; topological genus expansions. (2 lectures)

Reading List: 

  1. ML Mehta, Random Matrices (Elsevier, Pure and Applied Mathematics Series)
  2. GW Anderson, A Guionnet, O Zeitouni, An Introduction to Random Matrices (Cambridge Studies in Advanced Mathematics)
  3. ES Meckes, The Random Matrix Theory of the Classical Compact Groups (Cambridge University Press)
  4. G. Akemann, J. Baik & P. Di Francesco, The Oxford Handbook of Random Matrix Theory (Oxford University Press)
  5. G. Livan, M. Novaes & P. Vivo, Introduction to Random Matrices (Springer Briefs in Mathematical Physics)

Please note that e-book versions of many books in the reading lists can be found on SOLO

Further Reading: 
  1. T. Tao, Topics in Random Matrix Theory (AMS Graduate Studies in Mathematics)

More details: Course: C7.7 Random Matrix Theory (2023-24) | Mathematical Institute (ox.ac.uk)

General Prerequisites:

Integration theory: Riemann-Stieltjes and Lebesgue integrals and their basic properties.
Probability and measure theory: σ-algebras, Fatou's lemma, Borel-Cantelli, Radon-Nikodym, Lp-spaces, basic properties of random variables and conditional expectation.
Martingales in discrete and continuous time: construction and basic properties of Brownian motion, uniform integrability of stochastic processes, stopping times, filtrations, Doob's theorems (maximal and Lp-inequalities, optional stopping, upcrossing, martingale decomposition), martingale (backward) convergence theorem, L2-bounded martingales, quadratic variation.
Stochastic integration: Ito's construction of the stochastic integral, Ito's formula.

Course Term: Michaelmas

Course Lecture Information: 16 lectures

Course Level: M

Course Overview:

Stochastic differential equations (SDEs) model evolution of systems affected by randomness. They offer a beautiful and powerful mathematical language in analogy to what ordinary differential equations (ODEs) do for deterministic systems. From the modelling point of view, the randomness could be an intrinsic feature of the system or just a way to capture small complex perturbations which are not modelled explicitly. As such, SDEs have found many applications in diverse disciplines such as biology, physics, chemistry and the management of risk.
Classical well-posedness theory for ODEs does not apply to SDEs. However, when we replace the classical Newton-Leibniz calculus with the (Ito) stochastic calculus, we are able to build a new and complete theory of existence and uniqueness of solutions to SDEs. Ito's formula proves to be a powerful tool for solving SDEs. This leads to many new and often surprising insights about quantities that evolve under randomness. This course is an introduction to SDEs. It covers the basic theory but also offers glimpses into many of the advanced and nuanced topics.

Learning Outcomes:

By the end of this course, students will be able to analyse whether a given SDE admits a solution, characterise the nature of the solution, and explain whether or not it is unique. Students will also be able to solve basic SDEs and state basic properties of the diffusive systems described by these equations.

Course Synopsis:

Recap on martingale theory in continuous time, quadratic variation, stochastic integration and Ito's calculus.
Lévy's characterisation of Brownian motion, stochastic exponential, Girsanov theorem and change of measure, Burkholder-Davis-Gundy inequalities, martingale representation, Dambis-Dubins-Schwarz theorem.
Strong and weak solutions of stochastic differential equations, existence and uniqueness.
Examples of stochastic differential equations. Bessel processes.
Local times, Tanaka formula, Tanaka-Ito-Meyer formula.
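Although numerical methods are not the focus of the course, an Euler-Maruyama simulation makes the objects above concrete. A minimal sketch (assuming NumPy; the Ornstein-Uhlenbeck example is an illustrative choice):

    import numpy as np

    def euler_maruyama(b, sigma, x0, T=1.0, n_steps=1000, seed=0):
        """Simulate dX_t = b(X_t) dt + sigma(X_t) dW_t on [0, T]."""
        rng = np.random.default_rng(seed)
        dt = T / n_steps
        x = np.empty(n_steps + 1)
        x[0] = x0
        for k in range(n_steps):
            dW = rng.normal(scale=np.sqrt(dt))          # Brownian increment
            x[k + 1] = x[k] + b(x[k]) * dt + sigma(x[k]) * dW
        return x

    # Ornstein-Uhlenbeck process: dX_t = -theta X_t dt + dW_t
    theta = 2.0
    path = euler_maruyama(lambda x: -theta * x, lambda x: 1.0, x0=1.0)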

More details: Course: C8.1 Stochastic Differential Equations (2023-24) | Mathematical Institute (ox.ac.uk)

General Prerequisites: Integration and measure theory, martingales in discrete and continuous time, stochastic calculus. Functional analysis is useful but not essential.

Course Term: Hilary

Course Lecture Information: 16 lectures

Course Level: M

Course Overview:

Stochastic analysis and partial differential equations are intricately connected. This is exemplified by the celebrated deep connections between Brownian motion and the classical heat equation, but this is only a very special case of a general phenomenon. We explore some of these connections, illustrating the benefits to both analysis and probability.

Learning Outcomes:

The student will have developed an understanding of the deep connections between concepts from probability theory, especially diffusion processes and their transition semigroups, and partial differential equations.

Course Synopsis:

Feller processes and semigroups. Resolvents and generators. Hille-Yosida Theorem (without proof). Diffusions and elliptic operators, convergence and approximation. Stochastic differential equations and martingale problems. Duality. Speed and scale for one dimensional diffusions. Green's functions as occupation densities. The Dirichlet and Poisson problems. Feynman-Kac formula.
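As a reminder of the kind of connection the course develops, the Feynman-Kac formula (stated informally here) represents the solution of a terminal-value problem probabilistically: if X is a diffusion with generator \mathcal{L}, then under suitable regularity conditions

    \[
      u(t,x) \;=\; \mathbb{E}\Big[ f(X_T)\, \mathrm{e}^{-\int_t^T V(X_s)\,\mathrm{d}s} \;\Big|\; X_t = x \Big]
    \]

solves \partial_t u + \mathcal{L} u - V u = 0 with u(T,\cdot) = f. The special case \mathcal{L} = \tfrac{1}{2}\Delta, V = 0 recovers the link between Brownian motion and the classical heat equation mentioned in the overview.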

More details: Course: C8.2 Stochastic Analysis and PDEs (2023-24) | Mathematical Institute (ox.ac.uk)

General Prerequisites: Part B Graph Theory and Part A Probability. C8.3 Combinatorics is not an essential prerequisite for this course, though it is a natural companion for it.

Course Term: Hilary

Course Lecture Information: 16 lectures

Course Level: M

Course Overview:

Probabilistic combinatorics is a very active field of mathematics, with connections to other areas such as computer science and statistical physics. Probabilistic methods are essential for the study of random discrete structures and for the analysis of algorithms, but they can also provide a powerful and beautiful approach for answering deterministic questions. The aim of this course is to introduce some fundamental probabilistic tools and present a few applications.

Learning Outcomes:

The student will have developed an appreciation of probabilistic methods in discrete mathematics.

Course Synopsis:

First-moment method, with applications to Ramsey numbers, and to graphs of high girth and high chromatic number.
Second-moment method, threshold functions for random graphs.
Lovász Local Lemma, with applications to two-colourings of hypergraphs, and to Ramsey numbers.
Chernoff bounds, concentration of measure, Janson's inequality.
Branching processes and the phase transition in random graphs.
Clique and chromatic numbers of random graphs.
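As a one-line illustration of the first-moment method listed above (the classical Erdős lower bound for Ramsey numbers): colour the edges of K_n uniformly at random with two colours and let X count monochromatic copies of K_k; then

    \[
      \mathbb{E}[X] \;=\; \binom{n}{k}\, 2^{\,1-\binom{k}{2}},
    \]

so if this quantity is less than 1, some colouring has no monochromatic K_k, and hence R(k,k) > n.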

More details: Course: C8.4 Probabilistic Combinatorics (2023-24) | Mathematical Institute (ox.ac.uk)

General Prerequisites:

None, but some basic knowledge of stochastic calculus (as would be obtained from B8.2 Continuous Martingales and Stochastic Calculus or B8.3 Mathematical Models of Financial Derivatives) will be assumed. Familiarity with applied PDE (as would be obtained from B5.2 Applied Partial Differential Equations, B6.1 Numerical Solution of Partial Differential Equations, or B7.1 Classical Mechanics) would also be beneficial.

Overview:

Optimal control is the question of how one should select actions sequentially through time. The problem appears, in various forms, in many applications, from industrial problems and classical mechanics through to problems in biology and finance. This course will study the mathematics required to understand these problems, both in discrete and continuous time, and in settings with and without randomness. The two main perspectives on control – dynamic programming and the Pontryagin principle – will be explored, along with how these perspectives lead to equations that describe the optimal action. The numerical solution of these equations will also be considered.

Learning Outcomes:

The students will develop an understanding of the classical theory of optimal control, and be able to determine optimal controls within mathematical models. They will be familiar with manipulating the corresponding PDEs, and with reinforcement learning techniques to solve them.

Synopsis:

Dynamic programming in discrete time, the Bellman equation and value function. Iteration methods for discrete systems and variations from reinforcement learning. Continuous deterministic systems and the Hamilton–Jacobi equation. Pontryagin maximum principle for deterministic systems. Feynman–Kac theorem and values in stochastic systems, the Linear–Quadratic–Gaussian case. Martingale characterization of optimality and the Hamilton–Jacobi–Bellman equation, maximum principle and verification theorem. Examples from finance and engineering.
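For reference, the two central equations mentioned in the synopsis take the following schematic forms. In discrete time, the value function satisfies the Bellman equation

    \[
      V(x) \;=\; \sup_{a \in A} \Big\{ r(x,a) + \beta\, \mathbb{E}\big[ V(X_{t+1}) \,\big|\, X_t = x,\ a_t = a \big] \Big\},
    \]

while for a controlled diffusion dX_t = b(X_t, a_t)\,dt + \sigma(X_t, a_t)\,dW_t, the value function solves (under regularity assumptions) the Hamilton–Jacobi–Bellman equation

    \[
      \partial_t V(t,x) + \sup_{a \in A} \Big\{ b(x,a)\cdot \nabla_x V + \tfrac{1}{2}\operatorname{tr}\!\big(\sigma\sigma^{\top}(x,a)\, D^2_x V\big) + r(x,a) \Big\} \;=\; 0, \qquad V(T,x) = g(x).
    \]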

Reading List:

There are a large number of textbooks which cover the course material with a varying degree of detail/rigour. Recommended reading includes:

Sutton and Barto, Reinforcement Learning: An Introduction, MIT Press, 1998
Bertsekas and Shreve, Stochastic Optimal Control: The Discrete-Time Case, Athena Scientific, 1996
Bensoussan, Estimation and Control of Dynamical Systems, Springer, 2018
Yong and Zhou, Stochastic Controls: Hamiltonian Systems and HJB Equations, Springer, 1999
Fleming and Soner, Controlled Markov Processes and Viscosity Solutions, Springer, 2006

More advanced texts include:

Touzi, Optimal Stochastic Control, Stochastic Target Problems and Backward SDE, Fields Lecture Notes 2010
Krylov, Controlled Diffusion Processes, Springer 1980
Pham, Continuous-time Stochastic Control and Optimization with Financial Applications, Springer 2009

Department of Statistics, University of Oxford


This course runs October to December, but it may be possible to follow it via pre-recorded videos and prepare an assessment.

Aims and Objectives: Many data come in the form of networks, for example friendship data and protein-protein interaction data. As the data usually cannot be modelled using simple independence assumptions, their statistical analysis provides many challenges. The course will give an introduction to the main problems and the main statistical techniques used in this field. The techniques are applicable to a wide range of complex problems. The statistical analysis benefits from insights which stem from probabilistic modelling, and the course will combine both aspects.

Synopsis:

Exploratory analysis of networks. The need for network summaries. Degree distribution, clustering coefficient, shortest path length. Motifs.
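These summaries are straightforward to compute in practice; a minimal sketch using NetworkX on a Bernoulli random graph (the library, graph size and edge probability are illustrative choices):

    import networkx as nx

    G = nx.erdos_renyi_graph(n=200, p=0.05, seed=1)        # Bernoulli (Erdos-Renyi) random graph

    degrees = [d for _, d in G.degree()]                   # degree distribution
    clustering = nx.average_clustering(G)                  # clustering coefficient
    giant = G.subgraph(max(nx.connected_components(G), key=len))
    path_length = nx.average_shortest_path_length(giant)   # shortest paths (on the giant component)
    triangles = sum(nx.triangles(G).values()) // 3         # a simple motif count

    print(clustering, path_length, triangles)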

Probabilistic models: Bernoulli random graphs, geometric random graphs, preferential attachment models, small world networks, inhomogeneous random graphs, exponential random graphs.

Small subgraphs: Stein’s method for normal and Poisson approximation. Branching process approximations, threshold behaviour, shortest path between two vertices.

Statistical analysis of networks: Sampling from networks. Parameter estimation for models. Inference from networks: vertex characteristics and missing edges. Nonparametric graph comparison: subgraph counts, subsampling schemes, MCMC methods. A brief look at community detection. 

More details: SC2 Probability and Statistics for Network Analysis

Reading:

R. Durrett, Random Graph Dynamics, Cambridge University Press, 2007

E. D. Kolaczyk and G. Csárdi, Statistical Analysis of Network Data with R, Springer, 2014

M. Newman, Networks: An Introduction. Oxford University Press, 2010

Recommended Prerequisites:

The course requires a good level of mathematical maturity. Students are expected to be familiar with core concepts in statistics (regression models, bias-variance tradeoff, Bayesian inference), probability (multivariate distributions, conditioning) and linear algebra (matrix-vector operations, eigenvalues and eigenvectors). Previous exposure to machine learning (empirical risk minimisation, dimensionality reduction, overfitting, regularisation) is highly recommended. Students would also benefit from being familiar with the material covered in the following courses offered in the Statistics department: SB2.1 (formerly SB2a) Foundations of Statistical Inference and SB2.2 (formerly SB2b) Statistical Machine Learning.

Aims and Objectives:

Machine learning is widely used across many scientific and engineering disciplines to construct methods that find interesting patterns and predict accurately in large datasets. This course introduces several widely used machine learning techniques and describes their underpinning statistical principles and properties. The course studies both unsupervised and supervised learning, and several advanced topics are covered in detail, including some state-of-the-art machine learning techniques. The course will also cover computational considerations of machine learning algorithms and how they can scale to large datasets.

Synopsis:

Empirical risk minimisation. Loss functions. Generalization. Over- and under-fitting. Bias and variance. Regularisation.

Support vector machines.

Kernel methods and reproducing kernel Hilbert spaces. Representer theorem. Representation of probabilities in RKHS.
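To make the representer theorem concrete, here is a minimal kernel ridge regression sketch (NumPy assumed; the kernel, toy data and regularisation parameter are illustrative): the regularised empirical risk minimiser in the RKHS has the form f(x) = Σ_i α_i k(x_i, x), so fitting reduces to a linear solve.

    import numpy as np

    def rbf_kernel(A, B, gamma=1.0):
        """Gaussian RBF kernel matrix between the rows of A and B."""
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(40, 1))
    y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)

    lam = 1e-2
    K = rbf_kernel(X, X)
    alpha = np.linalg.solve(K + lam * len(X) * np.eye(len(X)), y)   # dual coefficients

    X_test = np.linspace(-3, 3, 100)[:, None]
    y_pred = rbf_kernel(X_test, X) @ alpha                          # f(x) = sum_i alpha_i k(x_i, x)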

Deep learning: Neural networks. Computation graphs. Automatic differentiation. Stochastic gradient descent.

Probabilistic and Bayesian machine learning: Fundamentals of the Bayesian approach. EM algorithm. Variational inference. Latent variable models.

Deep generative models. Variational auto-encoders.

Gaussian processes. Bayesian optimisation.

Software: Knowledge of Python is not required for this course, but some examples may be done in Python. Students interested in learning Python are referred to the following free University IT online course, which should ideally be taken before the beginning of this course: https://skills.it.ox.ac.uk/whats-on#/course/LY046

More details: SC4 Advanced Topics in Statistical Machine Learning

Reading:

C. Bishop, Pattern Recognition and Machine Learning, Springer, 2006

K. Murphy, Machine Learning: a Probabilistic Perspective, MIT Press, 2012

Further Reading:

T. Hastie, R. Tibshirani, J Friedman, Elements of Statistical Learning, Springer, 2009

Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011, http://scikit-learn.org/stable/tutorial/

Recommended Prerequisites:

The course requires a good level of mathematical maturity as well as some statistical intuition and background knowledge to motivate the course. Students are expected to be familiar with core concepts from probability (conditional probability, conditional densities, properties of conditional expectations, basic inequalities such as Markov's, Chebyshev's and the Cauchy-Schwarz inequality, modes of convergence), basic limit theorems from probability (in particular the strong law of large numbers and the central limit theorem), and Markov chains (aperiodicity, irreducibility, stationary distributions, reversibility and convergence). Most of these concepts are covered in courses offered in the Statistics department, in particular Prelims Probability, A8 Probability and SB3.1 (formerly SB3a) Applied Probability. Familiarity with basic Monte Carlo methods will be helpful, as covered for example in A12 Simulation and Statistical Programming. Some familiarity with concepts from Bayesian inference, such as posterior distributions, will be useful in order to understand the motivation behind the material of the course.

Aims and Objectives:

The aim of the lectures is to introduce modern simulation methods. This course concentrates on Markov chain Monte Carlo (MCMC) methods and Sequential Monte Carlo (SMC) methods. Examples of applications of these methods to complex inference problems will be given.

Synopsis:

Classical methods: inversion, rejection, composition.

Importance sampling.

MCMC methods: elements of discrete-time general state-space Markov chains theory, Metropolis-Hastings algorithm.
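A minimal random-walk Metropolis sketch (NumPy assumed; the bimodal target and step size are illustrative choices):

    import numpy as np

    def random_walk_metropolis(log_target, x0, n_samples=10000, step=1.0, seed=0):
        """Random-walk Metropolis with Gaussian proposals."""
        rng = np.random.default_rng(seed)
        x, lp = x0, log_target(x0)
        samples = np.empty(n_samples)
        for i in range(n_samples):
            prop = x + step * rng.normal()
            lp_prop = log_target(prop)
            # accept with probability min(1, pi(prop) / pi(x))
            if np.log(rng.uniform()) < lp_prop - lp:
                x, lp = prop, lp_prop
            samples[i] = x
        return samples

    # Example target: a two-component Gaussian mixture (unnormalised log-density)
    log_target = lambda x: np.logaddexp(-0.5 * (x + 3)**2, -0.5 * (x - 3)**2)
    draws = random_walk_metropolis(log_target, x0=0.0)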

Advanced MCMC methods: Gibbs sampling, slice sampling, tempering/annealing, Hamiltonian (or Hybrid) Monte Carlo, pseudo-marginal MCMC. Sequential importance sampling. SMC methods: nonlinear filtering.

Reading:

C. P. Robert and G. Casella, Monte Carlo Statistical Methods, 2nd edition, Springer-Verlag, 2004

Further reading:

J.S. Liu, Monte Carlo Strategies in Scientific Computing, Springer-Verlag, 2001

Department of Computer Science


Overview:

Machine learning techniques enable us to automatically extract features from data so as to solve predictive tasks, such as speech recognition, object recognition, machine translation, question-answering, anomaly detection, medical diagnosis and prognosis, automatic algorithm configuration, personalisation, robot control, time series forecasting, and much more. Learning systems adapt so that they can efficiently solve new tasks related to previously encountered tasks.

This course will introduce the field of machine learning, in particular focusing on the core concepts of supervised and unsupervised learning. In supervised learning we will discuss algorithms which are trained on input data labelled with a desired output, for instance an image of a face and the name of the person whose face it is, and learn a function mapping from the input to the output. Unsupervised learning aims to discover latent structure in an input signal where no output labels are available, an example of which is grouping news articles based on the topics they cover. Students will learn algorithms which underpin many popular machine learning techniques, as well as developing an understanding of the theoretical relationships between these algorithms. The practicals will concern the application of machine learning to a range of real-world problems.
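As a small taste of the unsupervised side (the syllabus item on clustering and k-means below), here is a minimal NumPy sketch of Lloyd's algorithm for k-means; the function names, toy data and parameters are illustrative, not course code:

    import numpy as np

    def kmeans(X, k, n_iter=100, seed=0):
        """Lloyd's algorithm: alternate nearest-centre assignment and centre updates."""
        rng = np.random.default_rng(seed)
        centres = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
            labels = dists.argmin(axis=1)                        # assignment step
            new_centres = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                    else centres[j] for j in range(k)])
            if np.allclose(new_centres, centres):                # converged
                break
            centres = new_centres
        return labels, centres

    # Two well-separated Gaussian blobs as toy data
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(6, 1, (50, 2))])
    labels, centres = kmeans(X, k=2)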

Prerequisites:
Machine learning is a mathematical discipline and it requires a good background in linear algebra, calculus, probability and algorithms. If you have not taken the following courses (or their equivalents) you should speak with the lecturers prior to registering for this course.

  • Continuous Mathematics
  • Linear Algebra
  • Probability
  • Design and Analysis of Algorithms

Syllabus:

  • Introduction to different paradigms of machine learning
  • Linear prediction, Regression
  • Maximum Likelihood, MAP, Bayesian ML
  • Regularization, Generalization, Cross Validation
  • Basics of Optimization
  • Linear Classification, Logistic Regression, Naïve Bayes
  • Support Vector Machines
  • Kernel Methods
  • Neural Networks, Backpropagation
  • Convolutional Neural Networks
  • Recurrent Neural Networks
  • Unsupervised Learning, Clustering, k-means
  • Dimensionality Reduction, PCA

Further details: Machine Learning (2023-24)

Reading list:
C. M. Bishop. Pattern Recognition and Machine Learning. Springer 2006.

Ian Goodfellow, Yoshua Bengio and Aaron Courville. Deep Learning. MIT Press 2016.

Kevin P. Murphy. Machine Learning: A Probabilistic Perspective. MIT Press 2012.

Term: Hilary

Overview:
The course will appeal to students who want to gain a better understanding of modern deep learning. It presents a systematic geometric blueprint allowing them to derive popular deep neural network architectures (CNNs, GNNs, Transformers, etc.) from the first principles of symmetry and invariance. The focus will be on general principles that underpin deep learning as well as concrete examples of their realisations and applications. The course will try to tie together topics in geometry, group theory and representation learning, graph theory, and machine learning into a coherent picture. It ideally targets students in the CS & Maths cohort, or CS students with a strong mathematical background.
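To illustrate the kind of symmetry argument involved, the sketch below (NumPy, with illustrative random weights) shows a DeepSets-style readout, ρ(Σ_i φ(x_i)), which is invariant to permutations of the input set by construction:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 8))             # a "set" of 5 elements with 8 features each
    W1 = rng.normal(size=(8, 16))
    W2 = rng.normal(size=(16, 4))

    def deepsets_readout(X, W1, W2):
        """rho(sum_i phi(x_i)) with a shared element-wise encoder phi and readout rho."""
        phi = np.maximum(X @ W1, 0.0)        # phi applied to each set element
        pooled = phi.sum(axis=0)             # permutation-invariant sum pooling
        return np.maximum(pooled @ W2, 0.0)  # rho

    perm = rng.permutation(len(X))
    assert np.allclose(deepsets_readout(X, W1, W2), deepsets_readout(X[perm], W1, W2))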

Learning outcomes:

  • Understand the theoretical geometric principles of symmetry, invariance, and equivariance underlying modern deep learning architectures
  • Understand various deep neural network architectures (CNNs, GNNs, Transformers, DeepSets, LSTMs) and be able to derive them from first principles
  • Learn different applications of the methods studied in the course and understand problem-specific choices

Further details: Geometric Deep Learning (2023-24)

Term: Hilary

Overview:
This is an advanced course in machine learning, focusing specifically on recent advances in deep learning such as Bayesian neural networks. The course will concentrate on the underlying fundamental methodology as well as on applications, such as autonomous driving, astronomy, and medicine. Recent statistical techniques based on neural networks have achieved remarkable progress in these domains, leading to a great deal of commercial and academic interest. The course will introduce the mathematical definitions of the relevant machine learning models and derive their associated approximate inference algorithms, demonstrating the models in the various domains. The taught material and assessment include both theoretical derivations and applied implementations, and students are expected to be proficient with both.

Learning outcomes:
After studying this course, students will:

  • Understand the definition of a range of deep learning models.
  • Be able to derive and implement approximate inference algorithms for these models.
  • Be able to implement and evaluate common neural network models for vision applications.
  • Have a good understanding of the two numerical approaches to learning (stochastic optimization and MC integration) and how they relate to the Bayesian approach.
  • Have an understanding of how to choose a model to describe a particular type of data.
  • Know how to evaluate a learned model in practice.
  • Understand the mathematics necessary for constructing novel machine learning solutions.
  • Be able to design and implement various machine learning algorithms in a range of real-world applications.

Prerequisites:
Required background knowledge includes probability theory, linear algebra, continuous mathematics, multivariate calculus and multivariate probability theory, as well as good programming skills in Python. Students are required to have taken the Machine Learning course. The programming environment used in the lecture examples and practicals will be Python/PyTorch.
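As a flavour of the Monte Carlo side of the course, the following PyTorch sketch shows Monte Carlo dropout, one common way to obtain approximate predictive uncertainty from a deep network (the architecture and number of samples are illustrative, and the model is assumed to have been trained already; this is not course code):

    import torch
    import torch.nn as nn

    # A small regression network with dropout layers
    model = nn.Sequential(
        nn.Linear(1, 64), nn.ReLU(), nn.Dropout(p=0.1),
        nn.Linear(64, 64), nn.ReLU(), nn.Dropout(p=0.1),
        nn.Linear(64, 1),
    )
    # ... assume the model has been trained on some regression data ...

    def mc_dropout_predict(model, x, n_samples=100):
        """Average stochastic forward passes with dropout kept active."""
        model.train()                        # keeps dropout on at prediction time
        with torch.no_grad():
            preds = torch.stack([model(x) for _ in range(n_samples)])
        return preds.mean(dim=0), preds.std(dim=0)

    x_test = torch.linspace(-3, 3, 50).unsqueeze(1)
    mean, std = mc_dropout_predict(model, x_test)   # predictive mean and uncertainty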

Further details: Uncertainty in Deep Learning (2023-24)

Reading list
Kevin P. Murphy. Probabilistic Machine Learning: Advanced Topics. MIT Press (2022)
https://probml.github.io/pml-book/book2.html 

Kevin P. Murphy. Probabilistic Machine Learning: An Introduction. MIT Press (2022)
https://probml.github.io/pml-book/book1.html 

Ian Goodfellow, Yoshua Bengio and Aaron Courville. Deep Learning. MIT Press 2016
https://www.deeplearningbook.org/