
Information on the Lecture: Mathematics of Deep Learning (SS 2020)

General information

  • Lecturer: JProf. Dr. Philipp Harms, Room 244, Ernst-Zermelo-Straße 1
  • Assistant: Jakob Stiefel
  • Short videos and slides: available on ILIAS every Tuesday night
  • Discussion and further reading: Wednesdays 14:15-14:45 in our virtual meeting room. 
  • Exercises: exercise sheets available on ILIAS. Solutions are to be handed in on ILIAS every second Wednesday and are discussed every second Friday at 10:15 in our virtual meeting room.
  • Virtual meeting room: Zoom meeting 916 6576 1668 or, as a backup option, BigBlueButton meeting vHarms (passwords available on ILIAS). 

Instructional Development Award

  • This lecture is part of an initiative led by Philipp Harms, Frank Hutter, and Thorsten Schmidt to develop a modular and interdisciplinary teaching concept in the areas of machine learning and deep learning.
  • This initiative is supported by the University of Freiburg in the form of an Instructional Development Award.

Contents

  • Statistical learning theory: generalization and approximation error, bias-variance decomposition
  • Universal approximation theorems: density of shallow neural networks in various function spaces
  • Nonlinear approximation theory: dictionary learning and transfer-of-approximation results
  • Hilbert's 13th problem and Kolmogorov-Arnold representation: some caveats about the choice of activation function
  • Harmonic analysis: lower bounds on network approximation rates via affine systems of Banach frames
  • Information theory: upper bounds on network approximation rates via binary encoders and decoders
  • ReLU networks and the role of network depth: exponential as opposed to polynomial approximation rates 
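
The universality bullet can be made concrete for the ReLU activation in one dimension. The following is a minimal sketch, not the general density arguments of Cybenko (1989) or Hornik (1991): it builds a one-hidden-layer ReLU network with n hidden units that linearly interpolates a continuous function at the equispaced nodes k/n on [0, 1], so its uniform error tends to zero as the width n grows. The helper names `relu`, `shallow_relu_interpolant`, and `sup_error` are hypothetical, and the sup norm is only estimated on a finite grid.

```python
import math
from typing import Callable


def relu(t: float) -> float:
    """Rectified linear unit max(t, 0)."""
    return max(t, 0.0)


def shallow_relu_interpolant(f: Callable[[float], float], n: int) -> Callable[[float], float]:
    """One-hidden-layer ReLU network with n hidden units that equals the
    piecewise linear interpolant of f at the nodes x_k = k/n, k = 0, ..., n."""
    nodes = [k / n for k in range(n + 1)]
    values = [f(x) for x in nodes]
    # Slope of the interpolant on each interval [x_k, x_{k+1}] (interval length 1/n).
    slopes = [(values[k + 1] - values[k]) * n for k in range(n)]
    # Hidden unit k computes relu(x - x_k); its outer weight is the change of slope at x_k.
    weights = [slopes[0]] + [slopes[k] - slopes[k - 1] for k in range(1, n)]
    biases = nodes[:-1]

    def network(x: float) -> float:
        return values[0] + sum(w * relu(x - b) for w, b in zip(weights, biases))

    return network


def sup_error(f: Callable[[float], float], g: Callable[[float], float], m: int = 10_000) -> float:
    """Estimate sup over [0, 1] of |f(x) - g(x)| on the grid i/m."""
    return max(abs(f(i / m) - g(i / m)) for i in range(m + 1))


if __name__ == "__main__":
    target = lambda x: math.sin(2 * math.pi * x)
    for n in (10, 20, 40, 80):
        net = shallow_relu_interpolant(target, n)
        print(n, sup_error(target, net))
```

For a twice continuously differentiable target, linear interpolation with mesh width h = 1/n has uniform error at most h^2/8 · max|f''|, so the printed errors should shrink by roughly a factor of four each time n doubles.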

Slides and Videos

All lectures: slides

  1. Deep learning as statistical learning slides video
    1. Motivation for Deep Learning  video 
    2. Introduction to Statistical Learning video 
    3. Empirical risk minimization and related algorithms video 
    4. Error decompositions video 
    5. Error trade-offs video 
    6. Error bounds video 
    7. Organizational Issues video 
    8. Wrapup video
  2. Neural networks slides video 
    1. Multilayer Perceptrons video 
    2. A Brief History of Deep Learning video 
    3. Deep Learning as Representation Learning video 
    4. Definition of Neural Networks video 
    5. Operations on Neural Networks video 
    6. Universality of Neural Networks video 
    7. Discriminatory Activation Functions video 
    8. Wrapup video
  3. Dictionary learning slides video
    1. Introduction to Dictionary Learning video 
    2. Approximating Hölder Functions by Splines video 
    3. Approximating Univariate Splines by Multi-Layer Perceptrons video 
    4. Approximating Products by Multi-Layer Perceptrons video 
    5. Approximating Multivariate Splines by Multi-Layer Perceptrons video 
    6. Approximating Hölder Functions by Multi-Layer Perceptrons video 
    7. Wrapup video
  4. Kolmogorov-Arnold representation: slides video
    1. Hilbert’s 13th Problem video 
    2. Kolmogorov–Arnold Representation  video 
    3. Approximate Hashing for Specific Functions video 
    4. Approximate Hashing for Generic Functions video 
    5. Proof of the Kolmogorov–Arnold Theorem video 
    6. Approximation by Networks of Bounded Size video 
    7. Wrapup video 
  5. Harmonic analysis: slides video
    1. Banach frames video
    2. Group representations video
    3. Signal representations video
    4. Regular Coorbit Spaces video
    5. Duals of Coorbit Spaces video
    6. General Coorbit Spaces video
    7. Discretization video
    8. Wrapup video
  6. Signal analysis: slides video
    1. Coorbit Theory, Signal Analysis, and Deep Learning video 
    2. Heisenberg Group video
    3. Modulation Spaces video
    4. Affine Group video
    5. Wavelet Spaces video
    6. Shearlet Group video
    7. Shearlet Coorbit Spaces video
    8. Wrapup video
  7. Sparse data representation: slides video
    1. Rate-Distortion Theory video
    2. Hypercube Embeddings and Ball Coverings video
    3. Dictionaries as Encoders video
    4. Frames as Dictionaries video
    5. Networks as Encoders video
    6. Dictionaries as Networks video
    7. Wrapup video
  8. ReLU networks and the role of depth: slides video 
    1. Operations on ReLU Networks video
    2. ReLU Representation of Saw-Tooth Functions video
    3. Saw-Tooth Approximation of the Square Function video
    4. ReLU Approximation of Multiplication video
    5. ReLU Approximation of Analytic Functions video
    6. Wrapup video
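
Items 2 and 3 of lecture 8 concern saw-tooth functions and the square function. The sketch below assumes the standard form of the construction in Yarotsky (2017): the tent function g(x) = 2 ReLU(x) - 4 ReLU(x - 1/2) + 2 ReLU(x - 1) maps [0, 1] onto itself, its s-fold composition g_s is a saw-tooth with 2^(s-1) teeth, and f_m(x) = x - sum_{s=1}^{m} g_s(x)/4^s equals the piecewise linear interpolant of x^2 at the nodes k/2^m, with uniform error 2^(-2m-2). The function names are hypothetical, and the network is evaluated in plain Python rather than in a deep learning library.

```python
def relu(t: float) -> float:
    """Rectified linear unit max(t, 0)."""
    return max(t, 0.0)


def hat(x: float) -> float:
    """Tent function on [0, 1] with peak 1 at x = 1/2, written as one width-3 ReLU layer:
    g(x) = 2 relu(x) - 4 relu(x - 1/2) + 2 relu(x - 1)."""
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5) + 2.0 * relu(x - 1.0)


def square_approx(x: float, m: int) -> float:
    """Depth-m saw-tooth approximation f_m(x) = x - sum_{s=1}^m g_s(x) / 4**s of x**2 on [0, 1],
    where g_s is the s-fold composition of hat."""
    result = x
    sawtooth = x
    for s in range(1, m + 1):
        sawtooth = hat(sawtooth)  # g_s = hat(g_{s-1}): one more hidden layer
        result -= sawtooth / 4 ** s
    return result


if __name__ == "__main__":
    grid = [i / 4096 for i in range(4097)]
    for m in range(1, 7):
        err = max(abs(square_approx(x, m) - x * x) for x in grid)
        print(m, err, 2.0 ** (-2 * m - 2))
```

Each additional term costs one more hidden layer of constant width, so the error decays exponentially in the depth m. By contrast, a one-hidden-layer ReLU network with N units on [0, 1] is piecewise linear with at most N + 1 pieces and therefore needs on the order of 2^m units to reach the same accuracy.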

Literature

Courses on deep learning

  • Frank Hutter and Joschka Boedecker (Department of Computer Science, University of Freiburg): Foundations of Deep Learning. ILIAS
  • Philipp C. Petersen (University of Vienna): Neural Network Theory. pdf

Effectiveness of deep learning

  • Sejnowski (2020): The unreasonable effectiveness of deep learning in artificial intelligence
  • Donoho (2000): High-Dimensional Data Analysis—the Curses and Blessings of Dimensionality

Statistical learning theory

  • Bousquet, Boucheron, and Lugosi (2003): Introduction to statistical learning theory.
  • Vapnik (1999): An overview of statistical learning theory.

Universal approximation theorems

  • Hornik, Stinchcombe, and White (1989): Multilayer Feedforward Networks are Universal Approximators
  • Cybenko (1989): Approximation by superpositions of a sigmoidal function
  • Hornik (1991): Approximation capabilities of multilayer feedforward networks

Nonlinear approximation theory

  • Oswald (1990): On the degree of nonlinear spline approximation in Besov-Sobolev spaces
  • DeVore (1998): Nonlinear approximation

Hilbert's 13th problem and Kolmogorov-Arnold representation

  • Arnold (1958): On the representation of functions of several variables
  • Hedberg (1971): The Kolmogorov Superposition Theorem. In: Shapiro, Topics in Approximation Theory
  • Bar-Natan (2009): Hilbert's 13th problem, in full color
  • Hecht-Nielsen (1987): Kolmogorov’s mapping neural network existence theorem

Harmonic analysis

  • Christensen (2016): An introduction to frames and Riesz bases
  • Dahlke, De Mari, Grohs, Labate (2015): Harmonic and Applied Analysis
  • Feichtinger and Gröchenig (1988): A unified approach to atomic decompositions
  • Gröchenig (2001): Foundations of Time-Frequency Analysis
  • Mallat (2009): A Wavelet Tour of Signal Processing
  • Kutyniok and Labate (2012): Shearlets: Multiscale Analysis for Multivariate Data

Information theory

  • Bölcskei, Grohs, Kutyniok, Petersen (2017): Optimal approximation with sparsely connected deep neural networks. In: SIAM Journal on Mathematics of Data Science 1.1, pp. 8–45
  • Dahlke, De Mari, Grohs, Labate (2015): Harmonic and Applied Analysis. Birkhäuser.
  • Donoho (2001): Sparse Components of Images and Optimal Atomic Decompositions. In: Constructive Approximation 17, pp. 353–382
  • Shannon (1959): Coding Theorems for a Discrete Source with a Fidelity Criterion. In: International Convention Record 7, pp. 325–350

ReLU networks and the role of depth

  • Perekrestenko, Grohs, Elbrächter, Bölcskei (2018): The universal approximation power of finite-width deep ReLU Networks. arXiv:1806.01528
  • E, Wang (2018): Exponential convergence of the deep neural approximation for analytic functions. arXiv:1807.00297
  • Yarotsky (2017): Error bounds for approximations with deep ReLU networks. Neural Networks 94, pp. 103–114.