Deep Learning Reading Group

A weekly casual reading group to explore recent work in deep learning and related machine learning topics. The listed material should be read before each session so that it can be discussed in an open format. Staff and students from all backgrounds who are interested in these topics are welcome, as we aim to cover a broad mix of both theoretical and application-focused papers. This semester will focus on transformers, the fundamental architecture behind state-of-the-art image recognition systems and large language models.

Sessions are held in person at Melbourne Connect.

This group is led by Dr Liam Hodgkinson, Lecturer (Data Science).

To receive information directly, sign up to the dedicated group mailing list. If you have issues with the mailing list form, please email mcds@unimelb.edu.au.

Upcoming Discussion Sessions and Readings


17 April 2024 (Tues)

(Methods) Mamba: Linear-Time Sequence Modeling with Selective State Spaces (https://arxiv.org/abs/2312.00752):

The Transformer architecture is ubiquitous in deep learning, underlying models achieving state-of-the-art performance across almost every machine learning task. However, Transformers have one major limitation at scale: inference quickly becomes infeasible on long sequences. State space models have become a popular alternative for addressing this issue. This paper presents Mamba, the latest evolution of the state space model architecture, which achieves performance highly competitive with Transformers.
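
As a rough illustration of why state space models scale well with sequence length, here is a minimal sketch (our own toy code, not the Mamba architecture; names, dimensions, and parameter values are illustrative) of a plain linear state space recurrence, whose cost grows linearly in sequence length rather than quadratically as in full self-attention:

import numpy as np

# Minimal sketch of a (non-selective) linear state space recurrence:
#   h_t = A h_{t-1} + B x_t,   y_t = C h_t
# Each step costs O(d_state * d_model), so a length-T sequence costs O(T),
# in contrast to the O(T^2) pairwise interactions of full self-attention.
# Mamba additionally makes the parameters depend on the input ("selective"),
# which is not reproduced here.

def ssm_scan(x, A, B, C):
    """x: (T, d_in); A: (d_state, d_state); B: (d_state, d_in); C: (d_out, d_state)."""
    T = x.shape[0]
    h = np.zeros(A.shape[0])
    ys = np.empty((T, C.shape[0]))
    for t in range(T):
        h = A @ h + B @ x[t]   # state update: one step per token
        ys[t] = C @ h          # readout
    return ys

rng = np.random.default_rng(0)
d_in, d_state, d_out, T = 4, 8, 4, 1000
y = ssm_scan(rng.normal(size=(T, d_in)),
             0.9 * np.eye(d_state),                 # stable state transition
             rng.normal(size=(d_state, d_in)) * 0.1,
             rng.normal(size=(d_out, d_state)) * 0.1)
print(y.shape)  # (1000, 4)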

24 April 2024 (Wed)

(Theory) An Explanation of In-Context Learning as Implicit Bayesian Inference (https://arxiv.org/abs/2111.02080):

Large language models can perform in-context learning, where the model learns at inference time to accomplish a downstream task specified only through a prompt. This paper provides one explanation for this phenomenon using ideas from Bayesian statistics. See also the blog post: https://www.inference.vc/implicit-bayesian-inference-in-sequence-models/
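
Very roughly, and in notation of our own choosing rather than the paper's, the Bayesian reading treats the prompt as evidence about a latent concept \theta, with the model's prediction acting like a posterior average:

p(\text{output} \mid \text{prompt}) = \int p(\text{output} \mid \text{prompt}, \theta)\, p(\theta \mid \text{prompt})\, d\theta

On this view, a good prompt concentrates the posterior p(\theta \mid \text{prompt}) on the intended task, so the model appears to "learn" without any parameter updates.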

1 May 2024 (Wed)

(Methods) Single-Model Uncertainties for Deep Learning (https://arxiv.org/abs/1811.00908):

Quantifying uncertainty in the predictions of deep learning models is difficult in general, but remains important for drawing inferences. Here, a simple and general scheme is proposed: augment the model with an additional input specifying the desired quantile level, which can be adjusted at test time to predict arbitrary quantiles from a single network.
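
As a rough sketch of this idea (our own minimal PyTorch code, not the authors' implementation; the architecture, data, and hyperparameters are placeholders), the network receives the quantile level tau as an extra input and is trained with the pinball (quantile) loss:

import torch
import torch.nn as nn

# Minimal sketch: a regression network that takes the quantile level tau as an
# extra input and is trained with the pinball loss, so one model can be queried
# for any quantile at test time.

net = nn.Sequential(nn.Linear(1 + 1, 64), nn.ReLU(), nn.Linear(64, 1))

def pinball_loss(pred, target, tau):
    err = target - pred
    return torch.mean(torch.maximum(tau * err, (tau - 1) * err))

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
x = torch.randn(256, 1)
y = x + 0.3 * torch.randn(256, 1)          # toy regression data
for _ in range(200):
    tau = torch.rand(256, 1)               # random quantile level per example
    pred = net(torch.cat([x, tau], dim=1))
    loss = pinball_loss(pred, y, tau)
    opt.zero_grad(); loss.backward(); opt.step()

# Query the 5th and 95th percentiles at a new input to form a prediction interval.
x_new = torch.zeros(1, 1)
lo = net(torch.cat([x_new, torch.full((1, 1), 0.05)], dim=1))
hi = net(torch.cat([x_new, torch.full((1, 1), 0.95)], dim=1))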

8 May 2024 (Wed)

(Theory) Linear attention is (maybe) all you need (to understand transformer optimization) (https://arxiv.org/abs/2310.01082):

From a theoretical point of view, transformers have several distinctive properties that set them apart from other neural network architectures. This paper outlines many of these, and shows that they can be replicated and studied using a (very) basic transformer model.
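
For concreteness, the kind of stripped-down model used in this line of work might contain layers like the following (our own sketch of linear, i.e. softmax-free, self-attention; the paper's exact setup may differ in its details):

import numpy as np

# Minimal sketch of a single linear self-attention layer: the softmax is dropped,
# so the token-mixing map is just (X W_Q)(X W_K)^T applied to the values.

def linear_attention(X, W_Q, W_K, W_V):
    """X: (T, d); weight matrices: (d, d). Returns (T, d)."""
    Q, K, V = X @ W_Q, X @ W_K, X @ W_V
    return (Q @ K.T) @ V / X.shape[0]   # unnormalized attention, averaged over tokens

rng = np.random.default_rng(0)
T, d = 16, 8
X = rng.normal(size=(T, d))
Ws = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3)]
print(linear_attention(X, *Ws).shape)  # (16, 8)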

15 May 2024 (Wed)

(Methods) Optuna: A Next-generation Hyperparameter Optimization Framework (https://arxiv.org/abs/1907.10902):

Hyperparameter tuning in deep learning is often conducted by grid search, i.e. trying a fixed set of hyperparameter combinations and selecting the best-performing one. However, there are more principled approaches worth discussing. This software paper outlines a few of these in a more practical context.
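
For a flavour of the library, here is a minimal Optuna usage sketch with a toy objective standing in for an actual training run:

import optuna

# Minimal sketch of Optuna's define-by-run API. In practice the objective would
# train a model with the suggested hyperparameters and return a validation metric.

def objective(trial):
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)   # sampled, not gridded
    depth = trial.suggest_int("depth", 1, 6)
    # Stand-in for "train a model with (lr, depth) and return validation loss":
    return (lr - 1e-3) ** 2 + 0.01 * depth

study = optuna.create_study(direction="minimize")   # uses the TPE sampler by default
study.optimize(objective, n_trials=50)
print(study.best_params)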

22 May 2024 (Wed)

(Methods) The Forward-Forward Algorithm: Some Preliminary Investigations (https://arxiv.org/abs/2212.13345):

Essentially all deep learning models today are trained end-to-end with a gradient-based procedure (backpropagation). This paper discusses a curious and controversial alternative that trains networks using only forward passes.
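
As a very rough sketch of the idea (our own simplification; how negative data are constructed, normalization between layers, and other details from the paper are omitted), each layer is trained locally so that a "goodness" score is high on positive data and low on negative data:

import torch
import torch.nn as nn

# Rough sketch of one forward-forward layer update: the layer is trained locally,
# with no gradients flowing between layers. Goodness here is the mean of squared
# activations (a simplification of the paper's measure).

layer = nn.Linear(784, 500)
opt = torch.optim.SGD(layer.parameters(), lr=0.03)
threshold = 2.0   # illustrative value

def goodness(h):
    return (h ** 2).mean(dim=1)

x_pos = torch.randn(64, 784)   # stand-ins for positive (real) examples
x_neg = torch.randn(64, 784)   # stand-ins for negative (fake) examples

for _ in range(100):
    g_pos = goodness(torch.relu(layer(x_pos)))
    g_neg = goodness(torch.relu(layer(x_neg)))
    # Push positive goodness above the threshold and negative goodness below it.
    loss = torch.log1p(torch.exp(-(g_pos - threshold))).mean() \
         + torch.log1p(torch.exp(g_neg - threshold)).mean()
    opt.zero_grad(); loss.backward(); opt.step()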

29 May 2024 (Wed)

(Theory) A U-turn on Double Descent: Rethinking Parameter Counting in Statistical Learning (https://arxiv.org/abs/2310.18988):

The conventional statistical wisdom of the bias-variance tradeoff is broken in the deep learning regime, where the “bigger is better” rule of thumb reigns supreme. One theoretical explanation for these heuristics is the double descent phenomenon. This paper provides a recent nuanced take on the nature of the phenomenon and how to approach deep learning theory from the statistical point of view.
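
As a generic toy illustration of double descent (not the paper's experiments), one can fit minimum-norm least squares on an increasing number of random ReLU features and watch the test error typically peak near the interpolation threshold before falling again:

import numpy as np

# Toy double descent demo: minimum-norm least squares on random ReLU features.
# Test error typically peaks near the interpolation threshold (p close to n_train)
# and then decreases again as the number of features grows further.

rng = np.random.default_rng(0)
n_train, n_test, d = 100, 1000, 20
w_true = rng.normal(size=d)
X_tr, X_te = rng.normal(size=(n_train, d)), rng.normal(size=(n_test, d))
y_tr = X_tr @ w_true + 0.5 * rng.normal(size=n_train)
y_te = X_te @ w_true

for p in [10, 50, 90, 100, 110, 200, 1000]:       # number of random features
    W = rng.normal(size=(d, p)) / np.sqrt(d)
    F_tr, F_te = np.maximum(X_tr @ W, 0), np.maximum(X_te @ W, 0)
    beta = np.linalg.pinv(F_tr) @ y_tr            # minimum-norm least squares fit
    print(p, np.mean((F_te @ beta - y_te) ** 2))  # test error vs parameter count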

Past Deep Learning Discussion Sessions and Readings