Deep Learning Reading Group
This is the ninth session of 2024. We'll be discussing:
(Theory) Linear attention is (maybe) all you need (to understand transformer optimisation) (https://arxiv.org/abs/2310.01082):
- From a theoretical point of view, transformers have several distinctive properties that set them apart from other neural network architectures. This paper outlines many of these and shows that they can be replicated and studied using a (very) basic transformer model.
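For context, the simplification the paper's title refers to is linear attention, which drops the softmax from the attention map. As a rough sketch (not the paper's exact setup; the function names, shapes, and NumPy implementation here are our own illustration), a single attention head with and without the softmax might look like:

```python
import numpy as np

def linear_attention(Q, K, V):
    # Linear attention omits the softmax: output = (Q K^T) V.
    # Because the map is a plain matrix product, K^T V can be computed
    # first, reducing cost from O(n^2 d) to O(n d^2) for sequence
    # length n and head dimension d.
    return Q @ (K.T @ V)

def softmax_attention(Q, K, V):
    # Standard scaled dot-product attention, for comparison.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
n, d = 5, 3  # toy sequence length and head dimension
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = linear_attention(Q, K, V)
```

The point of studying the linear variant is that it strips attention down to a bilinear form that is far more tractable to analyse, while (per the paper) still reproducing optimisation phenomena seen in full transformers.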
Material is to be read before each session so that it can be discussed in an open format. Staff and students from all backgrounds who are interested in these topics are welcome. If you are interested but haven't been to a session yet, come along; there is no need to have attended previous sessions.