transformer blocks
30 Dec 2022 - 28 Nov 2023
- [[1706.03762] Attention Is All You Need](https://arxiv.org/abs/1706.03762)
- Recurrent networks (including LSTMs) are state of the art (in 2017). This paper proposes flushing them and replacing them with nothing but attention mechanisms, specifically the Transformer.
  > We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.

  > In these [convolutional] models, the number of operations required to relate signals from two arbitrary input or output positions grows in the distance between positions, linearly for ConvS2S and logarithmically for ByteNet.
- would be interesting to know the topology of those networks (I don't know it).
- Sequence prediction has obvious non-parallelizability problems, which all of these architectures aim to address somehow.
- self-attention (intra-attention): an attention mechanism relating different positions of a single sequence to compute a representation of that sequence; see the sketch below.
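As a concrete reference for the self-attention note above: a minimal NumPy sketch of the paper's scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V (eq. 1 in the paper). Single head, no masking, toy random projections; the names `W_q`, `W_k`, `W_v` and the sizes are illustrative, not from the paper.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V -- single head, no masking."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (n, n) pairwise similarities between positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # each output is a weighted sum of all values

# Self-attention: Q, K, V are all linear projections of the same sequence x,
# so every position relates to every other position in one matmul (no recurrence).
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                         # 5 tokens, model dim 8 (toy sizes)
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = scaled_dot_product_attention(x @ W_q, x @ W_k, x @ W_v)
print(out.shape)                                    # (5, 8)
```

Because the whole (n, n) score matrix comes out of one matrix multiply, there is no sequential dependency across positions, which is the parallelizability win over recurrence noted above.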