Transframer - Pytorch (wip)
Implementation of Transframer, DeepMind's U-Net + Transformer architecture for video generation of up to 30 seconds, in PyTorch
The gist of the paper is the use of a U-Net as a multi-frame encoder, along with a regular transformer decoder that cross-attends to the encoded frames and predicts the rest of the frames. The author builds on his prior work, in which images are encoded as sparse discrete cosine transform (DCT) sequences.
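Below is a minimal sketch of that encoder/decoder split. All module names, shapes, and hyperparameters are assumptions for illustration, not this repo's API: a small convolutional encoder stands in for the U-Net over the context frames, and a standard transformer decoder cross-attends to its flattened features while autoregressively predicting a sequence of DCT tokens.

```python
# Hypothetical sketch of the Transframer-style encoder / decoder split.
# Names, shapes, and hyperparameters are assumptions, not the repo's API.

import torch
from torch import nn

class MultiFrameEncoder(nn.Module):
    """Stand-in for the U-Net: encodes context frames into a set of feature tokens."""
    def __init__(self, channels = 3, dim = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, dim, 4, stride = 2, padding = 1), nn.GELU(),
            nn.Conv2d(dim, dim, 4, stride = 2, padding = 1), nn.GELU(),
        )

    def forward(self, frames):                          # (batch, frames, c, h, w)
        b, f, c, h, w = frames.shape
        feats = self.net(frames.view(b * f, c, h, w))   # (b * f, dim, h / 4, w / 4)
        feats = feats.view(b, f, feats.shape[1], -1)    # flatten spatial dims
        return feats.permute(0, 1, 3, 2).reshape(b, -1, feats.shape[2])  # (b, tokens, dim)

class DCTTokenDecoder(nn.Module):
    """Regular transformer decoder cross-attending to the encoded context frames
    and predicting the next DCT token of the target frames."""
    def __init__(self, num_tokens = 1024, dim = 128, depth = 6, heads = 8):
        super().__init__()
        self.token_emb = nn.Embedding(num_tokens, dim)
        layer = nn.TransformerDecoderLayer(dim, heads, batch_first = True)
        self.decoder = nn.TransformerDecoder(layer, num_layers = depth)
        self.to_logits = nn.Linear(dim, num_tokens)

    def forward(self, dct_tokens, context):             # (b, seq), (b, ctx_tokens, dim)
        x = self.token_emb(dct_tokens)
        n = x.shape[1]
        causal_mask = torch.triu(torch.full((n, n), float('-inf')), diagonal = 1)
        x = self.decoder(x, context, tgt_mask = causal_mask)
        return self.to_logits(x)

encoder = MultiFrameEncoder()
decoder = DCTTokenDecoder()
context = encoder(torch.randn(1, 2, 3, 64, 64))               # two 64x64 context frames
logits  = decoder(torch.randint(0, 1024, (1, 32)), context)   # next-token logits
```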
I will deviate from the implementation in this paper, using a hierarchical autoregressive transformer and a regular ResNet block in place of the NFNet block (this design choice is simply DeepMind reusing their own code, as NFNet was developed at DeepMind by Brock et al.).
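As a sketch of that substitution, a plain residual block could look like the following. This is an assumed design for illustration, not the block actually used in the repo.

```python
# Plain ResNet-style block used in place of the paper's NFNet block.
# A hedged sketch; the actual block design in this repo may differ.

import torch
from torch import nn

class ResnetBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.GroupNorm(8, dim), nn.SiLU(),
            nn.Conv2d(dim, dim, 3, padding = 1),
            nn.GroupNorm(8, dim), nn.SiLU(),
            nn.Conv2d(dim, dim, 3, padding = 1),
        )

    def forward(self, x):
        return self.net(x) + x   # residual connection

block = ResnetBlock(64)
out = block(torch.randn(1, 64, 32, 32))   # shape preserved: (1, 64, 32, 32)
```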
Update: On further meditation, there is nothing new in this paper except for generative modeling on DCT representations.
Appreciation
- This work would not be possible without the generous sponsorship from Stability AI, as well as my other sponsors
Todo
- [ ] figure out if DCT coefficients can be extracted directly from images in JPEG format (see the sketch after this list)
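One possible starting point for that item is to compute the JPEG-style blockwise DCT from decoded pixels, as sketched below with scipy's dctn over 8x8 blocks. This is only an illustration: it recomputes the transform from pixels, whereas pulling the quantized coefficients straight out of the JPEG bitstream would require a library that exposes them (e.g. bindings to libjpeg).

```python
# Sketch: JPEG-style 8x8 blockwise DCT over a grayscale image with scipy.
# This recomputes the transform from pixels; it does not read the quantized
# DCT coefficients stored inside the JPEG file itself.

import numpy as np
from scipy.fft import dctn

def blockwise_dct(image, block = 8):
    h, w = image.shape
    h, w = h - h % block, w - w % block                  # crop to a multiple of the block size
    image = image[:h, :w].astype(np.float32) - 128.0     # JPEG-style level shift
    blocks = image.reshape(h // block, block, w // block, block).swapaxes(1, 2)
    return dctn(blocks, type = 2, norm = 'ortho', axes = (-2, -1))

coeffs = blockwise_dct(np.random.randint(0, 256, (64, 64)))
print(coeffs.shape)   # (8, 8, 8, 8): one 8x8 coefficient block per image block
```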
Citations
@article{Nash2022TransframerAF,
title = {Transframer: Arbitrary Frame Prediction with Generative Models},
author = {Charlie Nash and Jo{\~a}o Carreira and Jacob Walker and Iain Barr and Andrew Jaegle and Mateusz Malinowski and Peter W. Battaglia},
journal = {ArXiv},
year = {2022},
volume = {abs/2203.09494}
}