Transframer - Pytorch (wip)
Implementation of Transframer, DeepMind's U-Net + Transformer architecture for video generation of up to 30 seconds, in PyTorch
The gist of the paper is the use of a U-Net as a multi-frame encoder, along with a regular transformer decoder that cross-attends to the encoded frames and predicts the rest of the frames. The author builds on his prior work, in which images are encoded as sparse discrete cosine transform (DCT) sequences.
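Below is a minimal sketch of that encoder/decoder split. All module names, shapes, and hyperparameters are assumptions for illustration, not this repo's API: a small convolutional encoder stands in for the U-Net over the context frames, and a standard transformer decoder cross-attends to its flattened features while autoregressively predicting a sequence of DCT tokens.

```python
# Hypothetical sketch of the Transframer-style encoder / decoder split.
# Names, shapes, and hyperparameters are assumptions, not the repo's API.

import torch
from torch import nn

class MultiFrameEncoder(nn.Module):
    """Stand-in for the U-Net: encodes context frames into a set of feature tokens."""
    def __init__(self, channels = 3, dim = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, dim, 4, stride = 2, padding = 1), nn.GELU(),
            nn.Conv2d(dim, dim, 4, stride = 2, padding = 1), nn.GELU(),
        )

    def forward(self, frames):                          # (batch, frames, c, h, w)
        b, f, c, h, w = frames.shape
        feats = self.net(frames.view(b * f, c, h, w))   # (b * f, dim, h / 4, w / 4)
        feats = feats.view(b, f, feats.shape[1], -1)    # flatten spatial dims
        return feats.permute(0, 1, 3, 2).reshape(b, -1, feats.shape[2])  # (b, tokens, dim)

class DCTTokenDecoder(nn.Module):
    """Regular transformer decoder cross-attending to the encoded context frames
    and predicting the next DCT token of the target frames."""
    def __init__(self, num_tokens = 1024, dim = 128, depth = 6, heads = 8):
        super().__init__()
        self.token_emb = nn.Embedding(num_tokens, dim)
        layer = nn.TransformerDecoderLayer(dim, heads, batch_first = True)
        self.decoder = nn.TransformerDecoder(layer, num_layers = depth)
        self.to_logits = nn.Linear(dim, num_tokens)

    def forward(self, dct_tokens, context):             # (b, seq), (b, ctx_tokens, dim)
        x = self.token_emb(dct_tokens)
        n = x.shape[1]
        causal_mask = torch.triu(torch.full((n, n), float('-inf')), diagonal = 1)
        x = self.decoder(x, context, tgt_mask = causal_mask)
        return self.to_logits(x)

encoder = MultiFrameEncoder()
decoder = DCTTokenDecoder()
context = encoder(torch.randn(1, 2, 3, 64, 64))               # two 64x64 context frames
logits  = decoder(torch.randint(0, 1024, (1, 32)), context)   # next-token logits
```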
I will deviate from the implementation in this paper, using a hierarchical autoregressive transformer and a regular ResNet block in place of the NFNet block (this design choice is simply DeepMind reusing their own code, as NFNet was developed at DeepMind by Brock et al.).
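As a sketch of that substitution, a plain residual block could look like the following. This is an assumed design for illustration, not the block actually used in the repo.

```python
# Plain ResNet-style block used in place of the paper's NFNet block.
# A hedged sketch; the actual block design in this repo may differ.

import torch
from torch import nn

class ResnetBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.GroupNorm(8, dim), nn.SiLU(),
            nn.Conv2d(dim, dim, 3, padding = 1),
            nn.GroupNorm(8, dim), nn.SiLU(),
            nn.Conv2d(dim, dim, 3, padding = 1),
        )

    def forward(self, x):
        return self.net(x) + x   # residual connection

block = ResnetBlock(64)
out = block(torch.randn(1, 64, 32, 32))   # shape preserved: (1, 64, 32, 32)
```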
Update: On further meditation, there is nothing new in this paper except for generative modeling on DCT representations.
Appreciation
- This work would not be possible without the generous sponsorship from Stability AI, as well as my other sponsors
Todo
- [ ] figure out if DCT coefficients can be extracted directly from images in JPEG format (see the sketch after this list)
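One possible starting point for that item is to compute the JPEG-style blockwise DCT from decoded pixels, as sketched below with scipy's dctn over 8x8 blocks. This is only an illustration: it recomputes the transform from pixels, whereas pulling the quantized coefficients straight out of the JPEG bitstream would require a library that exposes them (e.g. bindings to libjpeg).

```python
# Sketch: JPEG-style 8x8 blockwise DCT over a grayscale image with scipy.
# This recomputes the transform from pixels; it does not read the quantized
# DCT coefficients stored inside the JPEG file itself.

import numpy as np
from scipy.fft import dctn

def blockwise_dct(image, block = 8):
    h, w = image.shape
    h, w = h - h % block, w - w % block                  # crop to a multiple of the block size
    image = image[:h, :w].astype(np.float32) - 128.0     # JPEG-style level shift
    blocks = image.reshape(h // block, block, w // block, block).swapaxes(1, 2)
    return dctn(blocks, type = 2, norm = 'ortho', axes = (-2, -1))

coeffs = blockwise_dct(np.random.randint(0, 256, (64, 64)))
print(coeffs.shape)   # (8, 8, 8, 8): one 8x8 coefficient block per image block
```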
Citations
@article{Nash2022TransframerAF,
title = {Transframer: Arbitrary Frame Prediction with Generative Models},
author = {Charlie Nash and Jo{\~a}o Carreira and Jacob Walker and Iain Barr and Andrew Jaegle and Mateusz Malinowski and Peter W. Battaglia},
journal = {ArXiv},
year = {2022},
volume = {abs/2203.09494}
}