Tejas Kulkarni
@lucidrains i was using a different trainer, but if i use yours with the code below then it seems more reasonable (only 1k iters). The moving mnist is from: `https://www.cs.toronto.edu/~nitish/unsupervised_video/`...
> very interesting! can give this a shot today. didn't expect convnext to not work as well. Another interesting and somewhat related point: the transframer (https://arxiv.org/pdf/2203.09494.pdf) paper used NF-ResNet block...
> > > @mrkulk cool! was not aware of transframer - will need to queue that up for reading
> >
> > second day of training, it has yet again...
re ddp_sharded: great to know, and I will check it out. regarding use case: yes, the models are too big to fit on a single GPU and also...
target network is necessary for learning.
@mkkellogg happy to help/fund development of this feature!
@mkkellogg I think this is a great idea and will have wide implications. Video models like OpenAI's Sora are getting very good, and this feature might become a critical piece...