DALLE2-pytorch
Open replication of the generator
So it turns out people are also interested in working on the generator. Let's use this issue to track progress on that.
What's needed:
- [x] A dataloader that uses e.g. webdataset, containing both .jpg images and text embeddings as .npy (see the dataloader sketch after this list)
- [x] A training loop working on one node
- [x] A first training on a small dataset (for example, use img2dataset on cc3m or a small subset of laion2B; see the img2dataset sketch after this list)
- [x] Analyse results
- [x] Scale up the training code to multi-node
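
For reference, a minimal sketch of what the dataloader item above could look like, assuming img2dataset-style webdataset shards where each sample stores the image under the `jpg` key and a precomputed text embedding under `npy`. The shard pattern, key names, and preprocessing here are illustrative, not the repo's actual implementation:

```python
import io

import numpy as np
import torch
import webdataset as wds
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(256),
    transforms.ToTensor(),
])

def to_tensors(sample):
    # raw webdataset samples map file extensions to bytes
    image = preprocess(Image.open(io.BytesIO(sample["jpg"])).convert("RGB"))
    text_embed = torch.from_numpy(np.load(io.BytesIO(sample["npy"])))
    return image, text_embed

dataset = (
    wds.WebDataset("cc3m-{00000..00331}.tar")  # hypothetical shard pattern
    .shuffle(1000)
    .map(to_tensors)
)
loader = wds.WebLoader(dataset, batch_size=64, num_workers=4)
```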
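And a hedged sketch of the "first training on a small dataset" step, using img2dataset's Python API to turn a cc3m url/caption list into the webdataset shards read above. The input file name and column names are assumptions about how the metadata was exported:

```python
from img2dataset import download

download(
    url_list="cc3m.tsv",         # hypothetical tsv with url and caption columns
    input_format="tsv",
    url_col="url",
    caption_col="caption",
    output_folder="cc3m-shards",
    output_format="webdataset",  # produces the .tar shards read above
    image_size=256,
    processes_count=8,
    thread_count=32,
)
```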
This will require a lot of work but should be very cool. It will work best in conjunction with #23, but can still be built beforehand (by directly using text embeddings instead of mapped image embeddings).
Notes for this task can be kept in this doc (https://docs.google.com/document/d/1DkFY9ZUqXHKJGlX87g0S85-VULgeb0q-gTRbz_Lr9JU/edit?usp=sharing).
@rom1504 what is the .not file extension?
The generator is actually interesting because it consists of multiple networks (cascading unets), and they can be trained separately (from the base network all the way to the super-resolving one at the very end).
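
To make that concrete, here is a simplified, generic cascaded-diffusion sketch (not the exact dalle2-pytorch API) of training a super-resolution unet on its own: it only needs high-res images, a degraded low-res conditioning version of them, and the conditioning embedding. The `sr_unet` call signature and the noise schedule values are hypothetical:

```python
import torch
import torch.nn.functional as F

# standard DDPM noise schedule (values are illustrative)
betas = torch.linspace(1e-4, 0.02, 1000)
alphas_cumprod = torch.cumprod(1 - betas, dim=0)

def q_sample(x0, t, noise):
    # forward diffusion: sqrt(a_t) * x0 + sqrt(1 - a_t) * noise
    a = alphas_cumprod[t].view(-1, 1, 1, 1)
    return a.sqrt() * x0 + (1 - a).sqrt() * noise

def sr_training_step(sr_unet, hr_images, text_embeds):
    # degrade to 64x64 and back up, standing in for the base stage's output
    lowres = F.interpolate(hr_images, size=64, mode="bilinear", align_corners=False)
    lowres = F.interpolate(lowres, size=hr_images.shape[-2:], mode="bilinear", align_corners=False)

    t = torch.randint(0, len(alphas_cumprod), (hr_images.shape[0],))
    noise = torch.randn_like(hr_images)
    noised = q_sample(hr_images, t, noise)

    # the SR unet predicts the noise from the noised target concatenated
    # with the low-res conditioning image (hypothetical call signature)
    pred = sr_unet(torch.cat((noised, lowres), dim=1), t, text_embeds)
    return F.mse_loss(pred, noise)
```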
If doing latent diffusion training, it's also worth thinking about whether to pre-encode the codebook ids and then select the codes from the codebook during training.
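
A sketch of that pre-encoding idea, with assumed shapes and a stand-in codebook: run the VQ encoder once offline, store only the integer ids (which are cheap), and rebuild the continuous latents during training with a plain embedding lookup:

```python
import torch
import torch.nn.functional as F

# offline, per image: ids = vq_encode(image)  -> e.g. a (32, 32) int array
# saved with np.save alongside the shard, so no encoder pass at train time

codebook = torch.randn(8192, 256)  # stand-in for the trained VQ codebook

def codes_from_ids(ids):
    # (b, h, w) int64 ids -> (b, c, h, w) float latents via embedding lookup
    codes = F.embedding(ids, codebook)  # (b, h, w, c)
    return codes.permute(0, 3, 1, 2)

ids = torch.randint(0, 8192, (4, 32, 32))
latents = codes_from_ids(ids)  # feed these latents to the diffusion unet
```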
> @rom1504 what is the .not file extension?
Ah, that's a typo; I meant .npy, the NumPy saving format.
Interesting points about the networks that can be trained separately. Will check that out!
@rom1504, hi, can you point me to where I can download such a dataset? I'm also trying to achieve the same thing.
@xiankgx I think he refers to https://laion.ai/laion-5b-a-new-era-of-open-large-scale-multi-modal-datasets/
Pull request #57 introduced a dataloader that can read a webdataset together with its embeddings.
I believe we've got almost everything now. Time to scale up.
@lucidrains The memory-efficient UNet is quite nice.
We're still in early testing, but anecdotally we were able to increase our batch size by over 3x (fp16 & no text conditioning).
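
For anyone reproducing the fp16 setup mentioned above, this is just the standard torch.cuda.amp recipe, not the repo's trainer; `compute_loss` is a hypothetical stand-in for the decoder's diffusion loss:

```python
import torch

scaler = torch.cuda.amp.GradScaler()

def training_step(unet, optimizer, images, text_embeds):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = compute_loss(unet, images, text_embeds)  # hypothetical helper
    scaler.scale(loss).backward()  # scale to avoid fp16 gradient underflow
    scaler.step(optimizer)         # unscales, skips step if grads overflowed
    scaler.update()
    return loss.item()
```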
Things are almost done now. Last steps:
- [ ] fit a 3B model in 40GB of VRAM #192 (see the checkpointing sketch below)
- [ ] train on the right dataset for 800k steps
- [ ] evaluate and release
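
One standard trick for the "3B model in 40GB" item above is activation checkpointing with torch.utils.checkpoint, which recomputes block activations during the backward pass instead of storing them; this is a minimal sketch, and `blocks` is a hypothetical stack of unet blocks, not the repo's actual approach:

```python
import torch
from torch.utils.checkpoint import checkpoint

def forward_with_checkpointing(blocks, x):
    # trade compute for memory: activations inside each block are freed
    # after the forward pass and recomputed on the fly during backward
    for block in blocks:  # `blocks` is a hypothetical stack of unet blocks
        x = checkpoint(block, x)
    return x
```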
What are the hyperparameters?