DALLE2-pytorch
Open replication of the generator
So it turns out people are also interested in working on the generator. Let's use this issue to track progress on that.
What's needed:
- [x] A dataloader that uses e.g. webdataset, containing both .jpg images and text embeddings as .npy (see the dataloader sketch after this list)
- [x] A training loop working on one node
- [x] A first training on a small dataset (for example, use img2dataset on cc3m or a small subset of laion2B; see the img2dataset sketch after this list)
- [x] Analyse results
- [x] Scale up the training code to multi-node
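
For reference, a minimal sketch of what the dataloader item above could look like, assuming img2dataset-style webdataset shards where each sample stores the image under the `jpg` key and a precomputed text embedding under `npy`. The shard pattern, key names, and preprocessing here are illustrative, not the repo's actual implementation:

```python
import io

import numpy as np
import torch
import webdataset as wds
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(256),
    transforms.ToTensor(),
])

def to_tensors(sample):
    # raw webdataset samples map file extensions to bytes
    image = preprocess(Image.open(io.BytesIO(sample["jpg"])).convert("RGB"))
    text_embed = torch.from_numpy(np.load(io.BytesIO(sample["npy"])))
    return image, text_embed

dataset = (
    wds.WebDataset("cc3m-{00000..00331}.tar")  # hypothetical shard pattern
    .shuffle(1000)
    .map(to_tensors)
)
loader = wds.WebLoader(dataset, batch_size=64, num_workers=4)
```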
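And a hedged sketch of the "first training on a small dataset" step, using img2dataset's Python API to turn a cc3m url/caption list into the webdataset shards read above. The input file name and column names are assumptions about how the metadata was exported:

```python
from img2dataset import download

download(
    url_list="cc3m.tsv",         # hypothetical tsv with url and caption columns
    input_format="tsv",
    url_col="url",
    caption_col="caption",
    output_folder="cc3m-shards",
    output_format="webdataset",  # produces the .tar shards read above
    image_size=256,
    processes_count=8,
    thread_count=32,
)
```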
This will require a lot of work but should be very cool. It will work best in conjunction with #23, but can still be built beforehand (by directly using text embeddings instead of mapped image embeddings).
Notes for this task can be kept in this doc (https://docs.google.com/document/d/1DkFY9ZUqXHKJGlX87g0S85-VULgeb0q-gTRbz_Lr9JU/edit?usp=sharing).
@rom1504 what is the .not file extension?
The generator is actually interesting because it consists of multiple networks (cascading unets), and they can be trained separately (from the base network all the way to the super-resolving one at the very end).
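
To make that concrete, here is a simplified, generic cascaded-diffusion sketch (not the exact dalle2-pytorch API) of training a super-resolution unet on its own: it only needs high-res images, a degraded low-res conditioning version of them, and the conditioning embedding. The `sr_unet` call signature and the noise schedule values are hypothetical:

```python
import torch
import torch.nn.functional as F

# standard DDPM noise schedule (values are illustrative)
betas = torch.linspace(1e-4, 0.02, 1000)
alphas_cumprod = torch.cumprod(1 - betas, dim=0)

def q_sample(x0, t, noise):
    # forward diffusion: sqrt(a_t) * x0 + sqrt(1 - a_t) * noise
    a = alphas_cumprod[t].view(-1, 1, 1, 1)
    return a.sqrt() * x0 + (1 - a).sqrt() * noise

def sr_training_step(sr_unet, hr_images, text_embeds):
    # degrade to 64x64 and back up, standing in for the base stage's output
    lowres = F.interpolate(hr_images, size=64, mode="bilinear", align_corners=False)
    lowres = F.interpolate(lowres, size=hr_images.shape[-2:], mode="bilinear", align_corners=False)

    t = torch.randint(0, len(alphas_cumprod), (hr_images.shape[0],))
    noise = torch.randn_like(hr_images)
    noised = q_sample(hr_images, t, noise)

    # the SR unet predicts the noise from the noised target concatenated
    # with the low-res conditioning image (hypothetical call signature)
    pred = sr_unet(torch.cat((noised, lowres), dim=1), t, text_embeds)
    return F.mse_loss(pred, noise)
```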
If doing latent diffusion training, it's also worth thinking about whether to pre-encode the codebook ids and then select the codes from the codebook during training.
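
A sketch of that pre-encoding idea, with assumed shapes and a stand-in codebook: run the VQ encoder once offline, store only the integer ids (which are cheap), and rebuild the continuous latents during training with a plain embedding lookup:

```python
import torch
import torch.nn.functional as F

# offline, per image: ids = vq_encode(image)  -> e.g. a (32, 32) int array
# saved with np.save alongside the shard, so no encoder pass at train time

codebook = torch.randn(8192, 256)  # stand-in for the trained VQ codebook

def codes_from_ids(ids):
    # (b, h, w) int64 ids -> (b, c, h, w) float latents via embedding lookup
    codes = F.embedding(ids, codebook)  # (b, h, w, c)
    return codes.permute(0, 3, 1, 2)

ids = torch.randint(0, 8192, (4, 32, 32))
latents = codes_from_ids(ids)  # feed these latents to the diffusion unet
```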
> @rom1504 what is the .not file extension?
Ah, that's a typo; I meant .npy, the NumPy saving format.
Interesting points about the networks that can be trained separately. Will check that out!
@rom1504, hi, can you point me to where I can download such a dataset? I'm also trying to achieve the same thing.
@xiankgx I think he refers to https://laion.ai/laion-5b-a-new-era-of-open-large-scale-multi-modal-datasets/
Pull request #57 introduced a dataloader that can read a webdataset together with its embeddings.
I believe we've got almost everything now. Time to scale up.
@lucidrains The memory-efficient UNet is quite nice.
We're still in early testing, but anecdotally we were able to increase our batch size by over 3x (fp16 & no text conditioning).
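
For anyone reproducing the fp16 setup mentioned above, this is just the standard torch.cuda.amp recipe, not the repo's trainer; `compute_loss` is a hypothetical stand-in for the decoder's diffusion loss:

```python
import torch

scaler = torch.cuda.amp.GradScaler()

def training_step(unet, optimizer, images, text_embeds):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = compute_loss(unet, images, text_embeds)  # hypothetical helper
    scaler.scale(loss).backward()  # scale to avoid fp16 gradient underflow
    scaler.step(optimizer)         # unscales, skips step if grads overflowed
    scaler.update()
    return loss.item()
```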
Things are almost done now. Last steps:
- [ ] fit a 3B model in 40GB of VRAM #192 (see the checkpointing sketch below)
- [ ] train on the right dataset for 800k steps
- [ ] evaluate and release
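
One standard trick for the "3B model in 40GB" item above is activation checkpointing with torch.utils.checkpoint, which recomputes block activations during the backward pass instead of storing them; this is a minimal sketch, and `blocks` is a hypothetical stack of unet blocks, not the repo's actual approach:

```python
import torch
from torch.utils.checkpoint import checkpoint

def forward_with_checkpointing(blocks, x):
    # trade compute for memory: activations inside each block are freed
    # after the forward pass and recomputed on the fly during backward
    for block in blocks:  # `blocks` is a hypothetical stack of unet blocks
        x = checkpoint(block, x)
    return x
```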
What are the hyperparameters?