DALLE2-pytorch

Open replication of the generator

Open rom1504 opened this issue 2 years ago • 13 comments

So it turns out people are also interested in working on the generator. Let's use this issue to track progress on that.

What's needed:

  • [x] A dataloader that uses e.g. webdataset, containing both .jpg images and text embeddings as .npy
  • [x] A training loop working on one node
  • [x] A first training on a small dataset (for example use img2dataset on cc3m or a small subset of laion2B)
  • [x] analyse results
  • [x] scale up the training code to multi node

This will require a lot of work but should be very cool. It will work best in conjunction with #23, but can still be built beforehand (by directly using text embeddings instead of mapped image embeddings).
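A minimal sketch of the pairing such a dataloader performs, using only the standard library and numpy (the actual implementation presumably builds on the webdataset package; all names here are illustrative): a webdataset shard is just a tar file in which the files belonging to one sample share a basename, so `.jpg` bytes and `.npy` embeddings can be grouped by key.

```python
import io
import tarfile
import numpy as np

def emb_to_bytes(emb):
    """Serialize a numpy embedding the way .npy files are stored in a shard."""
    buf = io.BytesIO()
    np.save(buf, emb)
    return buf.getvalue()

def write_sample(tar, key, jpg_bytes, emb):
    """Add one sample (image bytes + embedding) to a tar shard under a shared key."""
    for name, payload in ((f"{key}.jpg", jpg_bytes),
                          (f"{key}.npy", emb_to_bytes(emb))):
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

def iter_pairs(shard_bytes):
    """Yield (key, jpg_bytes, embedding) triples, pairing .jpg/.npy by basename."""
    samples = {}
    with tarfile.open(fileobj=io.BytesIO(shard_bytes)) as tar:
        for member in tar:
            key, ext = member.name.rsplit(".", 1)
            samples.setdefault(key, {})[ext] = tar.extractfile(member).read()
    for key, parts in sorted(samples.items()):
        yield key, parts["jpg"], np.load(io.BytesIO(parts["npy"]))

# Build a tiny two-sample shard in memory and read it back.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    write_sample(tar, "000000", b"\xff\xd8fake-jpeg", np.ones(4, dtype=np.float32))
    write_sample(tar, "000001", b"\xff\xd8fake-jpeg", np.zeros(4, dtype=np.float32))

pairs = list(iter_pairs(buf.getvalue()))
print(len(pairs))  # -> 2
```

In a real pipeline the embeddings would typically be stored in separate aligned files rather than inline, but the key-based grouping is the same idea.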

rom1504 avatar Apr 27 '22 20:04 rom1504

Notes for this task can be kept in this doc (https://docs.google.com/document/d/1DkFY9ZUqXHKJGlX87g0S85-VULgeb0q-gTRbz_Lr9JU/edit?usp=sharing).

Veldrovive avatar Apr 27 '22 20:04 Veldrovive

@rom1504 what is the .not file extension?

lucidrains avatar Apr 27 '22 23:04 lucidrains

the generator is actually interesting because it consists of multiple networks (cascading unets), and they can be trained separately (from the base network all the way to the super-resolving one at the very end)
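A toy sketch of why the stages are independent (resolutions 64 → 256 → 1024 and the numpy stand-ins are purely illustrative, not the repo's API): each unet only ever sees targets at its own resolution, so each stage's training loop can run on its own.

```python
import numpy as np

def downsample(img, factor):
    """Nearest-neighbour downsample by striding (illustrative only)."""
    return img[::factor, ::factor]

def train_stage(stage_idx, target_res, dataset):
    """Placeholder for training one unet of the cascade at its resolution.
    Conceptually, the base unet models the low-res image given the text/image
    embedding, and each super-res unet models the high-res image given the
    lower-res one — so no stage needs another stage's weights to train."""
    for img in dataset:
        factor = img.shape[0] // target_res
        x = downsample(img, factor)
        assert x.shape[0] == target_res
    return f"stage{stage_idx}@{target_res}"

# Hypothetical 3-stage cascade trained stage by stage, in any order.
dataset = [np.zeros((1024, 1024, 3), dtype=np.float32) for _ in range(2)]
trained = [train_stage(i, res, dataset) for i, res in enumerate((64, 256, 1024))]
print(trained)  # -> ['stage0@64', 'stage1@256', 'stage2@1024']
```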

lucidrains avatar Apr 27 '22 23:04 lucidrains

if doing latent diffusion training, also worth thinking about whether to pre-encode the codebook ids and then select the codes from the codebook during training
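A sketch of that pre-encoding idea (the codebook size, code dimension, and function names are assumptions for illustration): quantize the latents to integer codebook ids once, offline, then during training recover the codes with a cheap gather — the stored id grid is far smaller than the float latents.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical VQ codebook: 1024 codes of dimension 8.
codebook = rng.standard_normal((1024, 8)).astype(np.float32)

def encode_offline(latent_grid):
    """Offline step: map each latent vector to its nearest codebook id."""
    dists = ((latent_grid[..., None, :] - codebook) ** 2).sum(-1)
    return dists.argmin(-1).astype(np.int16)

def decode_for_training(ids):
    """Training-time step: gather codes from the codebook by id."""
    return codebook[ids]

latents = rng.standard_normal((16, 16, 8)).astype(np.float32)
ids = encode_offline(latents)        # (16, 16) int16 — what gets stored
codes = decode_for_training(ids)     # (16, 16, 8) float32 — what the model sees
print(ids.nbytes, "bytes stored vs", latents.nbytes, "bytes of raw latents")
```

The trade-off is that the dataset is then tied to one specific codebook; re-encoding is needed if the autoencoder changes.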

lucidrains avatar Apr 27 '22 23:04 lucidrains

@rom1504 what is the .not file extension?

Ah, that's a typo — I meant .npy, the NumPy saving format.

rom1504 avatar Apr 28 '22 00:04 rom1504

Interesting points about the networks that can be trained separately. Will check that out!

rom1504 avatar Apr 28 '22 00:04 rom1504

@rom1504, hi, can you point me to where I can download such a dataset? I'm also trying to achieve the same thing.

xiankgx avatar Apr 28 '22 23:04 xiankgx

@xiankgx I think he refers to https://laion.ai/laion-5b-a-new-era-of-open-large-scale-multi-modal-datasets/

godofdream avatar Apr 29 '22 08:04 godofdream

Pull request #57 introduced a dataloader that can read a webdataset and embeddings.

Veldrovive avatar May 05 '22 14:05 Veldrovive

I believe we have almost everything now. Time to scale up.

rom1504 avatar Jun 19 '22 19:06 rom1504

@lucidrains The memory-efficient UNet is quite nice.

We're still in early testing, but anecdotally we were able to increase our batch size over 3x (fp16 & no text conditioning).
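A back-of-the-envelope check of the fp16 part of that gain (the feature-map dimensions below are made up for illustration): halving the bytes per activation element halves activation memory, which alone roughly doubles the feasible batch size; dropping text conditioning accounts for the rest of the >3x.

```python
import numpy as np

# Activation memory for one hypothetical feature map, fp32 vs fp16.
batch, channels, h, w = 32, 256, 64, 64
n_elems = batch * channels * h * w
fp32 = np.dtype(np.float32).itemsize * n_elems
fp16 = np.dtype(np.float16).itemsize * n_elems
print(fp32 // 2**20, "MiB fp32 vs", fp16 // 2**20, "MiB fp16")  # -> 128 MiB vs 64 MiB
```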

nousr avatar Jun 24 '22 17:06 nousr

Things are almost done now. Last steps:

  • [ ] fit a 3B model in 40 GB of VRAM #192
  • [ ] train on the right dataset for 800k steps
  • [ ] evaluate and release
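Rough arithmetic on why the 3B-in-40GB item needs work (assuming standard mixed-precision Adam: fp16 weights and grads plus fp32 master weights and two fp32 moment buffers — the byte counts below are that assumption, not measurements from #192): optimizer state alone already exceeds 40 GB before any activations.

```python
params = 3e9
bytes_per_param = {
    "fp16 weights": 2,
    "fp16 grads": 2,
    "fp32 master weights": 4,
    "adam first moment (fp32)": 4,
    "adam second moment (fp32)": 4,
}
total = params * sum(bytes_per_param.values())  # 16 bytes/param
gib = total / 2**30
print(f"{gib:.1f} GiB")  # -> 44.7 GiB, over budget before activations
```

Hence the usual escape hatches: sharded optimizer states, 8-bit optimizers, or activation checkpointing.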

rom1504 avatar Jul 08 '22 21:07 rom1504

(attached screenshot: hp params)

rom1504 avatar Jul 08 '22 21:07 rom1504