
How many parameters?

Open LifeIsStrange opened this issue 2 years ago • 4 comments

Sorry to ask, but DALL-E v1 has 12 billion parameters, while it is unclear how many parameters DALL-E v2 has. I'm also wondering whether inference can be run on a single 3090 Ti GPU, or in other words, will consumers be able to use it on realistic hardware? If not, then you should consider leveraging https://github.com/microsoft/DeepSpeed

LifeIsStrange avatar Apr 13 '22 01:04 LifeIsStrange

I don't know how many parameters it has, but there is no way it can run on a 3090 Ti, which only has 24GB of VRAM. Maybe an A100 with 80GB could.

orenong avatar Apr 14 '22 02:04 orenong

@orenong I was wondering how much DeepSpeed could lower the VRAM usage. Also, RAM can be compressed https://en.m.wikipedia.org/wiki/Zswap — can the same be achieved for VRAM?

LifeIsStrange avatar Apr 14 '22 12:04 LifeIsStrange

According to the paper, the decoder is 3.5 billion parameters (Appendix C, table 3): 1.2B for the text model and 2.3B for the vision model, which is not too bad actually. Then it seems like they also have 2 upsamplers (64^2 -> 256^2 and 256^2 -> 1024^2), which both have fewer parameters: the 64->256 one was 700M, and the 256->1024 one was 300M. They also had two different models for the CLIP embedding prior, each about a billion parameters.

I think if they do release it, you might actually be able to run it on a 3090 if you run each model one at a time and do a lot of other tricks to reduce VRAM use.
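A rough back-of-the-envelope sketch of those numbers, assuming fp16 weights (2 bytes per parameter) and ignoring activation memory and optimizer state, which can add a lot on top:

```python
# Parameter counts from Appendix C of the DALL-E 2 paper;
# fp16 (2 bytes/param) is an assumption for inference.
models = {
    "decoder (1.2B text + 2.3B vision)": 3.5e9,
    "upsampler 64->256": 0.7e9,
    "upsampler 256->1024": 0.3e9,
    "prior model #1": 1.0e9,
    "prior model #2": 1.0e9,
}
BYTES_PER_PARAM = 2  # fp16

for name, n_params in models.items():
    print(f"{name}: {n_params * BYTES_PER_PARAM / 1e9:.1f} GB")

total_gb = sum(models.values()) * BYTES_PER_PARAM / 1e9
largest_gb = max(models.values()) * BYTES_PER_PARAM / 1e9
print(f"all models resident at once: {total_gb:.1f} GB")
print(f"largest single model: {largest_gb:.1f} GB")
```

By this estimate, loading everything at once needs ~13 GB of weights alone, but swapping models in one at a time keeps the peak around 7 GB for the decoder, which is why a 24GB 3090 might be feasible with some care.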

tcl9876 avatar Apr 16 '22 01:04 tcl9876

At least this model fits on an A100 with very little effort. If it were a massive model like GPT-3 or PaLM, doing research with it would have been next to impossible.

> According to the paper, the decoder is 3.5 billion parameters (Appendix C, table 3). It's 1.2B for the text model, and 2.3B for the vision model - which is not too bad actually. Then it seems like they also have 2 upsamplers (64^2 -> 256^2 and 256^2 -> 1024^2), which are both fewer parameters. The 64->256 one was 700M, and the 256->1024 was 300M. They also had two different models for the clip embedding prior, each about a billion.
>
> I think if they do release it, you might actually be able to run it on a 3090 if you run each model once at a time and do a lot of other tricks to reduce RAM use.

INF800 avatar Apr 19 '22 03:04 INF800