Clay Mullis
@lucidrains https://github.com/lucidrains/DALLE-pytorch/pull/193 If you can work from this, then go for it - if you have a better implementation in mind, let me know.
Here's a sample tokenizer to work with - perhaps include it if you think it's a good idea: `wget https://www.dropbox.com/s/uie7is0dyuxqmk0/hg_bpe_cc12m.json` Not a permanent host, fyi. @lucidrains
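For what it's worth, a minimal sketch of loading that file, assuming the JSON is a HuggingFace `tokenizers` serialization (the filename and example text are just placeholders):

```python
from tokenizers import Tokenizer

# Hedged sketch: load the downloaded BPE file and round-trip some text.
# Assumes the JSON is in HuggingFace `tokenizers` format.
tokenizer = Tokenizer.from_file('hg_bpe_cc12m.json')
ids = tokenizer.encode('a photo of a dog').ids  # token ids for the caption
text = tokenizer.decode(ids)                    # back to a string
```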
Looking at `train_dalle.py` provides some insights from @janEbert's prior grokking of DeepSpeed. First mistake I'm making here is loading the checkpoint like this:

```python
dalle.load_state_dict(weights)
```

which is apparently...
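For contrast, a hedged sketch of restoring through DeepSpeed itself, assuming the checkpoint was written with `engine.save_checkpoint` (the directory here is a placeholder, and `dalle`/`args` are assumed to exist as in `train_dalle.py`):

```python
import deepspeed

# Hedged sketch: restore via the DeepSpeed engine rather than
# torch's load_state_dict.
engine, optimizer, _, _ = deepspeed.initialize(
    args=args,
    model=dalle,
    model_parameters=dalle.parameters(),
)
# load_checkpoint restores the module, optimizer, and scheduler state
# that engine.save_checkpoint previously wrote to this directory
load_path, client_state = engine.load_checkpoint('./checkpoints', tag=None)
```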
Okay - I did things the way they're meant to be done (I believe) @rom1504 @janEbert @mehdidc

```python
if args.fp16:
    engine = deepspeed.init_inference(dalle, dtype=torch.half)
else:
    engine = deepspeed.init_inference(dalle)
# training for...
```
As always, apologies to Jan, who I'm sure has already explained this issue ;) I'll admit to some amount of laziness with regard to doing due diligence on all...
Thanks @richcmwang! I'll work on this later unless you wanna make the PR. @rom1504 The DeepSpeed docs do indeed claim faster inference with the inference engine. Not sure how though.
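From what I can tell, the claimed speedup comes from DeepSpeed swapping compatible layers for fused CUDA kernels at init time. A hedged sketch of those knobs (the values are assumptions, not anything this repo ships with):

```python
import torch
import deepspeed

# Hedged sketch: `dalle` is the trained model, defined elsewhere.
engine = deepspeed.init_inference(
    dalle,
    mp_size=1,                       # no model parallelism
    dtype=torch.half,                # fp16 weights/activations
    replace_with_kernel_inject=True, # swap in DeepSpeed's fused kernels
)
```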
This does of course mean you won't be able to upload DeepSpeed checkpoints to W&B - which I guess is a bug in its own right. I personally would want...
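If a single-file artifact is all W&B needs, one possible workaround is consolidating the sharded checkpoint first; the helper below is from DeepSpeed's `zero_to_fp32` utilities and is an assumption here (it only applies if the run actually used ZeRO):

```python
import torch
import wandb
from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

# Hedged sketch: collapse the sharded ZeRO checkpoint directory into one
# fp32 state dict, save it as a single file, and upload that instead of
# the raw checkpoint directory. Paths are placeholders.
state_dict = get_fp32_state_dict_from_zero_checkpoint('./checkpoints')
torch.save(state_dict, 'dalle_fp32.pt')
wandb.save('dalle_fp32.pt')
```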
The VQGAN simply won't work in 16-bit precision, unfortunately. Converting only the torch modules of dalle which aren't the VQGAN, and then forcing autocasting to fp32 for the VQGAN, mitigates this...
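A minimal sketch of that mitigation, assuming the VQGAN hangs off the model as `dalle.vae` (the wrapper name and attribute are illustrative):

```python
import torch

# Hedged sketch: keep the VQGAN in fp32 while the rest of the model
# runs in half precision.
class FP32VQGAN(torch.nn.Module):
    def __init__(self, vae):
        super().__init__()
        self.vae = vae.float()  # cast VQGAN weights back to full precision

    def forward(self, *args, **kwargs):
        # disable autocast so every op inside the VQGAN stays in fp32
        with torch.cuda.amp.autocast(enabled=False):
            args = [a.float() if torch.is_tensor(a) else a for a in args]
            return self.vae(*args, **kwargs)

dalle.vae = FP32VQGAN(dalle.vae)  # attribute name `vae` is an assumption
```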
> Wow amazing! Is that really enough to make it work? I've been missing that feature a lot while using deepspeed

Please test it! But I think so, yes.