Clay Mullis comments

Results: 176 comments by Clay Mullis

Google Colab notebook needs updates

addressed here https://github.com/lucidrains/DALLE-pytorch/pull/320

(colab notebook) Train DALLE-pytorch on C@H

> And btw I'm valteralfred. @afiaka87 Hey! I've seen this bug before I think. You need to delete a folder containing the precompiled pytorch extensions. I want to say it's...

(colab notebook) Train DALLE-pytorch on C@H

Anyone coming here from the notebook - I'm not really on the Discord as often as I should be. File issues with the notebook here if you can or I'm...

(colab notebook) Train DALLE-pytorch on C@H

I recently updated the colab notebook to train with the Crawling@Home dataset. Hopefully that fixes some of these issues.

(colab notebook) Train DALLE-pytorch on C@H

> @afiaka87 Hi, thanks for sharing. I am using the afiaka dalle generation colab: https://colab.research.google.com/drive/11V2xw1eLPfZvzW8UQyTUhqCEU71w6Pr4?usp=sharing#scrollTo=682c5804-5f97-469f-8cf1-1cc8356591b8. Got several bugs I don't know how to fix: > File "/usr/local/lib/python3.7/dist-packages/deepspeed/ops/sparse_attention/sparse_self_attention.py", line 127, in...

CUDA out of memory in the middle of an epoch (deepspeed --amp)

@robvanvolt I suppose I've had my fair share of unexplained OOMs. Is there anything else specific about the run that you can think of? Is it happening on a consistent...

CUDA out of memory in the middle of an epoch (deepspeed --amp)

BTW, do you have a nice curated output from your most recent checkpoint to use for the README.md in the new PR I'm working on? I just checked that checkpoint...

Bump the docker version for sparse attention?

https://github.com/lucidrains/DALLE-pytorch/pull/320

Bump the docker version for sparse attention?

Actually works on 1.8.1 LTS, CUDA 10.2, cuDNN 7.

sparse attn setting: ('full', 'sparse') required?

@richcmwang tl;dr Yes! I'm currently running this with a 24-layer DALLE on a 1024-dimension transformer. So the `sparse` flag actually uses a specific type of sparse attention from...
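
For anyone unsure what a setting like `('full', 'sparse')` means across a deep stack, here is a minimal sketch. It assumes the attention types are cycled layer by layer, which is how the DALLE-pytorch README describes `attn_types`. `assign_attn_types` is a hypothetical helper written for illustration, not part of the library, and the depth of 24 just mirrors the run mentioned above.

```python
from collections import Counter
from itertools import cycle, islice


def assign_attn_types(depth, attn_types=("full", "sparse")):
    """Hypothetical helper: repeat attn_types in order until every layer has one.

    Assumption: layers take their attention type by cycling through the tuple.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    if not attn_types:
        raise ValueError("attn_types must not be empty")
    return list(islice(cycle(attn_types), depth))


# Illustrative values only: 24 layers, alternating full and sparse attention.
layer_types = assign_attn_types(24)
counts = Counter(layer_types)

for index, attn_type in enumerate(layer_types[:4]):
    print(f"layer {index}: {attn_type}")
print(dict(counts))
```

Under that assumption, half of the 24 layers would use full attention and half sparse, so passing both entries in the tuple matters if you want a mix rather than a single type everywhere.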