DALLE-pytorch
Share my installation of DeepSpeed
I couldn't install DeepSpeed using the instructions provided by the author. It took me a long time to get DeepSpeed and Triton installed and to train DALL-E with `--attn_type full,sparse`. I'm sharing what worked below so that anyone facing the same problem can save time.

1. Install triton 0.4.0: `pip install triton==0.4.0`. By default, DeepSpeed requires triton 0.2.3, which I couldn't install on my server, so I installed triton 0.4.0 instead.
2. Download the source code of the [DeepSpeed branch that supports the latest Triton](https://github.com/microsoft/DeepSpeed/tree/sparse-attn/support-latest-triton) and change into that directory.
3. Edit the requirements file with `vi requirements/requirements-sparse_attn.txt` and change its content to `triton==0.4.0`.
4. Install DeepSpeed with sparse attention: `DS_BUILD_SPARSE_ATTN=1 pip install .`
5. Check the installation with `ds_report`; it will show the build status (screenshot "screen_cut" not reproduced here).

Note: you may need to install llvm-9 using `sudo apt-get -y install llvm-9-dev cmake`. My server runs Ubuntu 16.04 LTS, where installing llvm-9 is troublesome, so I just used llvm and updated gcc to 6.5.0. That worked as well.

Finally, I can train DALL-E with sparse attention. Hope this helps.
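The requirements edit in step 3 can also be scripted instead of done by hand in `vi`. The following is a minimal sketch, assuming the file holds one requirement per line; `pin_requirement` and `pin_file` are hypothetical helper names and not part of DeepSpeed.

```python
import re
from pathlib import Path


def pin_requirement(text: str, package: str, version: str) -> str:
    """Return requirements text with `package` pinned to exactly `version`.

    An existing line for `package` is replaced by the pin; if there is none,
    the pin is appended. All other lines are kept unchanged.
    """
    pin = f"{package}=={version}"
    # The package name, optionally followed by a version specifier.
    pattern = re.compile(rf"^\s*{re.escape(package)}\s*([=<>!~].*)?$")
    out = []
    replaced = False
    for line in text.splitlines():
        if pattern.match(line):
            if not replaced:
                out.append(pin)
                replaced = True
            # Any further lines for the same package are dropped.
        else:
            out.append(line)
    if not replaced:
        out.append(pin)
    return "\n".join(out) + "\n"


def pin_file(path: str, package: str = "triton", version: str = "0.4.0") -> None:
    """Rewrite the requirements file at `path` in place."""
    req = Path(path)
    req.write_text(pin_requirement(req.read_text(), package, version))
```

Calling `pin_file("requirements/requirements-sparse_attn.txt")` from the root of the DeepSpeed checkout would then stand in for the manual edit, before running the install command from step 4.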
Thank you so much for figuring this out! We have so many issues with DeepSpeed. It's worth mentioning for anyone else who may find these instructions useful: this will (unfortunately) break the DeepSpeed ZeRO configuration for using CPU-based Adam, etc. It shouldn't really be a problem on single-GPU setups, though.
Unfortunately, the installation does not work with the latest Nvidia GPUs (30XX), and triton==1.0.0.dev20210329 has been permanently deleted (https://github.com/ptillet/triton/issues/99)...
Edit: commit a598fba0 (HEAD), "[DOCS] Various improvements and typo fixes", seems to work (triton branch 1.0.0).
https://github.com/lucidrains/DALLE-pytorch/wiki/Deepspeed---Installation#for-the-latest-nvidia-gpus-3090-3080-3070-3060-rtx-try-the-following
The Triton wheel has been updated. I think `pip install triton==0.4.1` should work now.
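Since several Triton versions come up in this thread, it may help to confirm which one pip actually installed before building DeepSpeed. A small sketch using only the standard library (`installed_version` is a hypothetical helper name; it needs Python 3.8 or newer):

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Report the packages discussed in this thread.
    for name in ("triton", "deepspeed"):
        print(name, installed_version(name) or "not installed")
```

This only reports what is installed in the current environment; `ds_report` remains the check for whether the sparse attention op was built.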