Share my installation of DeepSpeed

Open PKULiuHui opened this issue 3 years ago • 3 comments

PKULiuHui, May 17 '21 13:05

I couldn't install DeepSpeed using the instructions provided by the author, so it took me a long time to get DeepSpeed and Triton installed and to train DALL-E with `--attn_type full,sparse`. I'm sharing what worked for me so that anyone facing the same problem can save time.

1. Install triton 0.4.0:
   `pip install triton==0.4.0`
   By default, DeepSpeed requires triton 0.2.3, which I couldn't install on my server, so I installed triton 0.4.0 instead.

2. Download the source code of the [DeepSpeed branch that supports the latest Triton](https://github.com/microsoft/DeepSpeed/tree/sparse-attn/support-latest-triton) and change into that directory.

3. Edit the requirements file:
   `vi requirements/requirements-sparse_attn.txt`
   Change its content to `triton==0.4.0`.

4. Install DeepSpeed with sparse attention:
   `DS_BUILD_SPARSE_ATTN=1 pip install .`

5. Check the installation with `ds_report`; it should show:
   ![screen_cut](https://user-images.githubusercontent.com/32560313/118494657-fcb83a80-b754-11eb-9f45-fdaa8ee1851c.PNG)
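
Step 3 can also be done without an editor. The following is only a minimal sketch, assuming it is run against a DeepSpeed checkout that contains `requirements/requirements-sparse_attn.txt`; the helper name `pin_sparse_attn_triton` is hypothetical and not part of DeepSpeed.

```python
from pathlib import Path


def pin_sparse_attn_triton(repo_dir, pin="triton==0.4.0"):
    """Hypothetical helper: overwrite the sparse-attention requirements file
    of a DeepSpeed checkout with a single Triton pin.

    Returns the previous file content so it can be restored if needed.
    """
    req_file = Path(repo_dir) / "requirements" / "requirements-sparse_attn.txt"
    if not req_file.is_file():
        raise FileNotFoundError(
            f"{req_file} not found; is {repo_dir} a DeepSpeed checkout?"
        )
    previous = req_file.read_text()
    # Same effect as replacing the file content by hand in step 3.
    req_file.write_text(pin + "\n")
    return previous
```

Calling `pin_sparse_attn_triton(".")` from inside the DeepSpeed directory would replace the manual edit; step 4 (`DS_BUILD_SPARSE_ATTN=1 pip install .`) is then run as before.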

Note: you may need to install llvm-9 with `sudo apt-get -y install llvm-9-dev cmake`. My server runs Ubuntu 16.04 LTS, where installing llvm-9 is troublesome, so I just used llvm and updated gcc to 6.5.0. That worked as well.
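
Before building, it may help to see which of these tools are actually on the `PATH`. This is a rough sketch, not a definitive check: the binary names are assumptions (`llvm-config-9` is what the llvm-9 packages usually install on Ubuntu, and a plain llvm install typically provides `llvm-config`), and the function `find_tools` is hypothetical.

```python
import shutil

# Assumed binary names; adjust them to match your distribution.
TOOLS = ("gcc", "cmake", "llvm-config-9", "llvm-config")


def find_tools(names=TOOLS):
    """Map each tool name to its full path on PATH, or None if it is missing."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name:15} {path or 'not found'}")
```

A missing entry only means the binary was not found under that name; it does not by itself mean the build will fail.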

Finally, I can train DALL-E with sparse attention. I hope this helps.

Thank you so much for figuring this out! We have so many issues with DeepSpeed. It's worth mentioning to anyone else who may find these instructions useful: this will (unfortunately) break the DeepSpeed ZeRO configuration for using CPU-based Adam, etc. It shouldn't really be a problem on single-GPU setups, though.

afiaka87, May 17 '21 17:05

Unfortunately, the installation does not work with the latest Nvidia GPUs (30XX), and triton==1.0.0.dev20210329 was permanently deleted (https://github.com/ptillet/triton/issues/99)...

Edit: commit `a598fba0` (HEAD, "[DOCS] Various improvements and typo fixes") seems to work (triton branch 1.0.0).

https://github.com/lucidrains/DALLE-pytorch/wiki/Deepspeed---Installation#for-the-latest-nvidia-gpus-3090-3080-3070-3060-rtx-try-the-following

robvanvolt, May 22 '21 13:05

The Triton wheel has been updated. I think `pip install triton==0.4.1` should work now.
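
To confirm which versions actually ended up in the environment after installing, something like the following could be used. It is a small sketch that relies only on the standard library (`importlib.metadata`, Python 3.8+); the function name `installed_version` is illustrative.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("triton", "deepspeed"):
        print(name, installed_version(name) or "not installed")
```

This only reports the package versions; `ds_report` is still the way to see whether the sparse attention op itself was built.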

ptillet, May 26 '21 00:05