
Failed to install Fused_adam op on CPU

daehuikim opened this issue 1 year ago • 6 comments

Hello, I am struggling to install the pre-built fused_adam op for DeepSpeed. I have found nothing that solves my problem. Here is the situation.

DS_BUILD_FUSED_ADAM=1 pip install deepspeed
ds_report

still produces the same result (see the ds_report output below).

How can I successfully install and utilize fused_adam?

[2024-05-21 11:34:41,285] [WARNING] [real_accelerator.py:162:get_accelerator] Setting accelerator to CPU. If you have GPU or other accelerator, we were unable to detect it.
[2024-05-21 11:34:41,287] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cpu (auto detect)
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
deepspeed_not_implemented  [NO] ....... [OKAY]
deepspeed_ccl_comm ..... [NO] ....... [OKAY]
deepspeed_shm_comm ..... [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['TORCH_INSTALL_PATH']
torch version .................... 2.1.2+cu121
deepspeed install path ........... ['DEEPSPEED_INSTALL_PATH']
deepspeed info ................... 0.14.2+cu118torch2.0, unknown, unknown
deepspeed wheel compiled w. ...... torch 2.0
shared memory (/dev/shm) size .... 125.67 GB

Thanks for reading my question!

+) It is not only fused_adam; every op pre-build fails:

DS_BUILD_OPS=1 pip install deepspeed

daehuikim · May 21 '24 02:05

Hi @daehuikim - are you able to run pip install deepspeed with no errors? And do you hit any errors when installing other ops?

It appears that your system is being detected as CPU-only, but you have installed torch+cuda. Can you tell us more about what accelerator you are trying to use?

[2024-05-21 11:34:41,285] [WARNING] [real_accelerator.py:162:get_accelerator] Setting accelerator to CPU. If you have GPU or other accelerator, we were unable to detect it.
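
One quick way to see why auto-detection falls back to CPU on that node is a torch-level sanity check (just a rough proxy, not DeepSpeed's exact detection logic):

# Prints "False 0" on a CPU-only node, which matches the warning above
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"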

loadams · May 21 '24 20:05

Hello @loadams, thanks for your reply. pip install deepspeed works without any errors for me. I am running my script on my master node, which is CPU-only, using the Slurm scheduler. Specifically, I activate a conda virtual environment that has the packages and propagate jobs to worker nodes that have multiple GPUs via Slurm. Therefore, I am trying to install DeepSpeed with the ops pre-built in my conda virtual environment.

daehuikim · May 22 '24 00:05

I see. Is there a reason you need to precompile the ops? You should be able to run DeepSpeed on the GPU nodes, where it will detect the GPU and JIT-compile the ops (information here).
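
For example, on a GPU worker node a plain install should be enough; the sketch below (assuming the CUDA toolchain and ninja are available there) triggers the JIT build of fused_adam on first use. The one-liner is only illustrative:

# On a GPU node: no DS_BUILD_* flags needed, ops compile on demand
pip install deepspeed
# Instantiating FusedAdam JIT-builds the op if it was not pre-compiled
python -c "import torch; from deepspeed.ops.adam import FusedAdam; FusedAdam([torch.nn.Parameter(torch.zeros(1, device='cuda'))])"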

loadams · May 22 '24 15:05

@loadams There is no particular reason. I was just following this tutorial about fine-tuning a T5 model. For now I found another way: adding torch_adam: true to the optimizer settings in the DeepSpeed config. I just wanted to let the contributors know that this (pre-build installation failing in some environments) happens. Thanks for replying!
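
For reference, a minimal sketch of that config change (the file name, optimizer type, and learning rate are placeholders); with torch_adam set to true, DeepSpeed falls back to torch.optim.Adam, so the fused_adam extension never has to be built:

# Hypothetical ds_config.json fragment: torch_adam avoids the fused_adam C++/CUDA build
cat > ds_config.json << 'EOF'
{
  "optimizer": {
    "type": "Adam",
    "params": {
      "lr": 1e-4,
      "torch_adam": true
    }
  }
}
EOF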

daehuikim · May 23 '24 04:05

Thanks @daehuikim, that makes sense. Since DeepSpeed currently detects your master node as a CPU environment, it assumes it can only build the ops supported there. Can you try the following? (This may not work since you don't have CUDA installed on that node, but if you do, you can specify the DeepSpeed accelerator to build for by adding the DS_ACCELERATOR=cuda env var before your pip install command.)

loadams · May 23 '24 15:05

pip uninstall deepspeed
DS_ACCELERATOR=cuda pip install deepspeed
ds_report

produces the result below:

[2024-05-24 09:17:18,762] [WARNING] [real_accelerator.py:162:get_accelerator] Setting accelerator to CPU. If you have GPU or other accelerator, we were unable to detect it.
[2024-05-24 09:17:18,763] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cpu (auto detect)
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
deepspeed_not_implemented  [NO] ....... [OKAY]
deepspeed_ccl_comm ..... [NO] ....... [OKAY]
deepspeed_shm_comm ..... [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['TORCH_INSTALL_PATH']
torch version .................... 2.1.2+cu121
deepspeed install path ........... ['DEEPSPEED_INSTALL_PATH']
deepspeed info ................... 0.14.2+cu118torch2.0, unknown, unknown
deepspeed wheel compiled w. ...... torch 2.0
shared memory (/dev/shm) size .... 125.67 GB

@loadams I tried the recommended variable and got the same result.

daehuikim · May 24 '24 00:05

Hi @daehuikim, I use the following commands and can see the CUDA op status. Note that I don't have the CUDA toolchain installed. If your environment has the CUDA toolchain, you should be able to see the desired result on your master node.

DS_ACCELERATOR=cuda DS_BUILD_FUSED_ADAM=1 pip install deepspeed
DS_ACCELERATOR=cuda ds_report

You may want to set this in your .bashrc if you wish to build for CUDA by default on the master node. You don't need this env var on the compute nodes, but it will work there as well.
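
For example, a minimal sketch of the .bashrc addition:

# ~/.bashrc on the CPU-only master node: target CUDA ops by default
export DS_ACCELERATOR=cuda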

(dscpu) 22:07:19|~/machine_learning/DeepSpeed$ DS_ACCELERATOR=cuda ds_report
[2024-05-27 22:07:36,330] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (override)
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
 [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1
 [WARNING]  using untested triton version (2.1.0), only 1.0.0 is known to be compatible
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
fused_adam ............. [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_lion ............... [NO] ....... [OKAY]
 [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
evoformer_attn ......... [NO] ....... [NO]
fp_quantizer ........... [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
fused_lion ............. [NO] ....... [OKAY]
inference_core_ops ..... [NO] ....... [OKAY]
cutlass_ops ............ [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
ragged_device_ops ...... [NO] ....... [OKAY]
ragged_ops ............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1
 [WARNING]  using untested triton version (2.1.0), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/akey/anaconda3/envs/dscpu/lib/python3.10/site-packages/torch']
torch version .................... 2.1.0+cu121
deepspeed install path ........... ['/home/akey/anaconda3/envs/dscpu/lib/python3.10/site-packages/deepspeed']
deepspeed info ................... 0.14.2, unknown, unknown
torch cuda version ............... 12.1
torch hip version ................ None
nvcc version .....................  [FAIL] cannot find CUDA_HOME via torch.utils.cpp_extension.CUDA_HOME=None 
deepspeed wheel compiled w. ...... torch 2.1, cuda 12.1
shared memory (/dev/shm) size .... 31.18 GB

delock · May 27 '24 14:05

DS_ACCELERATOR=cuda ds_report

@delock Your recommendation made everything work perfectly! Thanks for the nice advice. I got the same results as you. Thanks again :)

daehuikim · May 27 '24 14:05

Thanks for clarifying the env var use, @delock!

loadams · May 28 '24 16:05