DeepSpeed-MII

"error: cuda_runtime_api.h: No such file or directory"

Open auwsom opened this issue 1 year ago • 11 comments

Hello, I'm trying to run the basic example. I have several LLMs working and have used the Hugging Face Hub to download them, for reference. However, I get the error in the title. Indeed, this file is not found in /home/user/.local/lib/python3.10/site-packages/torch/include/c10/. I did find it here: /usr/local/cuda-11.7/targets/x86_64-linux/include/cuda_runtime_api.h

I had a challenging time getting my NVIDIA driver to work with the right CUDA version during the torch install. The current PyTorch version is 1.12.1+cu116. You can see version 11.7 in the path above. I'm not sure how relevant that is, but this is the only combination of CUDA and torch versions I could get working. I think c10 denotes the default version of torch installed with Python 3.10 on Ubuntu 22.04, which is supported by this quote from SE:

"PyTorch doesn't use the system's CUDA library. When you install PyTorch using the precompiled binaries using either pip or conda it is shipped with a copy of the specified version of the CUDA library which is installed locally."

The output does say: "Installed CUDA version 11.7 does not match the version torch was compiled with 11.6 but since the APIs are compatible, accepting this combination", followed by "Using /home/user/.cache/torch_extensions/py310_cu116 as PyTorch extensions root..."
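To make the version comparison in that message concrete, here is a minimal sketch of such a check. It assumes that "compatible" means "same CUDA major version" and that the last digit of a `cuXYZ` tag is the minor version; neither assumption is stated in the log, the helper names are hypothetical, and DeepSpeed's actual check may differ.

```python
import re


def cuda_tag_to_version(torch_version):
    """Extract (major, minor) from a version string such as '1.12.1+cu116'.

    Assumption: the last digit of the 'cuXYZ' tag is the minor version,
    so 'cu116' means CUDA 11.6. Returns None if no tag is present.
    """
    match = re.search(r"\+cu(\d+)$", torch_version)
    if match is None:
        return None
    digits = match.group(1)
    return int(digits[:-1]), int(digits[-1])


def same_cuda_major(installed_cuda, torch_version):
    """Return True if the installed toolkit and the torch build share a major version.

    Assumption: matching major versions is what 'APIs are compatible' refers to.
    """
    tag = cuda_tag_to_version(torch_version)
    if tag is None:
        return False
    installed_major = int(installed_cuda.split(".")[0])
    return installed_major == tag[0]


if __name__ == "__main__":
    # Values taken from this issue: toolkit 11.7, torch 1.12.1+cu116.
    print(same_cuda_major("11.7", "1.12.1+cu116"))
```

Under these assumptions, the 11.7 / 11.6 pair from this report counts as matching, which is in line with the message being a notice rather than a failure.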

Do I need to set some environment variables and/or install another version of PyTorch in a virtualenv? I'm a little short on space, so I'm hoping not. It seems there is some conflict between the default PyTorch c10 locations and the discovered 11.6/11.7 version of CUDA.
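One way to narrow down the environment-variable question is to check which candidate CUDA roots actually contain the missing header. The sketch below uses only the standard library; the list of roots is an assumption based on the paths mentioned in this post, and `find_header` is a hypothetical helper, not part of PyTorch or DeepSpeed. `CUDA_HOME` is included because PyTorch's extension builder is known to consult it when locating the toolkit.

```python
import glob
import os


def find_header(cuda_roots, header="cuda_runtime_api.h"):
    """Return the paths under the given roots where `header` exists.

    Looks in <root>/include and <root>/targets/*/include, the two layouts
    seen in this issue. Paths resolving to the same file are reported once.
    """
    hits = []
    seen = set()
    for root in cuda_roots:
        candidates = [os.path.join(root, "include", header)]
        candidates += glob.glob(
            os.path.join(root, "targets", "*", "include", header)
        )
        for path in candidates:
            real = os.path.realpath(path)
            if os.path.isfile(path) and real not in seen:
                seen.add(real)
                hits.append(path)
    return hits


if __name__ == "__main__":
    # Assumed candidate roots; adjust for the machine being checked.
    roots = [
        os.environ.get("CUDA_HOME", ""),
        "/usr/local/cuda",
        "/usr/local/cuda-11.7",
        "/usr",
    ]
    for found in find_header([r for r in roots if r]):
        print(found)
```

If the header turns up only under a root that `CUDA_HOME` does not point to, setting `CUDA_HOME` to that root before launching is a reasonable thing to try first, though whether it resolves this particular build failure is not confirmed here.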

Quick side note: the models were downloaded to /tmp/mii_models. Is it possible to use the standard Hugging Face model locations?

auwsom avatar Oct 24 '22 22:10 auwsom

Hi @auwsom, sorry to see that you're having trouble running MII. The output about mismatched CUDA versions shouldn't be an issue. It's only a warning, and I'm currently running with the same setup (CUDA 11.7 on the machine, PyTorch compiled with 11.6). Could you please verify that your torch setup is working with CUDA?

import torch
print(torch.__version__)
print(torch.cuda.is_available())

Also, if you can include the entire output so we can see when this error happens, that would help us debug the issue.

As for the model download location, you can specify where you want to download the models with the model_path parameter in mii.deploy(). See the source code here. If you want to use the HF cache (assuming you have the latest version of transformers): model_path="~/.cache/huggingface/hub"
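A minimal sketch of how that could look, assuming the `mii.deploy()` keyword arguments that appear elsewhere in this thread (`task`, `model`, `deployment_name`, `model_path`). The helper `hf_hub_cache_dir` is hypothetical; it expands `~` explicitly, since it is not confirmed here that MII does so, and it honours the `HUGGINGFACE_HUB_CACHE` and `HF_HOME` environment variables if they are set.

```python
import os


def hf_hub_cache_dir(env=None):
    """Best-effort guess at the Hugging Face hub cache directory.

    Order of precedence (an assumption, simplified from how the hub
    library resolves its cache): HUGGINGFACE_HUB_CACHE, then
    HF_HOME/hub, then ~/.cache/huggingface/hub.
    """
    env = os.environ if env is None else env
    if env.get("HUGGINGFACE_HUB_CACHE"):
        return os.path.expanduser(env["HUGGINGFACE_HUB_CACHE"])
    hf_home = env.get("HF_HOME") or os.path.join("~", ".cache", "huggingface")
    return os.path.join(os.path.expanduser(hf_home), "hub")


def deploy_with_hf_cache():
    """Deploy the model from this issue using the HF cache as model_path."""
    import mii  # imported lazily so the helper above works without MII installed

    mii.deploy(
        task="text-generation",
        model="bigscience/bloom-560m",
        deployment_name="bloom560m_deployment",
        model_path=hf_hub_cache_dir(),
    )
```

Calling `deploy_with_hf_cache()` requires a working MII installation; `hf_hub_cache_dir()` on its own can be used to print the path that would be passed.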

mrwyattii avatar Oct 25 '22 22:10 mrwyattii

Hi @mrwyattii, thanks for the reply. Thanks for the tip on the Hub usage.

Here is the python output.

>>> print(torch.__version__)
1.12.1+cu116
>>> print(torch.cuda.is_available())
True

Here is the entire error message. The output is from Jupyter. I used double spacing to separate the stdout from the stderr; there are two sections of each, alternating.

Error - click this ``` [2022-10-24 15:07:31,460] [INFO] [deployment.py:85:deploy] ************* MII is using DeepSpeed Optimizations to accelerate your model ************* [2022-10-24 15:07:31,481] [INFO] [server_client.py:217:_initialize_service] MII using multi-gpu deepspeed launcher: ------------------------------------------------------------ task-name .................... text-generation model ........................ bigscience/bloom-560m model-path ................... /tmp/mii_models port ......................... 50050 provider ..................... hugging-face ------------------------------------------------------------ [2022-10-24 15:07:34,203] [WARNING] [runner.py:179:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only. [2022-10-24 15:07:34,221] [INFO] [runner.py:507:main] cmd = /usr/bin/python3 -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr=127.0.0.1 --master_port=29500 --no_python --no_local_rank /usr/bin/python3 -m mii.launch.multi_gpu_server --task-name text-generation --model bigscience/bloom-560m --model-path /tmp/mii_models --port 50050 --ds-optimize --provider hugging-face --config eyJ0ZW5zb3JfcGFyYWxsZWwiOiAxLCAicG9ydF9udW1iZXIiOiA1MDA1MCwgImR0eXBlIjogImZwMTYiLCAiZW5hYmxlX2N1ZGFfZ3JhcGgiOiBmYWxzZSwgImNoZWNrcG9pbnRfZGljdCI6IG51bGwsICJkZXBsb3lfcmFuayI6IFswXSwgInRvcmNoX2Rpc3RfcG9ydCI6IDI5NTAwLCAiaGZfYXV0aF90b2tlbiI6IG51bGwsICJyZXBsYWNlX3dpdGhfa2VybmVsX2luamVjdCI6IHRydWUsICJwcm9maWxlX21vZGVsX3RpbWUiOiBmYWxzZX0= [2022-10-24 15:07:36,491] [INFO] [server_client.py:115:_wait_until_server_is_live] waiting for server to start... 
[2022-10-24 15:07:36,592] [INFO] [launch.py:136:main] WORLD INFO DICT: {'localhost': [0]} [2022-10-24 15:07:36,592] [INFO] [launch.py:142:main] nnodes=1, num_local_procs=1, node_rank=0 [2022-10-24 15:07:36,592] [INFO] [launch.py:155:main] global_rank_mapping=defaultdict(, {'localhost': [0]}) [2022-10-24 15:07:36,593] [INFO] [launch.py:156:main] dist_world_size=1 [2022-10-24 15:07:36,593] [INFO] [launch.py:158:main] Setting CUDA_VISIBLE_DEVICES=0 [2022-10-24 15:07:41,498] [INFO] [server_client.py:115:_wait_until_server_is_live] waiting for server to start... [2022-10-24 15:07:46,504] [INFO] [server_client.py:115:_wait_until_server_is_live] waiting for server to start... [2022-10-24 15:07:51,508] [INFO] [server_client.py:115:_wait_until_server_is_live] waiting for server to start... [2022-10-24 15:07:56,515] [INFO] [server_client.py:115:_wait_until_server_is_live] waiting for server to start... [2022-10-24 15:08:01,521] [INFO] [server_client.py:115:_wait_until_server_is_live] waiting for server to start... [2022-10-24 15:08:06,529] [INFO] [server_client.py:115:_wait_until_server_is_live] waiting for server to start... > --------- MII Settings: ds_optimize=True, replace_with_kernel_inject=True, enable_cuda_graph=False [2022-10-24 15:08:10,989] [INFO] [logging.py:68:log_dist] [Rank -1] DeepSpeed info: version=0.7.4, git-hash=unknown, git-branch=unknown [2022-10-24 15:08:10,989] [INFO] [logging.py:68:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1 [2022-10-24 15:08:11,535] [INFO] [server_client.py:115:_wait_until_server_is_live] waiting for server to start... Installed CUDA version 11.7 does not match the version torch was compiled with 11.6 but since the APIs are compatible, accepting this combination Using /home/user/.cache/torch_extensions/py310_cu116 as PyTorch extensions root... Detected CUDA files, patching ldflags Emitting ninja build file /home/user/.cache/torch_extensions/py310_cu116/transformer_inference/build.ninja... 
Building extension module transformer_inference... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [1/9] c++ -MMD -MF pt_binding.o.d -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/user/.local/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/home/user/.local/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /home/user/.local/lib/python3.10/site-packages/torch/include -isystem /home/user/.local/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/user/.local/lib/python3.10/site-packages/torch/include/TH -isystem /home/user/.local/lib/python3.10/site-packages/torch/include/THC -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -g -Wno-reorder -c /home/user/.local/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp -o pt_binding.o FAILED: pt_binding.o c++ -MMD -MF pt_binding.o.d -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/user/.local/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/home/user/.local/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /home/user/.local/lib/python3.10/site-packages/torch/include -isystem /home/user/.local/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/user/.local/lib/python3.10/site-packages/torch/include/TH -isystem /home/user/.local/lib/python3.10/site-packages/torch/include/THC -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -std=c++14 -g -Wno-reorder -c 
/home/user/.local/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp -o pt_binding.o In file included from /home/user/.local/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/pt_binding.cpp:5: /home/user/.local/lib/python3.10/site-packages/torch/include/c10/cuda/CUDAStream.h:6:10: fatal error: cuda_runtime_api.h: No such file or directory 6 | #include | ^~~~~~~~~~~~~~~~~~~~ compilation terminated. [2/9] /usr/bin/nvcc -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/user/.local/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/home/user/.local/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /home/user/.local/lib/python3.10/site-packages/torch/include -isystem /home/user/.local/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/user/.local/lib/python3.10/site-packages/torch/include/TH -isystem /home/user/.local/lib/python3.10/site-packages/torch/include/THC -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /home/user/.local/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/transform.cu -o transform.cuda.o FAILED: transform.cuda.o /usr/bin/nvcc -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" 
-DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/user/.local/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/home/user/.local/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /home/user/.local/lib/python3.10/site-packages/torch/include -isystem /home/user/.local/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/user/.local/lib/python3.10/site-packages/torch/include/TH -isystem /home/user/.local/lib/python3.10/site-packages/torch/include/THC -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /home/user/.local/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/transform.cu -o transform.cuda.o : fatal error: cuda_runtime.h: No such file or directory compilation terminated. 
[3/9] /usr/bin/nvcc -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/user/.local/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/home/user/.local/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /home/user/.local/lib/python3.10/site-packages/torch/include -isystem /home/user/.local/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/user/.local/lib/python3.10/site-packages/torch/include/TH -isystem /home/user/.local/lib/python3.10/site-packages/torch/include/THC -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /home/user/.local/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/relu.cu -o relu.cuda.o FAILED: relu.cuda.o /usr/bin/nvcc -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/user/.local/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/home/user/.local/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /home/user/.local/lib/python3.10/site-packages/torch/include -isystem /home/user/.local/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/user/.local/lib/python3.10/site-packages/torch/include/TH -isystem 
/home/user/.local/lib/python3.10/site-packages/torch/include/THC -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /home/user/.local/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/relu.cu -o relu.cuda.o : fatal error: cuda_runtime.h: No such file or directory compilation terminated. [4/9] /usr/bin/nvcc -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/user/.local/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/home/user/.local/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /home/user/.local/lib/python3.10/site-packages/torch/include -isystem /home/user/.local/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/user/.local/lib/python3.10/site-packages/torch/include/TH -isystem /home/user/.local/lib/python3.10/site-packages/torch/include/THC -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c 
/home/user/.local/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/softmax.cu -o softmax.cuda.o FAILED: softmax.cuda.o /usr/bin/nvcc -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/user/.local/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/home/user/.local/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /home/user/.local/lib/python3.10/site-packages/torch/include -isystem /home/user/.local/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/user/.local/lib/python3.10/site-packages/torch/include/TH -isystem /home/user/.local/lib/python3.10/site-packages/torch/include/THC -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /home/user/.local/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/softmax.cu -o softmax.cuda.o : fatal error: cuda_runtime.h: No such file or directory compilation terminated. 
[5/9] /usr/bin/nvcc -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/user/.local/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/home/user/.local/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /home/user/.local/lib/python3.10/site-packages/torch/include -isystem /home/user/.local/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/user/.local/lib/python3.10/site-packages/torch/include/TH -isystem /home/user/.local/lib/python3.10/site-packages/torch/include/THC -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /home/user/.local/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.cu -o apply_rotary_pos_emb.cuda.o FAILED: apply_rotary_pos_emb.cuda.o /usr/bin/nvcc -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/user/.local/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/home/user/.local/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /home/user/.local/lib/python3.10/site-packages/torch/include -isystem /home/user/.local/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/user/.local/lib/python3.10/site-packages/torch/include/TH 
-isystem /home/user/.local/lib/python3.10/site-packages/torch/include/THC -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /home/user/.local/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/apply_rotary_pos_emb.cu -o apply_rotary_pos_emb.cuda.o : fatal error: cuda_runtime.h: No such file or directory compilation terminated. [6/9] /usr/bin/nvcc -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/user/.local/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/home/user/.local/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /home/user/.local/lib/python3.10/site-packages/torch/include -isystem /home/user/.local/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/user/.local/lib/python3.10/site-packages/torch/include/TH -isystem /home/user/.local/lib/python3.10/site-packages/torch/include/THC -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 
-gencode=arch=compute_86,code=compute_86 -c /home/user/.local/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/dequantize.cu -o dequantize.cuda.o FAILED: dequantize.cuda.o /usr/bin/nvcc -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/user/.local/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/home/user/.local/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /home/user/.local/lib/python3.10/site-packages/torch/include -isystem /home/user/.local/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/user/.local/lib/python3.10/site-packages/torch/include/TH -isystem /home/user/.local/lib/python3.10/site-packages/torch/include/THC -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /home/user/.local/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/dequantize.cu -o dequantize.cuda.o : fatal error: cuda_runtime.h: No such file or directory compilation terminated. 
[7/9] /usr/bin/nvcc -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/user/.local/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/home/user/.local/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /home/user/.local/lib/python3.10/site-packages/torch/include -isystem /home/user/.local/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/user/.local/lib/python3.10/site-packages/torch/include/TH -isystem /home/user/.local/lib/python3.10/site-packages/torch/include/THC -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /home/user/.local/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.cu -o gelu.cuda.o FAILED: gelu.cuda.o /usr/bin/nvcc -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/user/.local/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/home/user/.local/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /home/user/.local/lib/python3.10/site-packages/torch/include -isystem /home/user/.local/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/user/.local/lib/python3.10/site-packages/torch/include/TH -isystem 
/home/user/.local/lib/python3.10/site-packages/torch/include/THC -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /home/user/.local/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/gelu.cu -o gelu.cuda.o : fatal error: cuda_runtime.h: No such file or directory compilation terminated. [8/9] /usr/bin/nvcc -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/user/.local/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/home/user/.local/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /home/user/.local/lib/python3.10/site-packages/torch/include -isystem /home/user/.local/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/user/.local/lib/python3.10/site-packages/torch/include/TH -isystem /home/user/.local/lib/python3.10/site-packages/torch/include/THC -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c 
/home/user/.local/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/normalize.cu -o normalize.cuda.o FAILED: normalize.cuda.o /usr/bin/nvcc -DTORCH_EXTENSION_NAME=transformer_inference -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/user/.local/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/includes -I/home/user/.local/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -isystem /home/user/.local/lib/python3.10/site-packages/torch/include -isystem /home/user/.local/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/user/.local/lib/python3.10/site-packages/torch/include/TH -isystem /home/user/.local/lib/python3.10/site-packages/torch/include/THC -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /home/user/.local/lib/python3.10/site-packages/deepspeed/ops/csrc/transformer/inference/csrc/normalize.cu -o normalize.cuda.o : fatal error: cuda_runtime.h: No such file or directory compilation terminated. ninja: build stopped: subcommand failed.

Traceback (most recent call last):
  File "/home/user/.local/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1808, in _run_ninja_build
    subprocess.run(
  File "/usr/lib/python3.10/subprocess.py", line 524, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last): File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main return _run_code(code, main_globals, None, File "/usr/lib/python3.10/runpy.py", line 86, in _run_code exec(code, run_globals) File "/home/user/dev/DeepSpeed-MII/mii/launch/multi_gpu_server.py", line 70, in main() File "/home/user/dev/DeepSpeed-MII/mii/launch/multi_gpu_server.py", line 56, in main inference_pipeline = load_models(task_name=args.task_name, File "/home/user/dev/DeepSpeed-MII/mii/models/load_models.py", line 73, in load_models engine = deepspeed.init_inference(getattr(inference_pipeline, File "/home/user/.local/lib/python3.10/site-packages/deepspeed/init.py", line 305, in init_inference engine = InferenceEngine(model, File "/home/user/.local/lib/python3.10/site-packages/deepspeed/inference/engine.py", line 149, in init self._apply_injection_policy( File "/home/user/.local/lib/python3.10/site-packages/deepspeed/inference/engine.py", line 367, in _apply_injection_policy replace_transformer_layer( File "/home/user/.local/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 930, in replace_transformer_layer replaced_module = replace_module(model=model, File "/home/user/.local/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 1187, in replace_module replaced_module, _ = _replace_module(model, policy) File "/home/user/.local/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 1214, in _replace_module _, layer_id = _replace_module(child, policies, layer_id=layer_id) File "/home/user/.local/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 1214, in _replace_module _, layer_id = _replace_module(child, policies, layer_id=layer_id) File "/home/user/.local/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 1204, in _replace_module replaced_module = policies[child.class][0](child, File 
"/home/user/.local/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 920, in replace_fn new_module = replace_with_policy(child, File "/home/user/.local/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 505, in replace_with_policy new_module = transformer_inference.DeepSpeedTransformerInference( File "/home/user/.local/lib/python3.10/site-packages/deepspeed/ops/transformer/inference/transformer_inference.py", line 777, in init inference_cuda_module = builder.load() File "/home/user/.local/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 459, in load return self.jit_load(verbose) File "/home/user/.local/lib/python3.10/site-packages/deepspeed/ops/op_builder/builder.py", line 494, in jit_load op_module = load( File "/home/user/.local/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1202, in load return _jit_compile( File "/home/user/.local/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1425, in _jit_compile _write_ninja_file_and_build_library( File "/home/user/.local/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1537, in _write_ninja_file_and_build_library _run_ninja_build( File "/home/user/.local/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1824, in _run_ninja_build raise RuntimeError(message) from e RuntimeError: Error building extension 'transformer_inference'

[2022-10-24 15:08:15,639] [INFO] [launch.py:286:sigkill_handler] Killing subprocess 2649619
[2022-10-24 15:08:15,640] [ERROR] [launch.py:292:sigkill_handler] ['/usr/bin/python3', '-m', 'mii.launch.multi_gpu_server', '--task-name', 'text-generation', '--model', 'bigscience/bloom-560m', '--model-path', '/tmp/mii_models', '--port', '50050', '--ds-optimize', '--provider', 'hugging-face', '--config', 'eyJ0ZW5zb3JfcGFyYWxsZWwiOiAxLCAicG9ydF9udW1iZXIiOiA1MDA1MCwgImR0eXBlIjogImZwMTYiLCAiZW5hYmxlX2N1ZGFfZ3JhcGgiOiBmYWxzZSwgImNoZWNrcG9pbnRfZGljdCI6IG51bGwsICJkZXBsb3lfcmFuayI6IFswXSwgInRvcmNoX2Rpc3RfcG9ydCI6IDI5NTAwLCAiaGZfYXV0aF90b2tlbiI6IG51bGwsICJyZXBsYWNlX3dpdGhfa2VybmVsX2luamVjdCI6IHRydWUsICJwcm9maWxlX21vZGVsX3RpbWUiOiBmYWxzZX0='] exits with return code = 1
[2022-10-24 15:08:16,542] [INFO] [server_client.py:115:_wait_until_server_is_live] waiting for server to start...


RuntimeError                              Traceback (most recent call last)
Input In [8], in <cell line: 3>()
      1 import mii
      2 mii_configs = {"tensor_parallel": 1, "dtype": "fp16"}
----> 3 mii.deploy(task="text-generation",
      4            model="bigscience/bloom-560m",
      5            deployment_name="bloom560m_deployment",
      6            mii_config=mii_configs)

File ~/dev/DeepSpeed-MII/mii/deployment.py:112, in deploy(task, model, deployment_name, deployment_type, model_path, enable_deepspeed, enable_zero, ds_config, mii_config, version)
    110     _deploy_aml(deployment_name=deployment_name, model_name=model, version=version)
    111 elif deployment_type == DeploymentType.LOCAL:
--> 112     return _deploy_local(deployment_name, model_path=model_path)
    113 else:
    114     raise Exception(f"Unknown deployment type: {deployment_type}")

File ~/dev/DeepSpeed-MII/mii/deployment.py:118, in _deploy_local(deployment_name, model_path)
    117 def _deploy_local(deployment_name, model_path):
--> 118     mii.utils.import_score_file(deployment_name).init()

File /tmp/mii_cache/bloom560m_deployment/score.py:29, in init()
     26 assert task is not None, "The task name should be set before calling init"
     28 global model
---> 29 model = mii.MIIServerClient(task,
     30                             model_name,
     31                             model_path,
     32                             ds_optimize=configs[mii.constants.ENABLE_DEEPSPEED_KEY],
     33                             ds_zero=configs[mii.constants.ENABLE_DEEPSPEED_ZERO_KEY],
     34                             ds_config=configs[mii.constants.DEEPSPEED_CONFIG_KEY],
     35                             mii_configs=configs[mii.constants.MII_CONFIGS_KEY],
     36                             use_grpc_server=use_grpc_server,
     37                             initialize_grpc_client=initialize_grpc_client)

File ~/dev/DeepSpeed-MII/mii/server_client.py:90, in MIIServerClient.__init__(self, task_name, model_name, model_path, ds_optimize, ds_zero, ds_config, mii_configs, initialize_service, initialize_grpc_client, use_grpc_server)
     83 self.process = self._initialize_service(model_name,
     84                                         model_path,
     85                                         ds_optimize,
     86                                         ds_zero,
     87                                         ds_config,
     88                                         mii_configs)
     89 if self.use_grpc_server:
---> 90     self._wait_until_server_is_live()
     92 if self.initialize_grpc_client and self.use_grpc_server:
     93     self.stubs = []

File ~/dev/DeepSpeed-MII/mii/server_client.py:113, in MIIServerClient._wait_until_server_is_live(self)
    111 process_alive = self._is_server_process_alive()
    112 if not process_alive:
--> 113     raise RuntimeError("server crashed for some reason, unable to proceed")
    114 time.sleep(4)
    115 logger.info("waiting for server to start...")

RuntimeError: server crashed for some reason, unable to proceed

</details>

auwsom avatar Oct 25 '22 23:10 auwsom

@auwsom The error is originating from DeepSpeed when trying to JIT compile the inference kernels. Can you share the output of ds_report?

Also, let's try pre-compiling the kernels. You can do that by running: DS_BUILD_TRANSFORMER_INFERENCE=1 pip install deepspeed
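Since the build fails on a missing cuda_runtime_api.h, it may also help to confirm where that header lives before rebuilding. The following is a minimal sketch, not part of DeepSpeed or MII: the helper names `find_cuda_header` and `candidate_roots` are hypothetical, and the two include layouts checked are simply the ones mentioned in this thread. It assumes the toolkit root is given by the `CUDA_HOME` or `CUDA_PATH` environment variable, falling back to /usr/local/cuda.

```python
import os
from pathlib import Path

HEADER = "cuda_runtime_api.h"

# Include layouts to probe under each toolkit root (assumption: these two
# cover the paths reported in this thread; other installs may differ).
INCLUDE_DIRS = ("include", "targets/x86_64-linux/include")


def candidate_roots(env=None):
    """Collect possible CUDA toolkit roots from the environment (hypothetical helper)."""
    env = os.environ if env is None else env
    roots = [env[var] for var in ("CUDA_HOME", "CUDA_PATH") if env.get(var)]
    roots.append("/usr/local/cuda")  # conventional default location
    return roots


def find_cuda_header(roots):
    """Return the first existing path to cuda_runtime_api.h, or None if absent."""
    for root in roots:
        for rel in INCLUDE_DIRS:
            candidate = Path(root) / rel / HEADER
            if candidate.is_file():
                return candidate
    return None


if __name__ == "__main__":
    roots = candidate_roots()
    found = find_cuda_header(roots)
    print(found if found else f"{HEADER} not found under {roots}")
```

If the header only turns up under a versioned directory such as /usr/local/cuda-11.7, exporting `CUDA_HOME` to that directory before running pip is one thing to try.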

mrwyattii avatar Oct 25 '22 23:10 mrwyattii

@mrwyattii

find -iname report didn't find ds_report.py. Is it somewhere not in the repo?

Rerunning pip with that pre-compile variable gives the same error: FAILED: relu.cuda.o

(Also, it may be handy to put the model_path huggingface hub line in the example, even if commented out. And also the import mii so people could just run the example.)

auwsom avatar Oct 26 '22 01:10 auwsom

The only mention of ds_report is in this file: https://github.com/microsoft/DeepSpeed-MII/blob/16ea809285d60e6e28ca8ab773cd63b2183565fd/.github/workflows/nv-torch-latest-v100.yaml

I see it's in: https://github.com/microsoft/DeepSpeed/tree/master/bin

auwsom avatar Oct 26 '22 01:10 auwsom

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
 [WARNING]  please install triton==1.0.0 if you want to use sparse attention
sparse_attn ............ [NO] ....... [NO]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
utils .................. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/user/.local/lib/python3.10/site-packages/torch']
torch version .................... 1.12.1+cu116
torch cuda version ............... 11.6
torch hip version ................ None
nvcc version ..................... 11.7
deepspeed install path ........... ['/home/user/.local/lib/python3.10/site-packages/deepspeed']
deepspeed info ................... 0.7.4, unknown, unknown
deepspeed wheel compiled w. ...... torch 1.12, cuda 11.6

auwsom avatar Oct 26 '22 01:10 auwsom

The ds_report script is installed when DeepSpeed is installed. It looks like your DeepSpeed install is ok and your system appears compatible with the inference kernels used with the bloom model. Can you please try pre-compiling those kernels with DS_BUILD_TRANSFORMER_INFERENCE=1 pip install deepspeed?
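For readers comparing their own report against the one above, here is a small sketch that pulls the `torch cuda version` and `nvcc version` rows out of pasted ds_report text and classifies the difference. The functions `parse_report` and `cuda_mismatch` are hypothetical helpers written for this illustration; they are not DeepSpeed APIs, and the "same major version" rule is only an assumption based on the warning quoted earlier in this thread (11.7 vs 11.6 accepted).

```python
import re


def parse_report(text):
    """Map 'key ....... value' rows of ds_report-style output to a dict."""
    info = {}
    for line in text.splitlines():
        # A row is: key, a run of at least three dots, then the value.
        match = re.match(r"^(.+?)\s\.{3,}\s(.+)$", line.strip())
        if match:
            info[match.group(1).strip()] = match.group(2).strip()
    return info


def cuda_mismatch(info):
    """Classify the torch/nvcc CUDA version difference (assumed rule, see lead-in)."""
    torch_cuda = info["torch cuda version"]
    nvcc = info["nvcc version"]
    if torch_cuda == nvcc:
        return "match"
    same_major = torch_cuda.split(".")[0] == nvcc.split(".")[0]
    return "minor mismatch" if same_major else "major mismatch"


if __name__ == "__main__":
    sample = (
        "torch version .................... 1.12.1+cu116\n"
        "torch cuda version ............... 11.6\n"
        "nvcc version ..................... 11.7\n"
    )
    print(cuda_mismatch(parse_report(sample)))
```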

mrwyattii avatar Oct 26 '22 18:10 mrwyattii

@mrwyattii interesting, I didn't originally install DeepSpeed, only DeepSpeed-MII. I cloned the repo yesterday just to run ../bin/ds_report. But I also tried your precompile var DS_BUILD_TRANSFORMER_INFERENCE=1 pip install deepspeed, which installed DeepSpeed in my venv (I just checked). The "same error" reply above was from after doing that.

auwsom avatar Oct 26 '22 19:10 auwsom

DeepSpeed is a dependency for DeepSpeed-MII and pip will install it when you do pip install . in the MII repo or pip install deepspeed-mii

I'm a bit confused about your setup: if you were able to pre-compile the kernels when you installed deepspeed, you shouldn't be running into the error you saw previously. I think there may be an issue with your environment. You mention you are using a venv - where is that venv? I see multiple paths in the error message, /home/user/.local/lib/python3.10/ and /usr/bin/python3, which makes me believe MII is picking up the wrong python and/or python packages. If you can, start with a fresh environment:

python3 -m virtualenv test_venv
source ./test_venv/bin/activate
pip3 install --pre torch torchvision --extra-index-url https://download.pytorch.org/whl/nightly/cu117
pip3 install deepspeed
pip3 install deepspeed-mii

Note that this will install PyTorch compiled with CUDA 11.7, so the warning you saw previously should go away.

Let me know if you get the same error doing this. Thanks!
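To see which interpreter and package locations are actually in play, a quick check along these lines can be run from inside the activated environment. This is a sketch using only the standard library; `describe_environment` is a hypothetical helper name, and the default package list is just the set of packages discussed in this thread.

```python
import importlib.util
import sys


def describe_environment(packages=("deepspeed", "mii", "torch")):
    """Report the running interpreter and where each package would be imported from."""
    report = {
        "executable": sys.executable,
        "prefix": sys.prefix,
        # In a venv (or a recent virtualenv), sys.prefix differs from sys.base_prefix.
        "in_venv": sys.prefix != sys.base_prefix,
    }
    for name in packages:
        # find_spec locates a top-level package without importing it.
        spec = importlib.util.find_spec(name)
        report[name] = spec.origin if spec else None
    return report


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `executable` points at /usr/bin/python3, or the packages resolve to ~/.local rather than the venv's site-packages, the environment is not isolated from the system install.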

mrwyattii avatar Oct 27 '22 00:10 mrwyattii

@mrwyattii thanks. I don't have enough space currently to install another torch (I already have 2 installed, one from the python default and the 11.6 one for my nvidia drivers).

In the meantime, here is my pip list inside my venv. It is basically the same as the system one (the venv was created with --system-site-packages), with DeepSpeed-MII installed on top for testing.

Package                 Version         Editable project location                      Location                                       Installer
----------------------- --------------- ---------------------------------------------- ---------------------------------------------- ---------
2to3                    1.0                                                            /home/user/.local/lib/python3.10/site-packages pip
absl-py                 1.2.0                                                          /home/user/.local/lib/python3.10/site-packages pip
accelerate              0.12.0                                                         /home/user/.local/lib/python3.10/site-packages pip
aim                     3.13.3                                                         /home/user/.local/lib/python3.10/site-packages pip
aim-ui                  3.13.3                                                         /home/user/.local/lib/python3.10/site-packages pip
aimrecords              0.0.7                                                          /home/user/.local/lib/python3.10/site-packages pip
aimrocks                0.2.1                                                          /home/user/.local/lib/python3.10/site-packages pip
aiofiles                22.1.0                                                         /home/user/.local/lib/python3.10/site-packages pip
aiohttp                 3.8.1                                                          /home/user/.local/lib/python3.10/site-packages pip
aiohttp-jinja2          1.5                                                            /home/user/.local/lib/python3.10/site-packages pip
aiosignal               1.2.0                                                          /home/user/.local/lib/python3.10/site-packages pip
alembic                 1.8.1                                                          /home/user/.local/lib/python3.10/site-packages pip
analytics-python        1.4.0                                                          /home/user/.local/lib/python3.10/site-packages pip
antlr4-python3-runtime  4.8                                                            /home/user/.local/lib/python3.10/site-packages pip
anyio                   3.5.0                                                          /home/user/.local/lib/python3.10/site-packages pip
apt-xapian-index        0.49                                                           /usr/lib/python3/dist-packages
argcomplete             2.0.0                                                          /home/user/.local/lib/python3.10/site-packages pip
argh                    0.26.2                                                         /home/user/.local/lib/python3.10/site-packages pip
argon2-cffi             21.3.0                                                         /home/user/.local/lib/python3.10/site-packages pip
argon2-cffi-bindings    21.2.0                                                         /home/user/.local/lib/python3.10/site-packages pip
asttokens               2.0.5                                                          /home/user/.local/lib/python3.10/site-packages pip
async-timeout           4.0.2                                                          /home/user/.local/lib/python3.10/site-packages pip
attrs                   21.2.0                                                         /usr/lib/python3/dist-packages
Automat                 20.2.0                                                         /usr/lib/python3/dist-packages
azure-core              1.25.0                                                         /home/user/.local/lib/python3.10/site-packages pip
azure-storage-blob      12.13.1                                                        /home/user/.local/lib/python3.10/site-packages pip
Babel                   2.8.0                                                          /usr/lib/python3/dist-packages
backcall                0.2.0                                                          /home/user/.local/lib/python3.10/site-packages pip
backoff                 1.10.0                                                         /home/user/.local/lib/python3.10/site-packages pip
base58                  2.0.1                                                          /home/user/.local/lib/python3.10/site-packages pip
bcrypt                  3.2.0                                                          /usr/lib/python3/dist-packages
beautifulsoup4          4.11.1                                                         /home/user/.local/lib/python3.10/site-packages pip
better-profanity        0.7.0                                                          /home/user/.local/lib/python3.10/site-packages pip
biopython               1.79                                                           /home/user/.local/lib/python3.10/site-packages pip
bitsandbytes            0.34.0                                                         /home/user/.local/lib/python3.10/site-packages pip
black                   22.3.0                                                         /home/user/.local/lib/python3.10/site-packages pip
bleach                  5.0.0                                                          /home/user/.local/lib/python3.10/site-packages pip
blinker                 1.4                                                            /usr/lib/python3/dist-packages
boto3                   1.24.52                                                        /home/user/.local/lib/python3.10/site-packages pip
botocore                1.27.52                                                        /home/user/.local/lib/python3.10/site-packages pip
bx-python               0.8.13                                                         /home/user/.local/lib/python3.10/site-packages pip
cachetools              5.2.0                                                          /home/user/.local/lib/python3.10/site-packages pip
certifi                 2022.9.24                                                      /home/user/.local/lib/python3.10/site-packages pip
cffi                    1.15.0                                                         /home/user/.local/lib/python3.10/site-packages pip
cfgv                    3.3.1                                                          /home/user/.local/lib/python3.10/site-packages pip
chardet                 4.0.0                                                          /usr/lib/python3/dist-packages
charset-normalizer      2.1.1                                                          /home/user/.local/lib/python3.10/site-packages pip
click                   8.0.4                                                          /home/user/.local/lib/python3.10/site-packages pip
cloud-init              22.2                                                           /usr/lib/python3/dist-packages
cmake                   3.24.1                                                         /home/user/.local/lib/python3.10/site-packages pip
colorama                0.4.4                                                          /usr/lib/python3/dist-packages
command-not-found       0.3                                                            /usr/lib/python3/dist-packages
configobj               5.0.6                                                          /usr/lib/python3/dist-packages
constantly              15.1.0                                                         /usr/lib/python3/dist-packages
cryptography            3.4.8                                                          /usr/lib/python3/dist-packages
cupshelpers             1.0                                                            /usr/lib/python3/dist-packages
cycler                  0.11.0                                                         /home/user/.local/lib/python3.10/site-packages pip
Cython                  0.29.28                                                        /home/user/.local/lib/python3.10/site-packages pip
datasets                1.16.1                                                         /home/user/.local/lib/python3.10/site-packages pip
dbus-python             1.2.18                                                         /usr/lib/python3/dist-packages
debugpy                 1.6.0                                                          /home/user/.local/lib/python3.10/site-packages pip
decorator               5.1.1                                                          /home/user/.local/lib/python3.10/site-packages pip
deepspeed               0.7.4                                                          /home/user/.local/lib/python3.10/site-packages pip
defusedxml              0.7.1                                                          /home/user/.local/lib/python3.10/site-packages pip
dill                    0.3.5.1                                                        /home/user/.local/lib/python3.10/site-packages pip
distlib                 0.3.5                                                          /home/user/.local/lib/python3.10/site-packages pip
distro                  1.7.0                                                          /usr/lib/python3/dist-packages
distro-info             1.1build1                                                      /usr/lib/python3/dist-packages
ecdsa                   0.17.0                                                         /home/user/.local/lib/python3.10/site-packages pip
editdistance            0.6.0                                                          /home/user/.local/lib/python3.10/site-packages pip
elograf                 0.3.2                                                          /usr/local/lib/python3.10/dist-packages
emoji                   2.1.0                                                          /home/user/.local/lib/python3.10/site-packages pip
entrypoints             0.4                                                            /usr/lib/python3/dist-packages                 flit
et-xmlfile              1.1.0                                                          /home/user/.local/lib/python3.10/site-packages pip
evdev                   1.6.0                                                          /home/user/.local/lib/python3.10/site-packages pip
executing               0.8.3                                                          /home/user/.local/lib/python3.10/site-packages pip
fairscale               0.4.8                                                          /home/user/.local/lib/python3.10/site-packages pip
fastapi                 0.85.1                                                         /home/user/.local/lib/python3.10/site-packages pip
fastjsonschema          2.15.3                                                         /home/user/.local/lib/python3.10/site-packages pip
ffmpy                   0.3.0                                                          /home/user/.local/lib/python3.10/site-packages pip
fido2                   0.9.3                                                          /home/user/.local/lib/python3.10/site-packages pip
filelock                3.8.0                                                          /home/user/.local/lib/python3.10/site-packages pip
fire                    0.4.0                                                          /home/user/.local/lib/python3.10/site-packages pip
fisher                  0.1.10                                                         /home/user/.local/lib/python3.10/site-packages pip
Flask                   2.1.1                                                          /home/user/.local/lib/python3.10/site-packages pip
fonttools               4.35.0                                                         /home/user/.local/lib/python3.10/site-packages pip
frozenlist              1.3.0                                                          /home/user/.local/lib/python3.10/site-packages pip
fsspec                  2022.8.2                                                       /home/user/.local/lib/python3.10/site-packages pip
fuse-python             1.0.2                                                          /usr/lib/python3/dist-packages
gffutils                0.11.0                                                         /home/user/.local/lib/python3.10/site-packages pip
google-auth             2.10.0                                                         /home/user/.local/lib/python3.10/site-packages pip
google-auth-oauthlib    0.4.6                                                          /home/user/.local/lib/python3.10/site-packages pip
gpg                     1.16.0-unknown                                                 /usr/lib/python3/dist-packages
gradio                  3.4                                                            /home/user/.local/lib/python3.10/site-packages pip
greenlet                1.1.3                                                          /home/user/.local/lib/python3.10/site-packages pip
grpcio                  1.47.0                                                         /home/user/.local/lib/python3.10/site-packages pip
gyp                     0.1                                                            /usr/lib/python3/dist-packages
h11                     0.12.0                                                         /home/user/.local/lib/python3.10/site-packages pip
helpers                 0.2.0                                                          /home/user/.local/lib/python3.10/site-packages pip
hjson                   3.1.0                                                          /home/user/.local/lib/python3.10/site-packages pip
httpcore                0.15.0                                                         /home/user/.local/lib/python3.10/site-packages pip
httplib2                0.20.2                                                         /usr/lib/python3/dist-packages
httpx                   0.23.0                                                         /home/user/.local/lib/python3.10/site-packages pip
huggingface-hub         0.10.0                                                         /home/user/.local/lib/python3.10/site-packages pip
hydra-core              1.1.2                                                          /home/user/.local/lib/python3.10/site-packages pip
hyperlink               21.0.0                                                         /usr/lib/python3/dist-packages
identify                2.5.3                                                          /home/user/.local/lib/python3.10/site-packages pip
idna                    3.4                                                            /home/user/.local/lib/python3.10/site-packages pip
importlib-metadata      4.6.4                                                          /usr/lib/python3/dist-packages
incremental             21.3.0                                                         /usr/lib/python3/dist-packages
iniconfig               1.1.1                                                          /home/user/.local/lib/python3.10/site-packages pip
intelhex                2.3.0                                                          /home/user/.local/lib/python3.10/site-packages pip
iopath                  0.1.10                                                         /home/user/.local/lib/python3.10/site-packages pip
ipdb                    0.13.9                                                         /home/user/.local/lib/python3.10/site-packages pip
ipykernel               6.13.0                                                         /home/user/.local/lib/python3.10/site-packages pip
ipython                 8.3.0                                                          /home/user/.local/lib/python3.10/site-packages pip
ipython-genutils        0.2.0                                                          /home/user/.local/lib/python3.10/site-packages pip
ipywidgets              8.0.2                                                          /home/user/.local/lib/python3.10/site-packages pip
iso8601                 1.0.2                                                          /home/user/.local/lib/python3.10/site-packages pip
isodate                 0.6.1                                                          /home/user/.local/lib/python3.10/site-packages pip
itsdangerous            2.1.2                                                          /home/user/.local/lib/python3.10/site-packages pip
jedi                    0.17.2                                                         /home/user/.local/lib/python3.10/site-packages pip
jeepney                 0.7.1                                                          /usr/lib/python3/dist-packages                 flit
Jinja2                  3.1.2                                                          /home/user/.local/lib/python3.10/site-packages pip
jmespath                1.0.1                                                          /home/user/.local/lib/python3.10/site-packages pip
jnius                   1.1.0                                                          /home/user/.local/lib/python3.10/site-packages pip
joblib                  1.1.0                                                          /home/user/.local/lib/python3.10/site-packages pip
Js2Py                   0.71                                                           /home/user/.local/lib/python3.10/site-packages pip
json5                   0.9.8                                                          /home/user/.local/lib/python3.10/site-packages pip
jsonpatch               1.32                                                           /usr/lib/python3/dist-packages
jsonpointer             2.0                                                            /usr/lib/python3/dist-packages
jsonschema              3.2.0                                                          /usr/lib/python3/dist-packages
jupyter-client          7.3.1                                                          /home/user/.local/lib/python3.10/site-packages pip
jupyter-core            4.10.0                                                         /home/user/.local/lib/python3.10/site-packages pip
jupyter-server          1.17.0                                                         /home/user/.local/lib/python3.10/site-packages pip
jupyterlab              3.4.1                                                          /home/user/.local/lib/python3.10/site-packages pip
jupyterlab-pygments     0.2.2                                                          /home/user/.local/lib/python3.10/site-packages pip
jupyterlab-server       2.13.0                                                         /home/user/.local/lib/python3.10/site-packages pip
jupyterlab-widgets      3.0.3                                                          /home/user/.local/lib/python3.10/site-packages pip
keyboard                0.13.5                                                         /home/user/.local/lib/python3.10/site-packages pip
keyring                 23.5.0                                                         /usr/lib/python3/dist-packages
kiwisolver              1.4.4                                                          /home/user/.local/lib/python3.10/site-packages pip
language-selector       0.1                                                            /usr/lib/python3/dist-packages
launchpadlib            1.10.16                                                        /usr/lib/python3/dist-packages
lazr.restfulclient      0.14.4                                                         /usr/lib/python3/dist-packages
lazr.uri                1.0.6                                                          /usr/lib/python3/dist-packages
libvirt-python          8.0.0                                                          /usr/lib/python3/dist-packages
linkify-it-py           1.0.3                                                          /home/user/.local/lib/python3.10/site-packages pip
livereload              2.6.3                                                          /usr/lib/python3/dist-packages
llvmlite                0.39.1                                                         /home/user/.local/lib/python3.10/site-packages pip
lxml                    4.9.1                                                          /home/user/.local/lib/python3.10/site-packages pip
Mako                    1.2.2                                                          /home/user/.local/lib/python3.10/site-packages pip
Markdown                3.4.1                                                          /home/user/.local/lib/python3.10/site-packages pip
markdown-it-py          2.1.0                                                          /home/user/.local/lib/python3.10/site-packages pip
MarkupSafe              2.1.1                                                          /home/user/.local/lib/python3.10/site-packages pip
matplotlib              3.5.3                                                          /home/user/.local/lib/python3.10/site-packages pip
matplotlib-inline       0.1.3                                                          /home/user/.local/lib/python3.10/site-packages pip
mdit-py-plugins         0.3.1                                                          /home/user/.local/lib/python3.10/site-packages pip
mdurl                   0.1.2                                                          /home/user/.local/lib/python3.10/site-packages pip
megatron-lm             1.1.5           /home/user/dev/Megatron-LM                     /home/user/dev/Megatron-LM
meld                    3.20.4                                                         /usr/lib/python3/dist-packages
metaseq                 0.5.6           /home/user/.local/lib/python3.10/site-packages /home/user/.local/lib/python3.10/site-packages pip
mistune                 0.8.4                                                          /home/user/.local/lib/python3.10/site-packages pip
mkdocs                  1.1.2                                                          /usr/lib/python3/dist-packages
monotonic               1.6                                                            /home/user/.local/lib/python3.10/site-packages pip
more-itertools          8.10.0                                                         /usr/lib/python3/dist-packages
msrest                  0.7.1                                                          /home/user/.local/lib/python3.10/site-packages pip
multidict               6.0.2                                                          /home/user/.local/lib/python3.10/site-packages pip
multiprocess            0.70.13                                                        /home/user/.local/lib/python3.10/site-packages pip
mypy                    0.971                                                          /home/user/.local/lib/python3.10/site-packages pip
mypy-extensions         0.4.3                                                          /home/user/.local/lib/python3.10/site-packages pip
nbclassic               0.3.7                                                          /home/user/.local/lib/python3.10/site-packages pip
nbclient                0.6.3                                                          /home/user/.local/lib/python3.10/site-packages pip
nbconvert               6.5.0                                                          /home/user/.local/lib/python3.10/site-packages pip
nbformat                5.4.0                                                          /home/user/.local/lib/python3.10/site-packages pip
nest-asyncio            1.5.5                                                          /home/user/.local/lib/python3.10/site-packages pip
netifaces               0.11.0                                                         /usr/lib/python3/dist-packages
ninja                   1.10.2.3                                                       /home/user/.local/lib/python3.10/site-packages pip
nltk                    3.7                                                            /home/user/.local/lib/python3.10/site-packages pip
nodeenv                 1.7.0                                                          /home/user/.local/lib/python3.10/site-packages pip
notebook                6.4.11                                                         /home/user/.local/lib/python3.10/site-packages pip
notebook-shim           0.1.0                                                          /home/user/.local/lib/python3.10/site-packages pip
numba                   0.56.2                                                         /home/user/.local/lib/python3.10/site-packages pip
numpy                   1.23.3                                                         /home/user/.local/lib/python3.10/site-packages pip
oauthlib                3.2.0                                                          /usr/lib/python3/dist-packages
olefile                 0.46                                                           /usr/lib/python3/dist-packages
omegaconf               2.1.1                                                          /home/user/.local/lib/python3.10/site-packages pip
openai                  0.20.0                                                         /home/user/.local/lib/python3.10/site-packages pip
opencv-python           4.6.0.66                                                       /home/user/.local/lib/python3.10/site-packages pip
openpyxl                3.0.10                                                         /home/user/.local/lib/python3.10/site-packages pip
orjson                  3.8.0                                                          /home/user/.local/lib/python3.10/site-packages pip
packaging               21.3                                                           /home/user/.local/lib/python3.10/site-packages pip
pandas                  1.4.2                                                          /home/user/.local/lib/python3.10/site-packages pip
pandas-stubs            1.2.0.61                                                       /home/user/.local/lib/python3.10/site-packages pip
pandocfilters           1.5.0                                                          /home/user/.local/lib/python3.10/site-packages pip
paramiko                2.11.0                                                         /home/user/.local/lib/python3.10/site-packages pip
parso                   0.7.1                                                          /home/user/.local/lib/python3.10/site-packages pip
pathspec                0.9.0                                                          /home/user/.local/lib/python3.10/site-packages pip
pendulum                2.1.2                                                          /home/user/.local/lib/python3.10/site-packages pip
pexpect                 4.8.0                                                          /usr/lib/python3/dist-packages
pickleshare             0.7.5                                                          /home/user/.local/lib/python3.10/site-packages pip
Pillow                  9.2.0                                                          /home/user/.local/lib/python3.10/site-packages pip
pip                     22.3                                                           /home/user/.local/lib/python3.10/site-packages pip
pipx                    1.0.0                                                          /usr/lib/python3/dist-packages
platformdirs            2.5.2                                                          /home/user/.local/lib/python3.10/site-packages pip
pluggy                  1.0.0                                                          /home/user/.local/lib/python3.10/site-packages pip
portalocker             2.5.1                                                          /home/user/.local/lib/python3.10/site-packages pip
pre-commit              2.20.0                                                         /home/user/.local/lib/python3.10/site-packages pip
prometheus-client       0.14.1                                                         /home/user/.local/lib/python3.10/site-packages pip
prompt-toolkit          3.0.29                                                         /home/user/.local/lib/python3.10/site-packages pip
protobuf                3.20.1                                                         /home/user/.local/lib/python3.10/site-packages pip
psutil                  5.9.0                                                          /home/user/.local/lib/python3.10/site-packages pip
ptyprocess              0.7.0                                                          /usr/lib/python3/dist-packages                 flit
pure-eval               0.2.2                                                          /home/user/.local/lib/python3.10/site-packages pip
py                      1.11.0                                                         /home/user/.local/lib/python3.10/site-packages pip
py-cpuinfo              8.0.0                                                          /home/user/.local/lib/python3.10/site-packages pip
py3nvml                 0.2.7                                                          /home/user/.local/lib/python3.10/site-packages pip
py4j                    0.10.9.5                                                       /home/user/.local/lib/python3.10/site-packages pip
pyarrow                 9.0.0                                                          /home/user/.local/lib/python3.10/site-packages pip
pyasn1                  0.4.8                                                          /usr/lib/python3/dist-packages
pyasn1-modules          0.2.1                                                          /usr/lib/python3/dist-packages
pybedtools              0.9.0                                                          /home/user/.local/lib/python3.10/site-packages pip
pybind11                2.10.0                                                         /home/user/.local/lib/python3.10/site-packages pip
pycairo                 1.20.1                                                         /usr/lib/python3/dist-packages
pycparser               2.21                                                           /home/user/.local/lib/python3.10/site-packages pip
pycryptodome            3.15.0                                                         /home/user/.local/lib/python3.10/site-packages pip
pycups                  2.0.1                                                          /usr/lib/python3/dist-packages
pydantic                1.10.2                                                         /home/user/.local/lib/python3.10/site-packages pip
pydle                   1.0.0                                                          /home/user/.local/lib/python3.10/site-packages pip
pydub                   0.25.1                                                         /home/user/.local/lib/python3.10/site-packages pip
pyfaidx                 0.7.1                                                          /home/user/.local/lib/python3.10/site-packages pip
Pygments                2.12.0                                                         /home/user/.local/lib/python3.10/site-packages pip
PyGObject               3.42.1                                                         /usr/lib/python3/dist-packages
PyHamcrest              2.0.2                                                          /usr/lib/python3/dist-packages
pyinotify               0.9.6                                                          /usr/lib/python3/dist-packages
pyjnius                 1.4.1                                                          /home/user/.local/lib/python3.10/site-packages pip
pyjsparser              2.7.1                                                          /home/user/.local/lib/python3.10/site-packages pip
PyJWT                   2.3.0                                                          /usr/lib/python3/dist-packages
pylibacl                0.6.0                                                          /usr/lib/python3/dist-packages
pymacaroons             0.13.0                                                         /usr/lib/python3/dist-packages
PyNaCl                  1.5.0                                                          /usr/lib/python3/dist-packages
pynput                  1.7.6                                                          /home/user/.local/lib/python3.10/site-packages pip
pyOpenSSL               21.0.0                                                         /usr/lib/python3/dist-packages
pyparsing               2.4.7                                                          /usr/lib/python3/dist-packages
PyQt5                   5.15.6                                                         /usr/lib/python3/dist-packages                 sip-build
PyQt5-sip               12.9.1                                                         /usr/lib/python3/dist-packages
pyrsistent              0.18.1                                                         /usr/lib/python3/dist-packages
pysam                   0.19.1                                                         /home/user/.local/lib/python3.10/site-packages pip
pyserial                3.5                                                            /usr/lib/python3/dist-packages
pytest                  7.1.2                                                          /home/user/.local/lib/python3.10/site-packages pip
python-apt              2.3.0+ubuntu2.1                                                /usr/lib/python3/dist-packages
python-dateutil         2.8.2                                                          /home/user/.local/lib/python3.10/site-packages pip
python-debian           0.1.43ubuntu1                                                  /usr/lib/python3/dist-packages
python-jsonrpc-server   0.4.0                                                          /home/user/.local/lib/python3.10/site-packages pip
python-language-server  0.36.2                                                         /home/user/.local/lib/python3.10/site-packages pip
python-multipart        0.0.5                                                          /home/user/.local/lib/python3.10/site-packages pip
python-twitch-irc       1.1.0                                                          /home/user/.local/lib/python3.10/site-packages pip
python-xlib             0.31                                                           /home/user/.local/lib/python3.10/site-packages pip
PythonTurtle            0.3.2                                                          /home/user/.local/lib/python3.10/site-packages pip
pyttsx3                 2.90                                                           /home/user/.local/lib/python3.10/site-packages pip
pytz                    2022.1                                                         /usr/lib/python3/dist-packages
pytz-deprecation-shim   0.1.0.post0                                                    /home/user/.local/lib/python3.10/site-packages pip
pytzdata                2020.1                                                         /home/user/.local/lib/python3.10/site-packages pip
pyusb                   1.2.1                                                          /home/user/.local/lib/python3.10/site-packages pip
pyxattr                 0.7.2                                                          /usr/lib/python3/dist-packages
PyYAML                  5.4.1                                                          /usr/lib/python3/dist-packages
pyzmq                   22.3.0                                                         /home/user/.local/lib/python3.10/site-packages pip
regex                   2022.7.25                                                      /home/user/.local/lib/python3.10/site-packages pip
reportlab               3.6.8                                                          /usr/lib/python3/dist-packages
requests                2.28.1                                                         /home/user/.local/lib/python3.10/site-packages pip
requests-oauthlib       1.3.1                                                          /home/user/.local/lib/python3.10/site-packages pip
requests-unixsocket     0.2.0                                                          /usr/lib/python3/dist-packages
RestrictedPython        5.2                                                            /home/user/.local/lib/python3.10/site-packages pip
rfc3986                 1.5.0                                                          /home/user/.local/lib/python3.10/site-packages pip
rpa                     1.47.0                                                         /home/user/.local/lib/python3.10/site-packages pip
rsa                     4.9                                                            /home/user/.local/lib/python3.10/site-packages pip
Rx                      3.2.0                                                          /home/user/.local/lib/python3.10/site-packages pip
s3transfer              0.6.0                                                          /home/user/.local/lib/python3.10/site-packages pip
sacrebleu               2.2.0                                                          /home/user/.local/lib/python3.10/site-packages pip
scikit-learn            1.1.2                                                          /home/user/.local/lib/python3.10/site-packages pip
scipy                   1.9.0                                                          /home/user/.local/lib/python3.10/site-packages pip
screen-resolution-extra 0.0.0                                                          /usr/lib/python3/dist-packages
SecretStorage           3.3.1                                                          /usr/lib/python3/dist-packages
Send2Trash              1.8.0                                                          /home/user/.local/lib/python3.10/site-packages pip
service-identity        18.1.0                                                         /usr/lib/python3/dist-packages
setuptools              59.6.0                                                         /usr/lib/python3/dist-packages
sikuli                  0.1                                                            /home/user/.local/lib/python3.10/site-packages
simplejson              3.17.6                                                         /usr/lib/python3/dist-packages
six                     1.16.0                                                         /usr/lib/python3/dist-packages
sklearn                 0.0                                                            /home/user/.local/lib/python3.10/site-packages pip
sniffio                 1.2.0                                                          /home/user/.local/lib/python3.10/site-packages pip
solo1                   0.1.1                                                          /home/user/.local/lib/python3.10/site-packages pip
sos                     4.3                                                            /usr/lib/python3/dist-packages
sounddevice             0.4.5                                                          /home/user/.local/lib/python3.10/site-packages pip
soupsieve               2.3.2.post1                                                    /home/user/.local/lib/python3.10/site-packages pip
SQLAlchemy              1.4.41                                                         /home/user/.local/lib/python3.10/site-packages pip
ssh-import-id           5.11                                                           /usr/lib/python3/dist-packages
stack-data              0.2.0                                                          /home/user/.local/lib/python3.10/site-packages pip
starlette               0.20.4                                                         /home/user/.local/lib/python3.10/site-packages pip
systemd-python          234                                                            /usr/lib/python3/dist-packages
tabulate                0.8.10                                                         /home/user/.local/lib/python3.10/site-packages pip
tagui                   1.47.0                                                         /home/user/.local/lib/python3.10/site-packages pip
tensorboard             2.8.0                                                          /home/user/.local/lib/python3.10/site-packages pip
tensorboard-data-server 0.6.1                                                          /home/user/.local/lib/python3.10/site-packages pip
tensorboard-plugin-wit  1.8.1                                                          /home/user/.local/lib/python3.10/site-packages pip
termcolor               1.1.0                                                          /home/user/.local/lib/python3.10/site-packages pip
terminado               0.13.3                                                         /home/user/.local/lib/python3.10/site-packages pip
threadpoolctl           3.1.0                                                          /home/user/.local/lib/python3.10/site-packages pip
timeout-decorator       0.5.0                                                          /home/user/.local/lib/python3.10/site-packages pip
tinycss2                1.1.1                                                          /home/user/.local/lib/python3.10/site-packages pip
tk                      0.1.0                                                          /home/user/.local/lib/python3.10/site-packages pip
tokenizers              0.12.1                                                         /home/user/.local/lib/python3.10/site-packages pip
toml                    0.10.2                                                         /home/user/.local/lib/python3.10/site-packages pip
tomli                   2.0.1                                                          /home/user/.local/lib/python3.10/site-packages pip
torch                   1.12.1+cu116                                                   /home/user/.local/lib/python3.10/site-packages pip
torchaudio              0.12.1+cu116                                                   /home/user/.local/lib/python3.10/site-packages pip
torchvision             0.13.1+cu116                                                   /home/user/.local/lib/python3.10/site-packages pip
tornado                 6.1                                                            /usr/lib/python3/dist-packages
tqdm                    4.64.0                                                         /home/user/.local/lib/python3.10/site-packages pip
traitlets               5.2.0                                                          /home/user/.local/lib/python3.10/site-packages pip
transformers            4.23.0.dev0                                                    /home/user/.local/lib/python3.10/site-packages pip
Twisted                 22.1.0                                                         /usr/lib/python3/dist-packages
twitch-python           0.0.20                                                         /home/user/.local/lib/python3.10/site-packages pip
twitchio                2.3.0                                                          /home/user/.local/lib/python3.10/site-packages pip
typing_extensions       4.3.0                                                          /home/user/.local/lib/python3.10/site-packages pip
tzdata                  2022.1                                                         /home/user/.local/lib/python3.10/site-packages pip
tzlocal                 4.2                                                            /home/user/.local/lib/python3.10/site-packages pip
ubuntu-advantage-tools  27.11.2                                                        /usr/lib/python3/dist-packages
ubuntu-drivers-common   0.0.0                                                          /usr/lib/python3/dist-packages
uc-micro-py             1.0.1                                                          /home/user/.local/lib/python3.10/site-packages pip
ufw                     0.36.1                                                         /usr/lib/python3/dist-packages
ujson                   5.5.0                                                          /home/user/.local/lib/python3.10/site-packages pip
unattended-upgrades     0.1                                                            /usr/lib/python3/dist-packages
urllib3                 1.26.12                                                        /home/user/.local/lib/python3.10/site-packages pip
usb-creator             0.3.7                                                          /usr/lib/python3/dist-packages
userpath                1.8.0                                                          /usr/lib/python3/dist-packages
uvicorn                 0.18.3                                                         /home/user/.local/lib/python3.10/site-packages pip
virtualenv              20.16.3                                                        /home/user/.local/lib/python3.10/site-packages pip
vosk                    0.3.37                                                         /home/user/.local/lib/python3.10/site-packages pip
wadllib                 1.3.6                                                          /usr/lib/python3/dist-packages
wcwidth                 0.2.5                                                          /home/user/.local/lib/python3.10/site-packages pip
webencodings            0.5.1                                                          /home/user/.local/lib/python3.10/site-packages pip
websocket-client        1.3.2                                                          /home/user/.local/lib/python3.10/site-packages pip
websockets              10.3                                                           /home/user/.local/lib/python3.10/site-packages pip
Werkzeug                2.2.2                                                          /home/user/.local/lib/python3.10/site-packages pip
wheel                   0.37.1                                                         /usr/lib/python3/dist-packages
widgetsnbextension      4.0.3                                                          /home/user/.local/lib/python3.10/site-packages pip
xkit                    0.0.0                                                          /usr/lib/python3/dist-packages
xmltodict               0.13.0                                                         /home/user/.local/lib/python3.10/site-packages pip
xxhash                  3.0.0                                                          /home/user/.local/lib/python3.10/site-packages pip
yarl                    1.7.2                                                          /home/user/.local/lib/python3.10/site-packages pip
zipp                    1.0.0                                                          /usr/lib/python3/dist-packages
zope.interface          5.4.0                                                          /usr/lib/python3/dist-packages

auwsom avatar Oct 27 '22 00:10 auwsom

@mrwyattii also, sorry, I didn't realize that collapsing the error above broke the formatting. It doesn't do that on GH Gist. So here it is again if it helps, but it is just the same error. I'll try to clone this VM and run another torch install on the HDD after expanding.

https://gist.github.com/auwsom/2faf04fc8280685a3342e87a32402113

auwsom avatar Oct 27 '22 01:10 auwsom

This error message is much easier to parse and I found the real error here:

/home/user/.local/lib/python3.10/site-packages/torch/include/c10/cuda/CUDAStream.h:6:10: fatal error: cuda_runtime_api.h: No such file or directory
    6 | #include <cuda_runtime_api.h>

This indicates it's a problem with your environment, but it's hard to say exactly what needs to change in your environment to fix this. Can you give me some more information about how you are setting up your VM?
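
One way to narrow this down is to check which CUDA toolkit roots on the machine actually contain the header. The following is only a minimal sketch: `find_cuda_header` is a hypothetical helper (not part of torch, DeepSpeed, or MII), and the candidate roots are assumptions based on the paths mentioned in this thread.

```python
import os
from pathlib import Path

HEADER = "cuda_runtime_api.h"


def find_cuda_header(roots):
    """Hypothetical helper: list every copy of cuda_runtime_api.h under the given toolkit roots."""
    hits = []
    for root in roots:
        root = Path(root)
        if not root.is_dir():
            continue
        # Two common layouts: <root>/include and <root>/targets/<arch>/include
        candidates = [root / "include" / HEADER]
        candidates += sorted(root.glob(f"targets/*/include/{HEADER}"))
        hits += [p for p in candidates if p.is_file()]
    return hits


if __name__ == "__main__":
    # Assumed candidate roots; adjust to the toolkit locations on your machine.
    roots = [os.environ.get("CUDA_HOME"), "/usr/local/cuda", "/usr/local/cuda-11.7"]
    for path in find_cuda_header([r for r in roots if r]):
        print(path)
```

If nothing is printed for the root that the build is using, the compiler is probably being pointed at a directory without the CUDA headers.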

mrwyattii avatar Oct 28 '22 22:10 mrwyattii

@mrwyattii yes, that is the first error I included. That file is definitely not there.

I have a VM image of Ubuntu 22.04 with Kubuntu added on top of the cloud-init server image. It wasn't installed from an ISO, but it should be pretty close to the vanilla images used in GCP, AWS, etc.

The host is the same Ubuntu LTS image from before the upgrade, at 20.04. The GPU is passed through with virtio and virt-manager in a fairly standard way. I gave the VM about 50 GB of RAM and 12 CPU cores; storage is about 40 GB.

I have added other Python packages, and installed the NVIDIA driver with apt and a compatible CUDA version by direct download. I'll edit in a link about the process, from askubuntu.com.

Then I cloned the MII repo, created a venv, and ran pip install .

Somewhere this .h API file wasn't built. That's why my initial theory was that the C compiler wasn't getting info about the correct CUDA version. But that is a guess. Is there a verbose mode for the compile process, or maybe a log file?

auwsom avatar Oct 28 '22 23:10 auwsom

You're right, this is the first error you pointed out. Sorry, this thread is getting quite long and I missed that.

The way you're setting up your VM and venv seems fine. The error is really coming from torch and not from DeepSpeed / MII. I see that you have already located the header file the compiler cannot find, at /usr/local/cuda-11.7/targets/x86_64-linux/include/cuda_runtime_api.h. You may try adding this location to CUDA_HOME or LD_LIBRARY_PATH in your environment. See the suggestion here for a similar error: https://github.com/HawkAaron/warp-transducer/issues/15#issuecomment-467668750
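
As a rough sketch of what such an environment change could look like, the snippet below builds an environment dictionary for a child process. `build_env` is a hypothetical helper, and the choice of variables is an assumption: CUDA_HOME is set to the toolkit root, and the include directory is prepended to CPATH, which GCC consults when searching for headers (LD_LIBRARY_PATH only affects shared-library lookup at run time). Whether this resolves the error on this particular setup is untested.

```python
import os


def build_env(cuda_root, include_dir, base=None):
    """Hypothetical helper: copy an environment and point it at one CUDA toolkit."""
    env = dict(os.environ if base is None else base)
    # Toolkit root, e.g. /usr/local/cuda-11.7 (assumed from the path in this thread).
    env["CUDA_HOME"] = cuda_root
    # Prepend the header directory to CPATH, keeping any existing entries.
    existing = env.get("CPATH")
    env["CPATH"] = include_dir if not existing else include_dir + os.pathsep + existing
    return env


if __name__ == "__main__":
    env = build_env(
        "/usr/local/cuda-11.7",
        "/usr/local/cuda-11.7/targets/x86_64-linux/include",
    )
    print(env["CUDA_HOME"])
    print(env["CPATH"])
```

The resulting dictionary can be passed as `env=` to `subprocess.run` when re-running the failing script. It may also be worth clearing the cached extension directory mentioned in the original output (/home/user/.cache/torch_extensions/py310_cu116) so that the build starts fresh.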

mrwyattii avatar Oct 31 '22 21:10 mrwyattii