
When I execute the command "python train.py --actor-model facebook/opt-1.3b --reward-model facebook/opt-350m --deployment-type single_gpu" I get the following output. I have tried many solutions, but all of them failed.

LanShanPi opened this issue 2 years ago • 8 comments

[2023-04-19 10:49:12,862] [WARNING] [runner.py:190:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-04-19 10:49:12,872] [INFO] [runner.py:540:main] cmd = /home/ubuntu/anaconda3/envs/DeepSpeed/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr=127.0.0.1 --master_port=29500 --enable_each_rank_log=None main.py --model_name_or_path facebook/opt-1.3b --gradient_accumulation_steps 2 --lora_dim 128 --zero_stage 0 --deepspeed --output_dir /home/ubuntu/Project/DeepSpeedExamples/applications/DeepSpeed-Chat/output/actor-models/1.3b
[2023-04-19 10:49:15,093] [INFO] [launch.py:229:main] WORLD INFO DICT: {'localhost': [0]}
[2023-04-19 10:49:15,093] [INFO] [launch.py:235:main] nnodes=1, num_local_procs=1, node_rank=0
[2023-04-19 10:49:15,093] [INFO] [launch.py:246:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0]})
[2023-04-19 10:49:15,093] [INFO] [launch.py:247:main] dist_world_size=1
[2023-04-19 10:49:15,093] [INFO] [launch.py:249:main] Setting CUDA_VISIBLE_DEVICES=0
[2023-04-19 10:49:18,024] [INFO] [comm.py:586:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
Found cached dataset parquet (/home/ubuntu/.cache/huggingface/datasets/Dahoas___parquet/default-b9d2c4937d617106/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)

0%| | 0/2 [00:00<?, ?it/s] 100%|██████████| 2/2 [00:00<00:00, 580.08it/s]
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using tokenizers before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
(the tokenizers warning above is printed several more times; repeats omitted)
Using /home/ubuntu/.cache/torch_extensions/py38_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/ubuntu/.cache/torch_extensions/py38_cu117/fused_adam/build.ninja...
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/2] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -I/home/ubuntu/anaconda3/envs/DeepSpeed/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -I/home/ubuntu/anaconda3/envs/DeepSpeed/lib/python3.8/site-packages/deepspeed/ops/csrc/adam -isystem /home/ubuntu/anaconda3/envs/DeepSpeed/lib/python3.8/site-packages/torch/include -isystem /home/ubuntu/anaconda3/envs/DeepSpeed/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/ubuntu/anaconda3/envs/DeepSpeed/lib/python3.8/site-packages/torch/include/TH -isystem /home/ubuntu/anaconda3/envs/DeepSpeed/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/ubuntu/anaconda3/envs/DeepSpeed/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -lineinfo --use_fast_math -gencode=arch=compute_89,code=sm_89 -gencode=arch=compute_89,code=compute_89 -std=c++17 -c /home/ubuntu/anaconda3/envs/DeepSpeed/lib/python3.8/site-packages/deepspeed/ops/csrc/adam/multi_tensor_adam.cu -o multi_tensor_adam.cuda.o
FAILED: multi_tensor_adam.cuda.o
(ninja re-echoes the same nvcc command here)
nvcc fatal : Unsupported gpu architecture 'compute_89'
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/DeepSpeed/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1893, in _run_ninja_build
    subprocess.run(
  File "/home/ubuntu/anaconda3/envs/DeepSpeed/lib/python3.8/subprocess.py", line 512, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "main.py", line 339, in <module>
    main()
  File "main.py", line 271, in main
    optimizer = AdamOptimizer(optimizer_grouped_parameters,
  File "/home/ubuntu/anaconda3/envs/DeepSpeed/lib/python3.8/site-packages/deepspeed/ops/adam/fused_adam.py", line 71, in __init__
    fused_adam_cuda = FusedAdamBuilder().load()
  File "/home/ubuntu/anaconda3/envs/DeepSpeed/lib/python3.8/site-packages/deepspeed/ops/op_builder/builder.py", line 449, in load
    return self.jit_load(verbose)
  File "/home/ubuntu/anaconda3/envs/DeepSpeed/lib/python3.8/site-packages/deepspeed/ops/op_builder/builder.py", line 480, in jit_load
    op_module = load(name=self.name,
  File "/home/ubuntu/anaconda3/envs/DeepSpeed/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1284, in load
    return _jit_compile(
  File "/home/ubuntu/anaconda3/envs/DeepSpeed/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1509, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/home/ubuntu/anaconda3/envs/DeepSpeed/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1624, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/home/ubuntu/anaconda3/envs/DeepSpeed/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1909, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'fused_adam'
[2023-04-19 10:50:23,176] [INFO] [launch.py:428:sigkill_handler] Killing subprocess 41342
[2023-04-19 10:50:23,178] [ERROR] [launch.py:434:sigkill_handler] ['/home/ubuntu/anaconda3/envs/DeepSpeed/bin/python', '-u', 'main.py', '--local_rank=0', '--model_name_or_path', 'facebook/opt-1.3b', '--gradient_accumulation_steps', '2', '--lora_dim', '128', '--zero_stage', '0', '--deepspeed', '--output_dir', '/home/ubuntu/Project/DeepSpeedExamples/applications/DeepSpeed-Chat/output/actor-models/1.3b'] exits with return code = 1
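Editorial note on the root cause: compute_89 is the compute capability of Ada Lovelace GPUs (e.g. the RTX 4090), and nvcc only added support for it in CUDA 11.8, so a CUDA 11.7 toolkit cannot handle the -gencode=arch=compute_89 flags that the op builder derives from the visible GPU. A minimal sketch for confirming the mismatch, assuming the default /usr/local/cuda toolkit path:

/usr/local/cuda/bin/nvcc --version   # toolkit version that the JIT extension build will use
python -c "import torch; print(torch.version.cuda, torch.cuda.get_device_capability())"   # torch's CUDA build and the GPU's compute capability

If the capability printed is (8, 9) but the toolkit is older than 11.8, either upgrade the toolkit (or point CUDA_HOME at an 11.8+ install), or set TORCH_CUDA_ARCH_LIST (e.g. "8.6") before launching; PyTorch's extension builder honors that variable, and the compute_86 PTX can still JIT-compile for an 8.9 card, though upgrading the toolkit is the cleaner fix.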

LanShanPi avatar Apr 19 '23 03:04 LanShanPi

I executed the following commands to set up the anaconda environment:
pip install "deepspeed>=0.9.0"
git clone https://github.com/microsoft/DeepSpeedExamples.git
cd DeepSpeedExamples/applications/DeepSpeed-Chat/
pip install -r requirements.txt

LanShanPi avatar Apr 19 '23 03:04 LanShanPi

Running with CUDA 11.7 on Ubuntu 20.04.

LanShanPi avatar Apr 19 '23 03:04 LanShanPi

I can run this script successfully with CUDA 11.3 and PyTorch 1.12.0.
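For reference, a hedged sketch of installing that combination from the PyTorch wheel index (the 1.12.0+cu113 build appears in the version list quoted later in this thread; adjust to your own environment):

pip install torch==1.12.0+cu113 -f https://download.pytorch.org/whl/torch_stable.html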

Flywolfs avatar Apr 19 '23 03:04 Flywolfs

I executed the following commands to set up the anaconda environment:
pip install "deepspeed>=0.9.0"
git clone https://github.com/microsoft/DeepSpeedExamples.git
cd DeepSpeedExamples/applications/DeepSpeed-Chat/
pip install -r requirements.txt

Have you tried installing from source? I suspect the deepspeed pip package is not in sync with the latest dev version. You might want to install from source and see if that works.
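A hedged sketch of what a source install could look like (the repository URL is the main DeepSpeed repo; the DS_BUILD_* flags pre-compile the CUDA ops at install time instead of JIT-building them on first use, which surfaces toolkit problems like this one immediately):

git clone https://github.com/microsoft/DeepSpeed.git
cd DeepSpeed
DS_BUILD_FUSED_ADAM=1 pip install .   # pre-builds the fused_adam op; fails fast if nvcc cannot target your GPU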

alibabadoufu avatar Apr 19 '23 04:04 alibabadoufu

I'm hitting the same issue.

boyuzz avatar Apr 19 '23 08:04 boyuzz

@Flywolfs I am trying to match your version.

LanShanPi avatar Apr 19 '23 10:04 LanShanPi

Running with CUDA 11.7 on Ubuntu 20.04.


I got the same problem and solved it by making the torch version match the CUDA version. It worked for me with CUDA 11.3 + torch 1.12. From your log you have CUDA 11.7, so maybe you can try:

pip install torch==1.13.0+cu117 -f https://download.pytorch.org/whl/torch_stable.html

from versions: 0.4.1, 0.4.1.post2, 1.0.0, 1.0.1, 1.0.1.post2, 1.1.0, 1.2.0, 1.2.0+cpu, 1.2.0+cu92, 1.3.0, 1.3.0+cpu, 1.3.0+cu100, 1.3.0+cu92, 1.3.1, 1.3.1+cpu, 1.3.1+cu100, 1.3.1+cu92, 1.4.0, 1.4.0+cpu, 1.4.0+cu100, 1.4.0+cu92, 1.5.0, 1.5.0+cpu, 1.5.0+cu101, 1.5.0+cu92, 1.5.1, 1.5.1+cpu, 1.5.1+cu101, 1.5.1+cu92, 1.6.0, 1.6.0+cpu, 1.6.0+cu101, 1.6.0+cu92, 1.7.0, 1.7.0+cpu, 1.7.0+cu101, 1.7.0+cu110, 1.7.0+cu92, 1.7.1, 1.7.1+cpu, 1.7.1+cu101, 1.7.1+cu110, 1.7.1+cu92, 1.7.1+rocm3.7, 1.7.1+rocm3.8, 1.8.0, 1.8.0+cpu, 1.8.0+cu101, 1.8.0+cu111, 1.8.0+rocm3.10, 1.8.0+rocm4.0.1, 1.8.1, 1.8.1+cpu, 1.8.1+cu101, 1.8.1+cu102, 1.8.1+cu111, 1.8.1+rocm3.10, 1.8.1+rocm4.0.1, 1.9.0, 1.9.0+cpu, 1.9.0+cu102, 1.9.0+cu111, 1.9.0+rocm4.0.1, 1.9.0+rocm4.1, 1.9.0+rocm4.2, 1.9.1, 1.9.1+cpu, 1.9.1+cu102, 1.9.1+cu111, 1.9.1+rocm4.0.1, 1.9.1+rocm4.1, 1.9.1+rocm4.2, 1.10.0, 1.10.0+cpu, 1.10.0+cu102, 1.10.0+cu111, 1.10.0+cu113, 1.10.0+rocm4.0.1, 1.10.0+rocm4.1, 1.10.0+rocm4.2, 1.10.1, 1.10.1+cpu, 1.10.1+cu102, 1.10.1+cu111, 1.10.1+cu113, 1.10.1+rocm4.0.1, 1.10.1+rocm4.1, 1.10.1+rocm4.2, 1.10.2, 1.10.2+cpu, 1.10.2+cu102, 1.10.2+cu111, 1.10.2+cu113, 1.10.2+rocm4.0.1, 1.10.2+rocm4.1, 1.10.2+rocm4.2, 1.11.0, 1.11.0+cpu, 1.11.0+cu102, 1.11.0+cu113, 1.11.0+cu115, 1.11.0+rocm4.3.1, 1.11.0+rocm4.5.2, 1.12.0, 1.12.0+cpu, 1.12.0+cu102, 1.12.0+cu113, 1.12.0+cu116, 1.12.0+rocm5.0, 1.12.0+rocm5.1.1, 1.12.1, 1.12.1+cpu, 1.12.1+cu102, 1.12.1+cu113, 1.12.1+cu116, 1.12.1+rocm5.0, 1.12.1+rocm5.1.1, 1.13.0, 1.13.0+cpu, 1.13.0+cu116, 1.13.0+cu117, 1.13.0+cu117.with.pypi.cudnn, 1.13.0+rocm5.1.1, 1.13.0+rocm5.2, 1.13.1, 1.13.1+cpu, 1.13.1+cu116, 1.13.1+cu117, 1.13.1+cu117.with.pypi.cudnn, 1.13.1+rocm5.1.1, 1.13.1+rocm5.2)

More details can be found in this blog post: deepspeed chat
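Once the reinstall finishes, a quick sanity check (a sketch; the versions shown are the combination suggested above, and yours may differ):

python -c "import torch; print(torch.__version__, torch.version.cuda)"   # expect something like 1.13.0+cu117 / 11.7
nvcc --version   # the system toolkit that DeepSpeed's JIT ops will be compiled with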

chenyangMl avatar Apr 24 '23 14:04 chenyangMl

+1, same question. @Flywolfs @chenyangMl Could you please post your pip list output? I only changed to CUDA 11.3 + torch 1.12 and still got errors. Thanks.

Doraemon20190612 avatar Apr 30 '23 19:04 Doraemon20190612

Same issue here.

houwenxin avatar May 16 '23 15:05 houwenxin

Try export CUDA_HOME=/usr/local/cuda-xxxx
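A hedged expansion of that suggestion (the 11.8 path below is only an example; point CUDA_HOME at whichever installed toolkit actually supports your GPU's architecture):

export CUDA_HOME=/usr/local/cuda-11.8   # example path, not necessarily present on your machine
export PATH=$CUDA_HOME/bin:$PATH
$CUDA_HOME/bin/nvcc --version   # confirm this is the nvcc the extension build will pick up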

LeeChongKeat avatar Jun 01 '23 05:06 LeeChongKeat

Closing this issue for now - if there are new installation issues, please feel free to open new tickets.

loadams avatar Jun 12 '23 21:06 loadams