DeepSpeed
[BUG]nvcc fatal when DS_BUILD_TRANSFORMER_INFERENCE=1
I use PyTorch 1.12 with CUDA 11.6, and build DeepSpeed with these flags:
DS_BUILD_FUSED_ADAM=1 DS_BUILD_CPU_ADAM=1 DS_BUILD_UTILS=1 DS_BUILD_TRANSFORMER=0 DS_BUILD_STOCHASTIC_TRANSFORMER=0 DS_BUILD_TRANSFORMER_INFERENCE=1 DS_BUILD_OPS=0
I install with pip3 install deepspeed --global-option="build_ext" --global-option="-j8"
and the final error is: error: command '/usr/local/cuda/bin/nvcc' failed with exit code 1
nvcc fatal : Unknown option '-Wno-reorder'
What version of pip do you have installed? pip --version
DeepSpeed 0.9.1 hits the error; 0.9.0 works.
@kuangdao So you get this bug with deepspeed 0.9.1 but not with 0.9.0?
I solved this by simply removing the additional options, i.e. DS_BUILD_OPS=1 pip install deepspeed
That doesn't solve the issue, because you are no longer building the extensions at all.
I got the same error.
torch 1.13.1+cu117
torchaudio 0.13.1+cu117
torchvision 0.14.1+cu117
deepspeed==0.9.2 cuda==11.7
nvcc fatal : Unknown option '-Wno-reorder'
nvcc fatal : Unknown option '-Wall'
Same error with the latest Nvidia pytorch Docker image. It happens to me with both 0.9.0 and 0.9.4 (and presumably every version in between).
Nvidia driver version: 525.116.03.
Using nvidia-container-runtime.
RTX 6000 Ada GPU.
Minimal repro:
FROM nvcr.io/nvidia/pytorch:23.05-py3
RUN pip install --upgrade pip
RUN apt-get update
RUN apt-get install -y libaio-dev
ENV CUDA_HOME='/usr/local/cuda'
RUN pip install py-cpuinfo
# NOTE: do this: https://stackoverflow.com/questions/59691207/docker-build-with-nvidia-runtime
RUN DS_BUILD_OPS=1 DS_BUILD_SPARSE_ATTN=0 pip install deepspeed==0.9.4 --global-option="build_ext" --global-option="-j32"
This seems to be a longstanding bug in pytorch, with many duplicate issues (https://github.com/pytorch/vision/issues/2001, https://github.com/pytorch/pytorch/issues/36378, https://github.com/pytorch/pytorch/issues/31283).
It seems to come from https://github.com/pytorch/pytorch/blob/15eed5b73ef1ebe0d1142d70bab7c20300a2aa2c/cmake/public/utils.cmake#L435
It's not clear to me why this bug doesn't affect everyone, which makes me think there is probably a workaround out there somewhere.
Setting NVCC_PREPEND_FLAGS="--forward-unknown-opts" appears to be a workaround for this issue.
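For anyone wanting to try it, here is a minimal sketch of how the workaround might be applied to the install command from the repro above. The helper name install_deepspeed is hypothetical, and the version and build flags are simply copied from the repro, so adjust them to your setup. I have only seen this reported as working in this thread, not verified across CUDA versions.

```shell
# Workaround sketch: ask nvcc to forward options it does not recognize
# (such as -Wno-reorder or -Wall) to the host compiler instead of aborting.
export NVCC_PREPEND_FLAGS="--forward-unknown-opts"

# Hypothetical helper; the command mirrors the repro's install line.
# It is only defined here, not executed.
install_deepspeed() {
    DS_BUILD_OPS=1 DS_BUILD_SPARSE_ATTN=0 \
        pip install deepspeed==0.9.4 \
        --global-option="build_ext" --global-option="-j32"
}

# Show the value that nvcc will pick up from the environment.
echo "NVCC_PREPEND_FLAGS=${NVCC_PREPEND_FLAGS}"
```

Call install_deepspeed in the same shell to run the build with the variable set. In a Dockerfile, the equivalent would presumably be an ENV NVCC_PREPEND_FLAGS="--forward-unknown-opts" line placed before the RUN step that installs deepspeed.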