DeepSpeed
CUDA error: no kernel image is available for execution on the device
I get the following error when running DeepSpeed-Chat step 1, specifically run_6.7b.sh.
Traceback (most recent call last):
  File "main.py", line 341, in <module>
    main()
  File "main.py", line 286, in main
    model, optimizer, _, lr_scheduler = deepspeed.initialize(
  File "/data/nt12_ssd_gluster/myself/miniconda3/lib/python3.8/site-packages/deepspeed/__init__.py", line 156, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/data/nt12_ssd_gluster/myself/miniconda3/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 328, in __init__
    self._configure_optimizer(optimizer, model_parameters)
  File "/data/nt12_ssd_gluster/myself/miniconda3/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1187, in _configure_optimizer
    self.optimizer = self._configure_zero_optimizer(basic_optimizer)
  File "/data/nt12_ssd_gluster/myself/miniconda3/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1418, in _configure_zero_optimizer
    optimizer = DeepSpeedZeroOptimizer(
  File "/data/nt12_ssd_gluster/myself/miniconda3/lib/python3.8/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 485, in __init__
    self.initialize_optimizer_states()
  File "/data/nt12_ssd_gluster/myself/miniconda3/lib/python3.8/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 614, in initialize_optimizer_states
    self.optimizer.step()
  File "/data/nt12_ssd_gluster/myself/miniconda3/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 65, in wrapper
    return wrapped(*args, **kwargs)
  File "/data/nt12_ssd_gluster/myself/miniconda3/lib/python3.8/site-packages/torch/optim/optimizer.py", line 88, in wrapper
    return func(*args, **kwargs)
  File "/data/nt12_ssd_gluster/myself/miniconda3/lib/python3.8/site-packages/deepspeed/ops/adam/fused_adam.py", line 151, in step
    multi_tensor_applier(self.multi_tensor_adam, self._dummy_overflow_buf, [g_32, p_32, m_32, v_32],
  File "/data/nt12_ssd_gluster/myself/miniconda3/lib/python3.8/site-packages/deepspeed/ops/adam/multi_tensor_apply.py", line 17, in __call__
    return op(self.chunk_size, noop_flag_buffer, tensor_lists, *args)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
The problem persists when I replace the DeepSpeed FusedAdam optimizer with apex.optimizers.FusedAdam.
I am using Ubuntu 18.04 with 8 NVIDIA A100 GPUs. nvcc -V reports CUDA 11.3 and nvidia-smi reports CUDA 11.4. The PyTorch version is 1.11.1+cu113, so it should be compatible with the installed CUDA toolkit. I installed DeepSpeed from source with DS_BUILD_FUSED_ADAM=1 pip install . --global-option="build_ext" --global-option="-j8" so that only the FusedAdam op is prebuilt. I get other errors when I specify DS_BUILD_OPS=1 instead.
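One check that might narrow this down, assuming the error means the compiled FusedAdam kernels do not include the A100's compute capability (8.0): compare the GPU's capability with the architectures a build targets. The sketch below is only a diagnostic aid. The helpers parse_arch and can_run are hypothetical names, not part of DeepSpeed or PyTorch. torch.cuda.get_arch_list() lists the architectures PyTorch itself was built for, which is not necessarily what the DeepSpeed or apex extension was built for; TORCH_CUDA_ARCH_LIST is only relevant if it was set when the extension was compiled.

```python
import os
import re


def parse_arch(tag):
    """Parse 'sm_80', 'compute_80', '8.0' or '8.0+PTX' into (kind, (major, minor)).

    Returns None for tags this sketch does not understand.
    """
    tag = tag.strip()
    m = re.fullmatch(r"(sm|compute)_(\d+)(\d)", tag)
    if m:
        kind = "binary" if m.group(1) == "sm" else "ptx"
        return kind, (int(m.group(2)), int(m.group(3)))
    m = re.fullmatch(r"(\d+)\.(\d)(\+PTX)?", tag)
    if m:
        kind = "ptx" if m.group(3) else "binary"
        return kind, (int(m.group(1)), int(m.group(2)))
    return None


def can_run(device_cap, arch_tags):
    """Return True if any listed architecture can execute on device_cap.

    Binary code (sm_XY) needs the same major version and a minor version
    no newer than the device. PTX (compute_XY) can be JIT-compiled for
    any device whose capability is at least XY.
    """
    for tag in arch_tags:
        parsed = parse_arch(tag)
        if parsed is None:
            continue  # skip unknown tags rather than guess
        kind, cap = parsed
        if kind == "binary" and cap[0] == device_cap[0] and cap[1] <= device_cap[1]:
            return True
        if kind == "ptx" and cap <= device_cap:
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        torch_archs = torch.cuda.get_arch_list()
        print("device capability:", cap)
        print("PyTorch arch list:", torch_archs)
        print("PyTorch build covers device:", can_run(cap, torch_archs))

        # Only meaningful if this variable was set when the extension was built.
        env = os.environ.get("TORCH_CUDA_ARCH_LIST", "").strip()
        if env:
            env_archs = re.split(r"[;\s]+", env)
            print("TORCH_CUDA_ARCH_LIST covers device:", can_run(cap, env_archs))
    else:
        print("PyTorch with a visible CUDA device is required for the live check.")
```

If the check reports no matching architecture, rebuilding the op on a node where the A100s are visible, or with TORCH_CUDA_ARCH_LIST set to include 8.0, would be the first thing to try.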