CUDA error: no kernel image is available for execution on the device

Status: Open. publicstaticvo opened this issue 1 year ago (0 comments).

I get the following error when running DeepSpeed-Chat step 1, specifically run_6.7b.sh:

Traceback (most recent call last):
  File "main.py", line 341, in <module>
    main()
  File "main.py", line 286, in main
    model, optimizer, _, lr_scheduler = deepspeed.initialize(
  File "/data/nt12_ssd_gluster/myself/miniconda3/lib/python3.8/site-packages/deepspeed/__init__.py", line 156, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/data/nt12_ssd_gluster/myself/miniconda3/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 328, in __init__
    self._configure_optimizer(optimizer, model_parameters)
  File "/data/nt12_ssd_gluster/myself/miniconda3/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1187, in _configure_optimizer
    self.optimizer = self._configure_zero_optimizer(basic_optimizer)
  File "/data/nt12_ssd_gluster/myself/miniconda3/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1418, in _configure_zero_optimizer
    optimizer = DeepSpeedZeroOptimizer(
  File "/data/nt12_ssd_gluster/myself/miniconda3/lib/python3.8/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 485, in __init__
    self.initialize_optimizer_states()
  File "/data/nt12_ssd_gluster/myself/miniconda3/lib/python3.8/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 614, in initialize_optimizer_states
    self.optimizer.step()
  File "/data/nt12_ssd_gluster/myself/miniconda3/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 65, in wrapper
    return wrapped(*args, **kwargs)
  File "/data/nt12_ssd_gluster/myself/miniconda3/lib/python3.8/site-packages/torch/optim/optimizer.py", line 88, in wrapper
    return func(*args, **kwargs)
  File "/data/nt12_ssd_gluster/myself/miniconda3/lib/python3.8/site-packages/deepspeed/ops/adam/fused_adam.py", line 151, in step
    multi_tensor_applier(self.multi_tensor_adam, self._dummy_overflow_buf, [g_32, p_32, m_32, v_32],
  File "/data/nt12_ssd_gluster/myself/miniconda3/lib/python3.8/site-packages/deepspeed/ops/adam/multi_tensor_apply.py", line 17, in __call__
    return op(self.chunk_size, noop_flag_buffer, tensor_lists, *args)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
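This error means the launched kernel has no compiled machine code (cubin) or PTX compatible with the GPU's compute capability, which is 8.0 on an A100. The sketch below is a hypothetical helper, not part of DeepSpeed or PyTorch, that checks a TORCH_CUDA_ARCH_LIST-style string such as "7.0;7.5+PTX" against a device capability. It assumes the standard CUDA compatibility rules: a cubin built for X.y runs on devices X.z with z >= y, and PTX embedded for X.y can be JIT-compiled for any device of capability X.y or newer. Named architectures (e.g. "Ampere") are not handled.

```python
import re
from typing import List, Tuple

Capability = Tuple[int, int]


def parse_arch_list(arch_list: str) -> List[Tuple[Capability, bool]]:
    """Parse a TORCH_CUDA_ARCH_LIST-style string, e.g. "7.0;8.0+PTX".

    Returns a list of ((major, minor), has_ptx) tuples. Entries may be
    separated by semicolons or whitespace.
    """
    entries = []
    for token in re.split(r"[;\s]+", arch_list.strip()):
        if not token:
            continue
        has_ptx = token.endswith("+PTX")
        version = token[: -len("+PTX")] if has_ptx else token
        major, minor = (int(part) for part in version.split("."))
        entries.append(((major, minor), has_ptx))
    return entries


def has_compatible_kernel(arch_list: str, capability: Capability) -> bool:
    """Return True if a build targeting `arch_list` can run on `capability`."""
    for (major, minor), has_ptx in parse_arch_list(arch_list):
        # Binary (cubin) compatibility: same major version, equal or newer minor.
        if major == capability[0] and minor <= capability[1]:
            return True
        # PTX can be JIT-compiled for any device at least as new as its target.
        if has_ptx and (major, minor) <= capability:
            return True
    return False


if __name__ == "__main__":
    a100 = (8, 0)
    print(has_compatible_kernel("6.0;7.0;7.5", a100))  # False: nothing usable on sm_80
    print(has_compatible_kernel("8.0", a100))          # True
```

On the affected machine, torch.cuda.get_device_capability() gives the capability tuple and torch.cuda.get_arch_list() shows what PyTorch itself was built for; the list that matters for the FusedAdam extension is whichever one was in effect when that extension was compiled.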

The problem persists when I replace DeepSpeed's FusedAdam optimizer with apex.optimizers.FusedAdam.

I am using Ubuntu 18.04 with 8 NVIDIA A100 GPUs. nvcc -V reports CUDA 11.3 and nvidia-smi reports CUDA 11.4. My PyTorch version is 1.11.1+cu113, so it appears to be compatible with the installed CUDA toolkit. I installed DeepSpeed from source with DS_BUILD_FUSED_ADAM=1 pip install . --global-option="build_ext" --global-option="-j8" so that only the FusedAdam op is prebuilt; setting DS_BUILD_OPS=1 instead produces other errors.
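The version compatibility described above can be cross-checked mechanically. The sketch below is a hypothetical helper that compares the release parsed from nvcc --version output, torch.version.cuda, and the "CUDA Version" field shown by nvidia-smi (the newest CUDA runtime the installed driver supports). It assumes the toolkit and PyTorch should match on major.minor, which is stricter than some build tools require, and that PyTorch is a CUDA build (torch.version.cuda is None on CPU-only builds).

```python
import re
from typing import List, Tuple


def nvcc_release(nvcc_output: str) -> Tuple[int, int]:
    """Extract (major, minor) from `nvcc --version` text, e.g. 'release 11.3'."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def parse_version(text: str) -> Tuple[int, int]:
    """Turn a 'major.minor' string such as torch.version.cuda into a tuple."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def check_cuda_versions(nvcc_output: str, torch_cuda: str, driver_cuda: str) -> List[str]:
    """Return a list of detected problems; an empty list means none were found.

    nvcc_output: text printed by `nvcc --version`
    torch_cuda:  torch.version.cuda, the CUDA version PyTorch was built with
    driver_cuda: the "CUDA Version" field reported by nvidia-smi
    """
    problems = []
    toolkit = nvcc_release(nvcc_output)
    torch_ver = parse_version(torch_cuda)
    driver = parse_version(driver_cuda)
    # Extensions are compiled with nvcc but linked against PyTorch's CUDA build.
    if toolkit != torch_ver:
        problems.append(
            f"nvcc is CUDA {toolkit[0]}.{toolkit[1]} but PyTorch was built with {torch_cuda}"
        )
    # The driver must support at least the newest CUDA version in use.
    if driver < max(toolkit, torch_ver):
        problems.append(f"driver supports CUDA only up to {driver_cuda}")
    return problems


if __name__ == "__main__":
    # Abbreviated nvcc output; only the "release X.Y" part is parsed.
    print(check_cuda_versions("Cuda compilation tools, release 11.3", "11.3", "11.4"))  # []
```

With the values reported here (nvcc 11.3, PyTorch cu113, driver 11.4) the function returns an empty list, so a version mismatch alone would not explain the error. Under that assumption, a plausible next step (not confirmed in this issue) is to make sure the build targets compute capability 8.0, for example by setting TORCH_CUDA_ARCH_LIST="8.0" alongside DS_BUILD_FUSED_ADAM=1, assuming the build honors that variable.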

Posted by publicstaticvo on Apr 15 '23 17:04