DeepSpeedExamples
bing_bert script error
Error occurred running bing_bert/ds_train_bert_nvidia_data_bsz64k_seq128.sh
Detected CUDA files, patching ldflags
Emitting ninja build file /home/bduser/.cache/torch_extensions/py38_cu114/fused_lamb/build.ninja...
Building extension module fused_lamb...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/2] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=fused_lamb -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1013" -I/home/bduser/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -isystem /home/bduser/anaconda3/envs/deepspeed/lib/python3.8/site-packages/torch/include -isystem /home/bduser/anaconda3/envs/deepspeed/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/bduser/anaconda3/envs/deepspeed/lib/python3.8/site-packages/torch/include/TH -isystem /home/bduser/anaconda3/envs/deepspeed/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/bduser/anaconda3/envs/deepspeed/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_70,code=compute_70 -std=c++14 -c /home/bduser/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/ops/csrc/lamb/fused_lamb_cuda_kernel.cu -o fused_lamb_cuda_kernel.cuda.o
FAILED: fused_lamb_cuda_kernel.cuda.o
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=fused_lamb -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1013" -I/home/bduser/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -isystem /home/bduser/anaconda3/envs/deepspeed/lib/python3.8/site-packages/torch/include -isystem /home/bduser/anaconda3/envs/deepspeed/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/bduser/anaconda3/envs/deepspeed/lib/python3.8/site-packages/torch/include/TH -isystem /home/bduser/anaconda3/envs/deepspeed/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/bduser/anaconda3/envs/deepspeed/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_70,code=compute_70 -std=c++14 -c /home/bduser/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/ops/csrc/lamb/fused_lamb_cuda_kernel.cu -o fused_lamb_cuda_kernel.cuda.o
/home/bduser/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/ops/csrc/lamb/fused_lamb_cuda_kernel.cu(467): error: identifier "THCudaCheck" is undefined
/home/bduser/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/ops/csrc/lamb/fused_lamb_cuda_kernel.cu(351): warning: variable "threads" was declared but never referenced
1 error detected in the compilation of "/home/bduser/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/ops/csrc/lamb/fused_lamb_cuda_kernel.cu".
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/home/bduser/anaconda3/envs/deepspeed/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1718, in _run_ninja_build
    subprocess.run(
  File "/home/bduser/anaconda3/envs/deepspeed/lib/python3.8/subprocess.py", line 512, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/home/bduser/src/deepspeedexamples/bing_bert/deepspeed_train.py", line 600, in <module>
    main()
  File "/home/bduser/src/deepspeedexamples/bing_bert/deepspeed_train.py", line 589, in main
    model, optimizer = prepare_model_optimizer(args)
  File "/home/bduser/src/deepspeedexamples/bing_bert/deepspeed_train.py", line 468, in prepare_model_optimizer
    model.network, optimizer, _, _ = deepspeed.initialize(
  File "/home/bduser/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/__init__.py", line 131, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/home/bduser/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 293, in __init__
    self._configure_optimizer(optimizer, model_parameters)
  File "/home/bduser/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1093, in _configure_optimizer
    basic_optimizer = self._configure_basic_optimizer(model_parameters)
  File "/home/bduser/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1193, in _configure_basic_optimizer
    optimizer = FusedLamb(model_parameters, **optimizer_parameters)
  File "/home/bduser/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/ops/lamb/fused_lamb.py", line 51, in __init__
    self.fused_lamb_cuda = FusedLambBuilder().load()
  File "/home/bduser/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/ops/op_builder/builder.py", line 367, in load
    return self.jit_load(verbose)
  File "/home/bduser/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed/ops/op_builder/builder.py", line 399, in jit_load
    op_module = load(
  File "/home/bduser/anaconda3/envs/deepspeed/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1125, in load
    return _jit_compile(
  File "/home/bduser/anaconda3/envs/deepspeed/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1338, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/home/bduser/anaconda3/envs/deepspeed/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1450, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/home/bduser/anaconda3/envs/deepspeed/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1734, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'fused_lamb'
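The traceback shows the failure originating in `FusedLambBuilder().load()`, so the build can probably be reproduced without the full training script. A minimal sketch, assuming DeepSpeed is importable in the same environment; the helper name `try_build_fused_lamb` is hypothetical and not part of DeepSpeed:

```python
import importlib


def try_build_fused_lamb():
    """Attempt the JIT build of the fused_lamb op outside the training script.

    Returns a short status string instead of raising, so the result is easy
    to paste into a bug report.
    """
    try:
        # Import lazily so the helper still runs where DeepSpeed is absent.
        op_builder = importlib.import_module("deepspeed.ops.op_builder")
    except ImportError as exc:
        return f"deepspeed not importable: {exc}"
    try:
        # Same call that appears in the traceback (fused_lamb.py, line 51).
        op_builder.FusedLambBuilder().load()
    except Exception as exc:  # the build raises RuntimeError on compiler failure
        return f"build failed: {exc}"
    return "build ok"


if __name__ == "__main__":
    print(try_build_fused_lamb())
```

If this reports a build failure with the same `THCudaCheck` message, the problem lies in the op build for this torch/DeepSpeed combination and not in the bing_bert script.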
@jeyblu, apologies for the delayed response. Is this still a problem?
Yes, it's still a problem. Thanks.
Can you please share the output of running ds_report in your shell?
DeepSpeed C++/CUDA extension op report
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
JIT compiled ops requires ninja
ninja .................. [OKAY]
op name ................ installed .. compatible
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
async_io ............... [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
DeepSpeed general environment info:
torch install path ............... ['/home/bduser/anaconda3/envs/deepspeed/lib/python3.8/site-packages/torch']
torch version .................... 1.11.0a0+gitb46c89d
torch cuda version ............... 11.4
nvcc version ..................... 11.4
deepspeed install path ........... ['/home/bduser/anaconda3/envs/deepspeed/lib/python3.8/site-packages/deepspeed']
deepspeed info ................... 0.5.8+8220674, 8220674, master
deepspeed wheel compiled w. ...... torch 1.11, cuda 11.4
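For anyone comparing their own setup against the report above, the package versions can also be read from Python using only the standard library. This is a rough sketch, not a replacement for ds_report: the helper name `installed_versions` and the default package list are assumptions chosen for illustration, and it reports installed distribution metadata only (no CUDA or nvcc details).

```python
from importlib import metadata


def installed_versions(packages=("torch", "deepspeed", "ninja")):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package has no installed distribution in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Posting this alongside the ds_report output makes it easier to spot a mismatch between the torch build and the DeepSpeed release.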