identifier "__hisnan" is undefined

Open jimmieliu opened this issue 1 year ago • 4 comments

Hi,

Env: CUDA 11.6, PyTorch 1.11. Installed with pip install lightseq.

Then test.py: from lightseq.training.ops.pytorch.quant_linear_layer import LSQuantLinearLayer

When running test.py, the following error happens.

It looks like a simple issue to me, but I don't know where the __hisnan function is supposed to come from.

Thank you

83 errors detected in the compilation of "/opt/conda/lib/python3.8/site-packages/lightseq/csrc/kernels/dropout_kernels.cu". [7/8] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers_new -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1013" -I/opt/conda/lib/python3.8/site-packages/lightseq/csrc/kernels/includes -I/opt/conda/lib/python3.8/site-packages/lightseq/csrc/ops_new/includes -I/opt/conda/lib/python3.8/site-packages/lightseq/csrc/lsflow/includes -I/opt/conda/lib/python3.8/site-packages/lightseq/csrc/layers_new/includes -isystem /opt/conda/lib/python3.8/site-packages/torch/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=1 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS_ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -DTHRUST_IGNORE_CUB_VERSION_CHECK -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -c /opt/conda/lib/python3.8/site-packages/lightseq/csrc/kernels/cuda_util.cu -o cuda_util.cuda.o FAILED: cuda_util.cuda.o /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=lightseq_layers_new -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" 
-DPYBIND11_BUILD_ABI="cxxabi1013" -I/opt/conda/lib/python3.8/site-packages/lightseq/csrc/kernels/includes -I/opt/conda/lib/python3.8/site-packages/lightseq/csrc/ops_new/includes -I/opt/conda/lib/python3.8/site-packages/lightseq/csrc/lsflow/includes -I/opt/conda/lib/python3.8/site-packages/lightseq/csrc/layers_new/includes -isystem /opt/conda/lib/python3.8/site-packages/torch/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=1 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS_ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -DTHRUST_IGNORE_CUB_VERSION_CHECK -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -c /opt/conda/lib/python3.8/site-packages/lightseq/csrc/kernels/cuda_util.cu -o cuda_util.cuda.o /opt/conda/lib/python3.8/site-packages/lightseq/csrc/kernels/cuda_util.cu(218): error: identifier "__hisnan" is undefined

1 error detected in the compilation of "/opt/conda/lib/python3.8/site-packages/lightseq/csrc/kernels/cuda_util.cu".
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1726, in _run_ninja_build
    subprocess.run(
  File "/opt/conda/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "test.py", line 4, in <module>
    from lightseq.training.ops.pytorch.quant_linear_layer import LSQuantLinearLayer
  File "/opt/conda/lib/python3.8/site-packages/lightseq/training/__init__.py", line 1, in <module>
    from lightseq.training.ops.pytorch.transformer_embedding_layer import (
  File "/opt/conda/lib/python3.8/site-packages/lightseq/training/ops/pytorch/__init__.py", line 11, in <module>
    layer_cuda_module = LayerBuilder().load()
  File "/opt/conda/lib/python3.8/site-packages/lightseq/training/ops/pytorch/builder/builder.py", line 203, in load
    return self.jit_load(verbose)
  File "/opt/conda/lib/python3.8/site-packages/lightseq/training/ops/pytorch/builder/builder.py", line 231, in jit_load
    op_module = load(
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1130, in load
    return _jit_compile(
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1343, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1455, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1742, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'lightseq_layers_new'

jimmieliu, Nov 09 '23 10:11

Just set the environment variable TORCH_CUDA_ARCH_LIST to an empty value.

export TORCH_CUDA_ARCH_LIST=

Torch's cpp_extension internally adds CUDA flags to the ninja build according to TORCH_CUDA_ARCH_LIST. You got these errors because somehow (perhaps when you installed some other library) your TORCH_CUDA_ARCH_LIST environment variable was expanded to all possible archs, including compute_52, as the -gencode flags in your log show. Half-precision intrinsics such as __hisnan require compute capability 5.3 or higher, so when nvcc compiles for the older arch it leaves them undefined.
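The check described above can be sketched in Python. This is only an illustration, not part of lightseq or PyTorch: the helper names (`parse_arch_list`, `archs_without_half_support`, `clear_arch_list_if_needed`) are hypothetical, and the parser assumes the variable holds plain numeric entries such as `5.2` or `8.6+PTX` separated by spaces or semicolons. It also assumes that an empty TORCH_CUDA_ARCH_LIST makes cpp_extension fall back to the architectures of the visible GPUs.

```python
import os
import re

# Half-precision intrinsics such as __hisnan need compute capability 5.3+.
MIN_HALF_ARCH = (5, 3)


def parse_arch_list(value):
    """Split a TORCH_CUDA_ARCH_LIST-style string into (major, minor) tuples.

    Only numeric entries like "7.5" or "8.6+PTX" are recognised; anything
    else (for example named architectures) is skipped in this sketch.
    """
    archs = []
    for token in re.split(r"[;\s]+", value.strip()):
        match = re.fullmatch(r"(\d+)\.(\d+)(\+PTX)?", token)
        if match:
            archs.append((int(match.group(1)), int(match.group(2))))
    return archs


def archs_without_half_support(value):
    """Return the listed architectures that are older than 5.3."""
    return [arch for arch in parse_arch_list(value) if arch < MIN_HALF_ARCH]


def clear_arch_list_if_needed(environ=os.environ):
    """Empty TORCH_CUDA_ARCH_LIST when it names an arch older than 5.3.

    Returns the offending architectures so the caller can log them.
    """
    too_old = archs_without_half_support(environ.get("TORCH_CUDA_ARCH_LIST", ""))
    if too_old:
        environ["TORCH_CUDA_ARCH_LIST"] = ""
    return too_old
```

To have any effect, `clear_arch_list_if_needed()` would have to run at the top of test.py, before the `from lightseq...` import that triggers the JIT build. A previously failed build may also be cached, so the extension might need to be rebuilt afterwards.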

helson73, Jan 15 '24 04:01

Your email has been received! Thank you.

Anychnn, Jan 15 '24 04:01

export TORCH_CUDA_ARCH_LIST=

This doesn't work!

runningabcd, Apr 15 '24 03:04

Your email has been received! Thank you.

Anychnn, Apr 15 '24 03:04