Unable to load ngram blocking on GPU

Open pygongnlp opened this issue 3 years ago • 1 comments

I have a question about how to train a RAG model.

I ran the following command:

parlai train_model -m rag -t wizard_of_wikipedia -mf rag --batchsize 16 --fp16 True --gradient-clip 0.1 --label-truncate 128 --log-every-n-secs 30 --lr-scheduler reduceonplateau --lr-scheduler-patience 1 --model-parallel True --optimizer adam --text-truncate 512 --truncate 512 -lr 1e-05 -vmm min -veps 0.25 -vme 1000 -vmt ppl -vp 5 -o arch/bart_large

and got the following warning:

Unable to load ngram blocking on GPU: Error building extension 'ngram_repeat_block_cuda': [1/2] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=ngram_repeat_block_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/yhli/anaconda3/envs/py38/lib/python3.8/site-packages/torch-1.12.1-py3.8-linux-x86_64.egg/torch/include -isystem /home/yhli/anaconda3/envs/py38/lib/python3.8/site-packages/torch-1.12.1-py3.8-linux-x86_64.egg/torch/include/torch/csrc/api/include -isystem /home/yhli/anaconda3/envs/py38/lib/python3.8/site-packages/torch-1.12.1-py3.8-linux-x86_64.egg/torch/include/TH -isystem /home/yhli/anaconda3/envs/py38/lib/python3.8/site-packages/torch-1.12.1-py3.8-linux-x86_64.egg/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/yhli/anaconda3/envs/py38/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_61,code=compute_61 -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -std=c++14 -c /home/yhli/tmp/ParlAI-main/parlai/clib/cuda/ngram_repeat_block_cuda_kernel.cu -o ngram_repeat_block_cuda_kernel.cuda.o
FAILED: ngram_repeat_block_cuda_kernel.cuda.o

If anyone can show me how to solve this, thanks a lot.

pygongnlp avatar Nov 05 '22 12:11 pygongnlp

Is this a warning, or does your training script exit unsuccessfully?

klshuster avatar Nov 07 '22 15:11 klshuster

In our case it's a warning

drevicko avatar Nov 24 '22 05:11 drevicko

Ok, it should not affect any of your training scripts if it's just a warning
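
To illustrate why a failed extension build can surface as a warning instead of a crash, here is a minimal sketch of the optional-extension pattern. It is not ParlAI's actual code: `load_with_fallback`, `build_gpu_op`, and `cpu_ngram_block` are hypothetical names, the builder simply raises to simulate the nvcc failure, and the assumption that a slower CPU path takes over is mine.

```python
import logging

logger = logging.getLogger("ngram_blocking_sketch")


def load_with_fallback(build_gpu_op, cpu_op):
    """Return the GPU op if it builds; otherwise warn and return the CPU op."""
    try:
        return build_gpu_op()
    except Exception as exc:  # e.g. a failed nvcc compile
        logger.warning("Unable to load ngram blocking on GPU: %s", exc)
        return cpu_op


def build_gpu_op():
    # Hypothetical builder; raises to simulate the failed extension build.
    raise RuntimeError("Error building extension 'ngram_repeat_block_cuda'")


def cpu_ngram_block(tokens, n):
    """Return the tokens that would complete an n-gram already in `tokens`."""
    if n < 1 or len(tokens) < n:
        return set()
    # The last n-1 tokens form the prefix of the next candidate n-gram.
    prefix = tuple(tokens[len(tokens) - (n - 1):]) if n > 1 else ()
    banned = set()
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n - 1]) == prefix:
            banned.add(tokens[i + n - 1])
    return banned


# The build fails, a warning is logged, and the CPU version is used instead.
ngram_block = load_with_fallback(build_gpu_op, cpu_ngram_block)
```

With this pattern the run continues; only the speed of n-gram blocking during generation would differ.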

klshuster avatar Nov 29 '22 20:11 klshuster

This issue has not had activity in 30 days. Please feel free to reopen if you have more issues. You may apply the "never-stale" tag to prevent this from happening.

github-actions[bot] avatar Dec 30 '22 00:12 github-actions[bot]

For my part, I downgraded Python from 3.9 to 3.8, which solved this problem.

ln -sf /usr/bin/python3.8 /usr/bin/python3
ln -sf /usr/bin/pip3.8 /usr/bin/pip3
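
Before relinking interpreters, it may help to confirm which Python, pip, and nvcc the shell actually resolves. Below is a minimal sketch using only the standard library; `describe_build_env` is a hypothetical helper name, not part of ParlAI.

```python
import platform
import shutil
import sys


def describe_build_env():
    """Collect facts that usually matter when a CUDA extension fails to build."""
    return {
        "python": platform.python_version(),
        "executable": sys.executable,
        "nvcc": shutil.which("nvcc"),  # None if nvcc is not on PATH
        "pip": shutil.which("pip3"),   # None if pip3 is not on PATH
    }


if __name__ == "__main__":
    for key, value in describe_build_env().items():
        print(f"{key}: {value}")
```

Run it with the same interpreter that launches `parlai`, so the reported version matches the one used for the extension build.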

amrta-coder avatar Apr 25 '23 23:04 amrta-coder