opencompass
[Bug] RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'
Prerequisite
- [X] I have searched Issues and Discussions but cannot get the expected help.
- [X] The bug has not been fixed in the latest version.
Type
I'm evaluating with the officially supported tasks/models/datasets.
Environment
{'CUDA available': True,
'CUDA_HOME': '/usr/local/cuda',
'GCC': 'gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609',
'GPU 0,1,2,3': 'NVIDIA RTX A6000',
'GPU 4,5,6,7': 'NVIDIA GeForce RTX 2080 Ti',
'MMEngine': '0.10.2',
'NVCC': 'Cuda compilation tools, release 10.0, V10.0.13',
'OpenCV': '4.8.1',
'PyTorch': '1.13.1+cu117',
'PyTorch compiling details': 'PyTorch built with:\n'
' - GCC 9.3\n'
' - C++ Version: 201402\n'
' - Intel(R) Math Kernel Library Version '
'2020.0.0 Product Build 20191122 for Intel(R) 64 '
'architecture applications\n'
' - Intel(R) MKL-DNN v2.6.0 (Git Hash '
'52b5f107dd9cf10910aaa19cb47f3abf9b349815)\n'
' - OpenMP 201511 (a.k.a. OpenMP 4.5)\n'
' - LAPACK is enabled (usually provided by '
'MKL)\n'
' - NNPACK is enabled\n'
' - CPU capability usage: AVX2\n'
' - CUDA Runtime 11.7\n'
' - NVCC architecture flags: '
'-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86\n'
' - CuDNN 8.5\n'
' - Magma 2.6.1\n'
' - Build settings: BLAS_INFO=mkl, '
'BUILD_TYPE=Release, CUDA_VERSION=11.7, '
'CUDNN_VERSION=8.5.0, '
'CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, '
'CXX_FLAGS= -fabi-version=11 -Wno-deprecated '
'-fvisibility-inlines-hidden -DUSE_PTHREADPOOL '
'-fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM '
'-DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK '
'-DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE '
'-DEDGE_PROFILER_USE_KINETO -O2 -fPIC '
'-Wno-narrowing -Wall -Wextra '
'-Werror=return-type -Werror=non-virtual-dtor '
'-Wno-missing-field-initializers '
'-Wno-type-limits -Wno-array-bounds '
'-Wno-unknown-pragmas -Wunused-local-typedefs '
'-Wno-unused-parameter -Wno-unused-function '
'-Wno-unused-result -Wno-strict-overflow '
'-Wno-strict-aliasing '
'-Wno-error=deprecated-declarations '
'-Wno-stringop-overflow -Wno-psabi '
'-Wno-error=pedantic -Wno-error=redundant-decls '
'-Wno-error=old-style-cast '
'-fdiagnostics-color=always -faligned-new '
'-Wno-unused-but-set-variable '
'-Wno-maybe-uninitialized -fno-math-errno '
'-fno-trapping-math -Werror=format '
'-Werror=cast-function-type '
'-Wno-stringop-overflow, LAPACK_INFO=mkl, '
'PERF_WITH_AVX=1, PERF_WITH_AVX2=1, '
'PERF_WITH_AVX512=1, TORCH_VERSION=1.13.1, '
'USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, '
'USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, '
'USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, '
'USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, \n',
'Python': '3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0]',
'TorchVision': '0.14.1+cu117',
'numpy_random_seed': 2147483648,
'opencompass': '0.2.0+',
'sys.platform': 'linux'}
Reproduces the problem - code/configuration sample
model_path="./opt125m/"
python run.py --datasets siqa_gen \
--hf-path ${model_path} \
--tokenizer-path ${model_path} \
--model-kwargs trust_remote_code=True \
--tokenizer-kwargs trust_remote_code=True \
--max-out-len 100 \
--max-seq-len 2048 \
--batch-size 8 \
--no-batch-padding \
--num-gpus 1
Reproduces the problem - command or script
model_path="./opt125m/"
python run.py --datasets siqa_gen \
--hf-path ${model_path} \
--tokenizer-path ${model_path} \
--model-kwargs trust_remote_code=True \
--tokenizer-kwargs trust_remote_code=True \
--max-out-len 100 \
--max-seq-len 2048 \
--batch-size 8 \
--no-batch-padding \
--num-gpus 1
Reproduces the problem - error message
12/30 17:30:51 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_hugginface_opt125m/siqa]
12/30 17:30:54 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_hugginface_opt125m/siqa]
[2023-12-30 17:30:55,647] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
0%| | 0/245 [00:00<?, ?it/s]
0%| | 0/245 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/data1/tangyh/opencompass/opencompass/tasks/openicl_infer.py", line 148, in <module>
inferencer.run()
File "/data1/tangyh/opencompass/opencompass/tasks/openicl_infer.py", line 78, in run
self._inference()
File "/data1/tangyh/opencompass/opencompass/tasks/openicl_infer.py", line 121, in _inference
inferencer.inference(retriever,
File "/data1/tangyh/opencompass/opencompass/openicl/icl_inferencer/icl_gen_inferencer.py", line 140, in inference
results = self.model.generate_from_template(
File "/data1/tangyh/opencompass/opencompass/models/base.py", line 141, in generate_from_template
return self.generate(inputs, max_out_len=max_out_len, **kwargs)
File "/data1/tangyh/opencompass/opencompass/models/huggingface.py", line 248, in generate
return sum(
File "/data1/tangyh/opencompass/opencompass/models/huggingface.py", line 249, in <genexpr>
(self._single_generate(inputs=[input_],
File "/data1/tangyh/opencompass/opencompass/models/huggingface.py", line 376, in _single_generate
outputs = self.model.generate(input_ids=input_ids,
File "/data1/tangyh/.envs/opencompass/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/data1/tangyh/.envs/opencompass/lib/python3.10/site-packages/transformers/generation/utils.py", line 1718, in generate
return self.greedy_search(
File "/data1/tangyh/.envs/opencompass/lib/python3.10/site-packages/transformers/generation/utils.py", line 2579, in greedy_search
outputs = self(
File "/data1/tangyh/.envs/opencompass/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/data1/tangyh/.envs/opencompass/lib/python3.10/site-packages/transformers/models/opt/modeling_opt.py", line 1143, in forward
outputs = self.model.decoder(
File "/data1/tangyh/.envs/opencompass/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/data1/tangyh/.envs/opencompass/lib/python3.10/site-packages/transformers/models/opt/modeling_opt.py", line 909, in forward
layer_outputs = decoder_layer(
File "/data1/tangyh/.envs/opencompass/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/data1/tangyh/.envs/opencompass/lib/python3.10/site-packages/transformers/models/opt/modeling_opt.py", line 547, in forward
hidden_states = self.self_attn_layer_norm(hidden_states)
File "/data1/tangyh/.envs/opencompass/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/data1/tangyh/.envs/opencompass/lib/python3.10/site-packages/torch/nn/modules/normalization.py", line 190, in forward
return F.layer_norm(
File "/data1/tangyh/.envs/opencompass/lib/python3.10/site-packages/torch/nn/functional.py", line 2515, in layer_norm
return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 25937) of binary: /data1/tangyh/.envs/opencompass/bin/python
Traceback (most recent call last):
File "/data1/tangyh/.envs/opencompass/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/data1/tangyh/.envs/opencompass/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
File "/data1/tangyh/.envs/opencompass/lib/python3.10/site-packages/torch/distributed/run.py", line 762, in main
run(args)
File "/data1/tangyh/.envs/opencompass/lib/python3.10/site-packages/torch/distributed/run.py", line 753, in run
elastic_launch(
File "/data1/tangyh/.envs/opencompass/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/data1/tangyh/.envs/opencompass/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
/data1/tangyh/opencompass/opencompass/tasks/openicl_infer.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2023-12-30_17:30:59
host : 2080ti-2
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 25937)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
Other information
- I have tried different torch versions (1.13 and 2.10) and different GPUs (2080ti and 3090ti), but the same error is raised; `torch.cuda.is_available` consistently returns `True`;
- I tried huggingface `transformers` and `torch` outside `opencompass` and they work well with `opt125m`, so I think the problem is caused by `opencompass` rather than my environment;
- My machine cannot connect to huggingface, so I use the local huggingface model (`opt125m`) that I downloaded previously. It works fine outside of `opencompass`, as noted in the previous item;
- I downloaded and installed `opencompass` today, so it is definitely the latest version.
I saw a similar bug reported in issue #630 and tried the advice provided there, but that does NOT help.
I have been stuck on this issue for two days and any help will be appreciated.
I just solved this problem. device_map should be set to auto; otherwise the model stays on the CPU and inference starts there. However, the 'Half' LayerNorm kernel named in the error message is only implemented for GPU in my setup, so the forward pass fails. Therefore, the following shell script works:
model_path="./opt125m/"
python run.py --datasets siqa_gen \
--hf-path ${model_path} \
--tokenizer-path ${model_path} \
--model-kwargs device_map='auto' trust_remote_code=True \
--tokenizer-kwargs trust_remote_code=True \
--max-out-len 100 \
--max-seq-len 2048 \
--batch-size 8 \
--no-batch-padding \
--num-gpus 1
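For anyone who wants to confirm they are hitting the same cause, here is a minimal sketch of a placement check. It assumes only that the model object exposes `.parameters()` whose items carry `.device` and `.dtype`, as a `torch.nn.Module` does. The helper names `summarize_placement`, `half_params_on_cpu` and `fake_model` are hypothetical and not part of opencompass or transformers. The stand-in objects are there only so the snippet runs without torch installed.

```python
from collections import Counter
from types import SimpleNamespace


def summarize_placement(model):
    """Count parameters by (device, dtype) string pair.

    Works with any object exposing .parameters() whose items have
    .device and .dtype attributes (for example a torch.nn.Module).
    """
    return Counter((str(p.device), str(p.dtype)) for p in model.parameters())


def half_params_on_cpu(model):
    """Return True if any float16 parameter lives on the CPU."""
    return any(
        device.startswith("cpu") and dtype.split(".")[-1] == "float16"
        for device, dtype in summarize_placement(model)
    )


def fake_model(*params):
    """Hypothetical stand-in for a real model: params are (device, dtype) pairs."""
    return SimpleNamespace(
        parameters=lambda: iter(
            SimpleNamespace(device=d, dtype=t) for d, t in params
        )
    )


# The failing situation from this issue: fp16 weights left on the CPU.
cpu_half = fake_model(("cpu", "torch.float16"), ("cpu", "torch.float16"))
# The situation after the fix: fp16 weights placed on a GPU.
gpu_half = fake_model(("cuda:0", "torch.float16"))

print(summarize_placement(cpu_half))
print(half_params_on_cpu(cpu_half), half_params_on_cpu(gpu_half))
```

With a real model, calling `half_params_on_cpu(model)` right after loading should return `True` in the failing case, and `False` once `device_map='auto'` has placed the weights on a GPU.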
Thanks for your solution, feel free to reopen it if needed