llama-recipes
undefined symbol: cget_col_row_stats
System Info
torch=2.0.1+cu118
NVIDIA TITAN RTX 3090
NVIDIA-SMI 525.116.04 Driver Version: 525.116.04 CUDA Version: 12.0
Information
- [X] The official example scripts
- [ ] My own modified scripts
🐛 Describe the bug
(llama) xhuang@4210GPU:~/PycharmProject/llama-recipes$ python llama_finetuning.py --use_peft --peft_method lora --model_name /home2/xhuang/PycharmProject/llama-recipes/output/llama2_7B_hf/ --output_dir /home2/xhuang/PycharmProject/llama-recipes/output/PEFT/demo_model
Error logs
(llama) xhuang@4210GPU:~/PycharmProject/llama-recipes$ python llama_finetuning.py --use_peft --peft_method lora --model_name /home2/xhuang/PycharmProject/llama-recipes/output/llama2_7B_hf/ --output_dir /home2/xhuang/PycharmProject/llama-recipes/output/PEFT/demo_model
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so
/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
warn("The installed version of bitsandbytes was compiled without GPU support. "
/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32
/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /home2/xhuang/.conda/envs/llama did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/home2/xhuang/stanford-corenlp/*'), PosixPath('/home2/xhuang/*')}
warn(msg)
/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('http'), PosixPath('7890'), PosixPath('//114.212.83.107')}
warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/cuda/lib64')}
warn(msg)
ERROR: python: undefined symbol: cudaRuntimeGetVersion
CUDA SETUP: libcudart.so path is None
CUDA SETUP: Is seems that your cuda installation is not in your path. See https://github.com/TimDettmers/bitsandbytes/issues/85 for more information.
CUDA SETUP: CUDA version lower than 11 are currently not supported for LLM.int8(). You will be only to use 8-bit optimizers and quantization routines!!
/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: No libcudart.so found! Install CUDA or the cudatoolkit package (anaconda)!
warn(msg)
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 00
CUDA SETUP: Loading binary /home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
/home2/xhuang/PycharmProject/llama-recipes/llama_finetuning.py:11: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
from pkg_resources import packaging
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:12<00:00, 6.12s/it]
--> Model /home2/xhuang/PycharmProject/llama-recipes/output/llama2_7B_hf/
--> /home2/xhuang/PycharmProject/llama-recipes/output/llama2_7B_hf/ has 6738.415616 Million params
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. If you see this, DO NOT PANIC! This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=True`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
trainable params: 4,194,304 || all params: 6,742,609,920 || trainable%: 0.06220594176090199
Traceback (most recent call last):
File "/home2/xhuang/PycharmProject/llama-recipes/llama_finetuning.py", line 250, in <module>
fire.Fire(main)
File "/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home2/xhuang/PycharmProject/llama-recipes/llama_finetuning.py", line 155, in main
model.to("cuda")
File "/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1145, in to
return self._apply(convert)
File "/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/torch/nn/modules/module.py", line 797, in _apply
module._apply(fn)
File "/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/torch/nn/modules/module.py", line 797, in _apply
module._apply(fn)
File "/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/torch/nn/modules/module.py", line 797, in _apply
module._apply(fn)
[Previous line repeated 4 more times]
File "/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/torch/nn/modules/module.py", line 820, in _apply
param_applied = fn(param)
File "/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1143, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 172.00 MiB (GPU 0; 23.69 GiB total capacity; 22.84 GiB already allocated; 72.94 MiB free; 22.84 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
(llama) xhuang@4210GPU:~/PycharmProject/llama-recipes$ ^C
(llama) xhuang@4210GPU:~/PycharmProject/llama-recipes$ python llama_finetuning.py --use_peft --peft_method lora --quantization --model_name /home2/xhuang/PycharmProject/llama-recipes/output/llama2_7B_hf/ --output_dir /home2/xhuang/PycharmProject/llama-recipes/output/PEFT/demo_model
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so
/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
warn("The installed version of bitsandbytes was compiled without GPU support. "
/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32
/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /home2/xhuang/.conda/envs/llama did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/home2/xhuang/*'), PosixPath('/home2/xhuang/stanford-corenlp/*')}
warn(msg)
/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('http'), PosixPath('//114.212.83.107'), PosixPath('7890')}
warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/cuda/lib64')}
warn(msg)
ERROR: python: undefined symbol: cudaRuntimeGetVersion
CUDA SETUP: libcudart.so path is None
CUDA SETUP: Is seems that your cuda installation is not in your path. See https://github.com/TimDettmers/bitsandbytes/issues/85 for more information.
CUDA SETUP: CUDA version lower than 11 are currently not supported for LLM.int8(). You will be only to use 8-bit optimizers and quantization routines!!
/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: No libcudart.so found! Install CUDA or the cudatoolkit package (anaconda)!
warn(msg)
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 00
CUDA SETUP: Loading binary /home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
/home2/xhuang/PycharmProject/llama-recipes/llama_finetuning.py:11: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
from pkg_resources import packaging
Loading checkpoint shards: 0%| | 0/2 [00:05<?, ?it/s]
Traceback (most recent call last):
File "/home2/xhuang/PycharmProject/llama-recipes/llama_finetuning.py", line 250, in <module>
fire.Fire(main)
File "/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home2/xhuang/PycharmProject/llama-recipes/llama_finetuning.py", line 93, in main
model = LlamaForCausalLM.from_pretrained(
File "/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/transformers/modeling_utils.py", line 3091, in from_pretrained
) = cls._load_pretrained_model(
File "/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/transformers/modeling_utils.py", line 3471, in _load_pretrained_model
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
File "/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/transformers/modeling_utils.py", line 744, in _load_state_dict_into_meta_model
set_module_quantized_tensor_to_device(
File "/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/transformers/utils/bitsandbytes.py", line 97, in set_module_quantized_tensor_to_device
new_value = bnb.nn.Int8Params(new_value, requires_grad=False, **kwargs).to(device)
File "/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/bitsandbytes/nn/modules.py", line 294, in to
return self.cuda(device)
File "/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/bitsandbytes/nn/modules.py", line 258, in cuda
CB, CBt, SCB, SCBt, coo_tensorB = bnb.functional.double_quant(B)
File "/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/bitsandbytes/functional.py", line 1987, in double_quant
row_stats, col_stats, nnz_row_ptr = get_colrow_absmax(
File "/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/bitsandbytes/functional.py", line 1876, in get_colrow_absmax
lib.cget_col_row_stats(ptrA, ptrRowStats, ptrColStats, ptrNnzrows, ct.c_float(threshold), rows, cols)
File "/home2/xhuang/.conda/envs/llama/lib/python3.9/ctypes/__init__.py", line 395, in __getattr__
func = self.__getitem__(name)
File "/home2/xhuang/.conda/envs/llama/lib/python3.9/ctypes/__init__.py", line 400, in __getitem__
func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cget_col_row_stats
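Both runs report that no `libcudart.so` was found and then load `libbitsandbytes_cpu.so`, which is the library the final `undefined symbol` error points at. As a rough way to check what is visible on a machine like this, the sketch below looks for `libcudart.so*` files in a few directories. It is not the search logic bitsandbytes itself uses. The helper names `find_libcudart` and `candidate_dirs` are made up for illustration, and the directory list (`LD_LIBRARY_PATH`, `$CONDA_PREFIX/lib`, `/usr/local/cuda/lib64`) is an assumption based on the locations mentioned in the warnings above.

```python
import os
from pathlib import Path


def find_libcudart(search_dirs):
    """Return paths of files named libcudart.so* found directly in search_dirs."""
    hits = []
    for d in search_dirs:
        p = Path(d)
        if not p.is_dir():
            continue  # skip entries that do not exist, as the warnings above report
        hits.extend(sorted(str(f) for f in p.glob("libcudart.so*")))
    return hits


def candidate_dirs(env=None):
    """Build a list of directories to inspect (an assumed, non-exhaustive list)."""
    env = os.environ if env is None else env
    dirs = [d for d in env.get("LD_LIBRARY_PATH", "").split(os.pathsep) if d]
    prefix = env.get("CONDA_PREFIX")
    if prefix:
        dirs.append(os.path.join(prefix, "lib"))
    dirs.append("/usr/local/cuda/lib64")
    return dirs


if __name__ == "__main__":
    found = find_libcudart(candidate_dirs())
    print("\n".join(found) if found else "no libcudart.so found in the searched directories")
```

If this prints nothing useful, the `python -m bitsandbytes` command suggested in the banner above is the diagnostic the library itself asks for.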
Expected behavior
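The official example script should run to completion. Instead it fails both ways: without `--quantization` it hits a CUDA out-of-memory error at `model.to("cuda")`, and with `--quantization` it fails with `undefined symbol: cget_col_row_stats`.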
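For the out-of-memory failure in the run without `--quantization`, a back-of-the-envelope estimate of the weight memory may help. The parameter count and GPU capacity below are taken from the log. The bytes-per-parameter values are assumptions, since the log does not say which dtype the weights were loaded in. The estimate covers weights only, not activations, gradients, or optimizer state.

```python
GIB = 1024 ** 3


def weights_gib(n_params, bytes_per_param):
    """Memory needed to hold the weights alone, in GiB."""
    return n_params * bytes_per_param / GIB


n_params = 6_738_415_616   # "6738.415616 Million params" from the log
gpu_total_gib = 23.69      # "23.69 GiB total capacity" from the OOM message

# Assumed storage widths; the log does not state the dtype actually used.
for name, nbytes in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1)]:
    need = weights_gib(n_params, nbytes)
    verdict = "fits" if need < gpu_total_gib else "does not fit"
    print(f"{name:10s} {need:6.2f} GiB  ({verdict} in {gpu_total_gib} GiB, weights only)")
```

Under the fp32 assumption the weights alone come to roughly 25.1 GiB, more than the card's 23.69 GiB. That would be consistent with the failure at `model.to("cuda")` after 22.84 GiB had been allocated.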