undefined symbol: cget_col_row_stats

Open cdhx opened this issue 1 year ago • 3 comments

System Info

torch=2.0.1+cu118
NVIDIA TITAN RTX  3090
NVIDIA-SMI 525.116.04   Driver Version: 525.116.04   CUDA Version: 12.0

Information

  • [X] The official example scripts
  • [ ] My own modified scripts

🐛 Describe the bug

(llama) xhuang@4210GPU:~/PycharmProject/llama-recipes$ python llama_finetuning.py --use_peft --peft_method lora --model_name /home2/xhuang/PycharmProject/llama-recipes/output/llama2_7B_hf/ --output_dir /home2/xhuang/PycharmProject/llama-recipes/output/PEFT/demo_model

Error logs


(llama) xhuang@4210GPU:~/PycharmProject/llama-recipes$ python llama_finetuning.py --use_peft --peft_method lora --model_name /home2/xhuang/PycharmProject/llama-recipes/output/llama2_7B_hf/ --output_dir /home2/xhuang/PycharmProject/llama-recipes/output/PEFT/demo_model

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so
/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32
/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /home2/xhuang/.conda/envs/llama did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/home2/xhuang/stanford-corenlp/*'), PosixPath('/home2/xhuang/*')}
  warn(msg)
/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('http'), PosixPath('7890'), PosixPath('//114.212.83.107')}
  warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/cuda/lib64')}
  warn(msg)
ERROR: python: undefined symbol: cudaRuntimeGetVersion
CUDA SETUP: libcudart.so path is None
CUDA SETUP: Is seems that your cuda installation is not in your path. See https://github.com/TimDettmers/bitsandbytes/issues/85 for more information.
CUDA SETUP: CUDA version lower than 11 are currently not supported for LLM.int8(). You will be only to use 8-bit optimizers and quantization routines!!
/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: No libcudart.so found! Install CUDA or the cudatoolkit package (anaconda)!
  warn(msg)
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 00
CUDA SETUP: Loading binary /home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
/home2/xhuang/PycharmProject/llama-recipes/llama_finetuning.py:11: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  from pkg_resources import packaging
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:12<00:00,  6.12s/it]
--> Model /home2/xhuang/PycharmProject/llama-recipes/output/llama2_7B_hf/

--> /home2/xhuang/PycharmProject/llama-recipes/output/llama2_7B_hf/ has 6738.415616 Million params

You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. If you see this, DO NOT PANIC! This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=True`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
trainable params: 4,194,304 || all params: 6,742,609,920 || trainable%: 0.06220594176090199
Traceback (most recent call last):
  File "/home2/xhuang/PycharmProject/llama-recipes/llama_finetuning.py", line 250, in <module>
    fire.Fire(main)
  File "/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home2/xhuang/PycharmProject/llama-recipes/llama_finetuning.py", line 155, in main
    model.to("cuda")
  File "/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1145, in to
    return self._apply(convert)
  File "/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  [Previous line repeated 4 more times]
  File "/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/torch/nn/modules/module.py", line 820, in _apply
    param_applied = fn(param)
  File "/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1143, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 172.00 MiB (GPU 0; 23.69 GiB total capacity; 22.84 GiB already allocated; 72.94 MiB free; 22.84 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
(llama) xhuang@4210GPU:~/PycharmProject/llama-recipes$ ^C
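
A rough back-of-the-envelope check on this first failure. This is only a sketch: it assumes the weights are moved to the GPU in fp32 (4 bytes per parameter), which the log does not state, and it counts the weights alone, ignoring activations, gradients and optimizer state. The parameter count and GPU capacity are the values printed in the log above; the helper name `weight_memory_gib` is made up for illustration.

```python
# Estimate the GPU memory needed just to hold the model weights.
# Assumption (not confirmed by the log): fp32 weights, 4 bytes per parameter.

GIB = 1024 ** 3


def weight_memory_gib(n_params: int, bytes_per_param: int) -> float:
    """Memory in GiB for n_params parameters of the given width."""
    return n_params * bytes_per_param / GIB


n_params = 6_738_415_616   # "6738.415616 Million params" from the log
gpu_capacity_gib = 23.69   # "23.69 GiB total capacity" from the log

for label, nbytes in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1)]:
    need = weight_memory_gib(n_params, nbytes)
    verdict = "fits" if need < gpu_capacity_gib else "does not fit"
    print(f"{label}: {need:.2f} GiB of weights -> {verdict} in {gpu_capacity_gib} GiB")
```

Under the fp32 assumption the weights alone come to about 25.1 GiB, more than the 23.69 GiB the card reports, which would be consistent with the out-of-memory error raised at model.to("cuda").
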
(llama) xhuang@4210GPU:~/PycharmProject/llama-recipes$ python llama_finetuning.py --use_peft --peft_method lora --quantization --model_name /home2/xhuang/PycharmProject/llama-recipes/output/llama2_7B_hf/ --output_dir /home2/xhuang/PycharmProject/llama-recipes/output/PEFT/demo_model

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so
/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32
/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /home2/xhuang/.conda/envs/llama did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/home2/xhuang/*'), PosixPath('/home2/xhuang/stanford-corenlp/*')}
  warn(msg)
/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('http'), PosixPath('//114.212.83.107'), PosixPath('7890')}
  warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/cuda/lib64')}
  warn(msg)
ERROR: python: undefined symbol: cudaRuntimeGetVersion
CUDA SETUP: libcudart.so path is None
CUDA SETUP: Is seems that your cuda installation is not in your path. See https://github.com/TimDettmers/bitsandbytes/issues/85 for more information.
CUDA SETUP: CUDA version lower than 11 are currently not supported for LLM.int8(). You will be only to use 8-bit optimizers and quantization routines!!
/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: No libcudart.so found! Install CUDA or the cudatoolkit package (anaconda)!
  warn(msg)
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 00
CUDA SETUP: Loading binary /home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
/home2/xhuang/PycharmProject/llama-recipes/llama_finetuning.py:11: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  from pkg_resources import packaging
Loading checkpoint shards:   0%|                                                                                                                                                  | 0/2 [00:05<?, ?it/s]
Traceback (most recent call last):
  File "/home2/xhuang/PycharmProject/llama-recipes/llama_finetuning.py", line 250, in <module>
    fire.Fire(main)
  File "/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home2/xhuang/PycharmProject/llama-recipes/llama_finetuning.py", line 93, in main
    model = LlamaForCausalLM.from_pretrained(
  File "/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/transformers/modeling_utils.py", line 3091, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/transformers/modeling_utils.py", line 3471, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/transformers/modeling_utils.py", line 744, in _load_state_dict_into_meta_model
    set_module_quantized_tensor_to_device(
  File "/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/transformers/utils/bitsandbytes.py", line 97, in set_module_quantized_tensor_to_device
    new_value = bnb.nn.Int8Params(new_value, requires_grad=False, **kwargs).to(device)
  File "/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/bitsandbytes/nn/modules.py", line 294, in to
    return self.cuda(device)
  File "/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/bitsandbytes/nn/modules.py", line 258, in cuda
    CB, CBt, SCB, SCBt, coo_tensorB = bnb.functional.double_quant(B)
  File "/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/bitsandbytes/functional.py", line 1987, in double_quant
    row_stats, col_stats, nnz_row_ptr = get_colrow_absmax(
  File "/home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/bitsandbytes/functional.py", line 1876, in get_colrow_absmax
    lib.cget_col_row_stats(ptrA, ptrRowStats, ptrColStats, ptrNnzrows, ct.c_float(threshold), rows, cols)
  File "/home2/xhuang/.conda/envs/llama/lib/python3.9/ctypes/__init__.py", line 395, in __getattr__
    func = self.__getitem__(name)
  File "/home2/xhuang/.conda/envs/llama/lib/python3.9/ctypes/__init__.py", line 400, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /home2/xhuang/.conda/envs/llama/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cget_col_row_stats
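
Both runs report "No libcudart.so found" and fall back to libbitsandbytes_cpu.so, which is the library missing the cget_col_row_stats symbol. A minimal sketch for checking where, if anywhere, a libcudart.so file is visible on the machine. It is not the bitsandbytes search logic: the directory list (the conda environment's lib folder, LD_LIBRARY_PATH entries, and /usr/local/cuda/lib64) is an assumption modeled on the paths named in the log, and the function names are made up for illustration.

```python
import glob
import os


def find_libcudart(dirs):
    """Return sorted paths of libcudart.so* files found directly inside dirs."""
    hits = []
    for d in dirs:
        hits.extend(glob.glob(os.path.join(d, "libcudart.so*")))
    return sorted(set(hits))


def candidate_dirs(env=None):
    """Directories to inspect; an assumed list, not what bitsandbytes actually scans."""
    env = os.environ if env is None else env
    dirs = []
    prefix = env.get("CONDA_PREFIX")
    if prefix:
        # The active conda environment's library folder
        dirs.append(os.path.join(prefix, "lib"))
    # Non-empty entries of LD_LIBRARY_PATH
    dirs.extend(p for p in env.get("LD_LIBRARY_PATH", "").split(os.pathsep) if p)
    # Default CUDA toolkit location mentioned in the log
    dirs.append("/usr/local/cuda/lib64")
    return dirs


hits = find_libcudart(candidate_dirs())
print("\n".join(hits) if hits else "no libcudart.so* found in the searched directories")
```

If this prints no paths, the CUDA runtime is not visible from any of those locations, which would match the "libcudart.so path is None" line in the log.
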

Expected behavior

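The official example script should run. Instead it fails with the errors shown above: CUDA out of memory without --quantization, and undefined symbol: cget_col_row_stats with --quantization.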

cdhx, Sep 02 '23 11:09