LLaMA-Factory
NotImplementedError: Cannot copy out of meta tensor; no data!
Reminder
- [X] I have read the README and searched the existing issues.
Reproduction
```sh
CUDA_VISIBLE_DEVICES=0 python data/src/cli_demo.py \
    --model_name_or_path weights/Mixtral-8x7B-Instruct-v0.1 \
    --adapter_name_or_path data/saves/Mixtral-8x7B-Chat/lora/train_2024-03-19-20/checkpoint-4000 \
    --template default \
    --finetuning_type lora
    # --empty_init False
```
```
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

and submit this information together with your error trace to:
https://github.com/TimDettmers/bitsandbytes/issues
bin /home/vipuser/miniconda3/envs/Py10NLP/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118.so
/home/vipuser/miniconda3/envs/Py10NLP/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /home/vipuser/miniconda3/envs/Py10NLP did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda-11.8/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /home/vipuser/miniconda3/envs/Py10NLP/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118.so...
/home/vipuser/miniconda3/envs/Py10NLP/lib/python3.10/site-packages/torch/nn/modules/module.py:2025: UserWarning: for base_model.model.model.layers.29.self_attn.q_proj.lora_A.default.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/vipuser/miniconda3/envs/Py10NLP/lib/python3.10/site-packages/torch/nn/modules/module.py:2025: UserWarning: for base_model.model.model.layers.29.self_attn.q_proj.lora_B.default.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/vipuser/miniconda3/envs/Py10NLP/lib/python3.10/site-packages/accelerate/utils/offload.py:33: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  array=torch.tensor(weight,device="cpu").numpy()
Traceback (most recent call last):
  File "/root/data/src/cli_demo.py", line 68, in
GPU RAM Free: 81042MB | Used: 7MB | Util 0% | Total 81920MB
CPU available memory: 1461932032  usage: 25.4%
```
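The `UserWarning`s above point at the likely failure mode: the base model appears to have been initialized on the meta device, so copying the LoRA weights into it is a silent no-op and the parameters never receive real data. Below is a minimal sketch of what the warning describes, using a plain `nn.Linear` rather than LLaMA-Factory's actual loading code (the module and state dict are illustrative; `assign=True` requires PyTorch >= 2.1, which matches the 2.1.2 in the system info):

```python
import torch
import torch.nn as nn

# Parameters created under the meta device have shape/dtype but no data.
with torch.device("meta"):
    layer = nn.Linear(4, 4)

state = {"weight": torch.randn(4, 4), "bias": torch.randn(4)}

# Default semantics copy in place; copying into a meta parameter is a no-op
# and triggers exactly the UserWarning shown in the log above.
layer.load_state_dict(state)
print(layer.weight.device)  # meta -- still no real data

# assign=True rebinds the checkpoint tensors instead of copying into them,
# so the parameters are actually materialized.
layer.load_state_dict(state, assign=True)
print(layer.weight.device)  # cpu
```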
Expected behavior
Mixtral LoRA inference.
System Info
- `transformers` version: 4.38.1
- Platform: Linux-6.2.0-35-generic-x86_64-with-glibc2.35
- Python version: 3.10.13
- Huggingface_hub version: 0.21.4
- Safetensors version: 0.4.2
- Accelerate version: 0.28.0
- Accelerate config: not found
- PyTorch version (GPU?): 2.1.2+cu121 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?:
- Using distributed or parallel set-up in script?:
Others
No response
Does this problem also occur with non-Mixtral models? We need to check whether the different MoE model type is the cause.
> Does this problem also occur with non-Mixtral models? We need to check whether the different MoE model type is the cause.

The problem here happens when copying non-meta data onto CPU data.
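For reference, the `NotImplementedError` in the title reproduces in isolation with any meta tensor, independent of Mixtral or LoRA: meta tensors carry only shape and dtype, so there is no data to copy out when moving them to a real device. A minimal sketch:

```python
import torch

# A meta tensor has metadata (shape, dtype) but no underlying storage.
t = torch.empty(2, 2, device="meta")

# Materializing it requires reading data that does not exist:
t.to("cpu")  # NotImplementedError: Cannot copy out of meta tensor; no data!
```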
https://github.com/hiyouga/LLaMA-Factory/issues/2933 currently looks like a similar problem to yours; we will investigate further when we have time.
Try `--low_cpu_mem_usage False`.
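For context: in `transformers`, `low_cpu_mem_usage=True` first builds the model with empty (meta) weights and then loads the checkpoint shard by shard, while `False` performs a regular full initialization, which sidesteps the meta-tensor copy at the cost of holding the full weights in CPU RAM. A rough sketch of the `from_pretrained` call the flag corresponds to (whether LLaMA-Factory forwards the CLI flag exactly like this is an assumption; the model path is the one from the report):

```python
from transformers import AutoModelForCausalLM

# low_cpu_mem_usage=False: initialize real (non-meta) weights up front.
# Needs enough CPU RAM for the full model (roughly 90 GB for Mixtral-8x7B in fp16).
model = AutoModelForCausalLM.from_pretrained(
    "weights/Mixtral-8x7B-Instruct-v0.1",
    low_cpu_mem_usage=False,
)
```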