
NotImplementedError: Cannot copy out of meta tensor; no data!

hakduwqkfh opened this issue 11 months ago • 4 comments

Reminder

  • [X] I have read the README and searched the existing issues.

Reproduction

CUDA_VISIBLE_DEVICES=0 python data/src/cli_demo.py \
    --model_name_or_path weights/Mixtral-8x7B-Instruct-v0.1 \
    --adapter_name_or_path data/saves/Mixtral-8x7B-Chat/lora/train_2024-03-19-20/checkpoint-4000 \
    --template default \
    --finetuning_type lora
    # --empty_init False

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

bin /home/vipuser/miniconda3/envs/Py10NLP/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118.so
/home/vipuser/miniconda3/envs/Py10NLP/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /home/vipuser/miniconda3/envs/Py10NLP did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda-11.8/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /home/vipuser/miniconda3/envs/Py10NLP/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118.so...

/home/vipuser/miniconda3/envs/Py10NLP/lib/python3.10/site-packages/torch/nn/modules/module.py:2025: UserWarning: for base_model.model.model.layers.29.self_attn.q_proj.lora_A.default.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass assign=True to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/vipuser/miniconda3/envs/Py10NLP/lib/python3.10/site-packages/torch/nn/modules/module.py:2025: UserWarning: for base_model.model.model.layers.29.self_attn.q_proj.lora_B.default.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass assign=True to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/vipuser/miniconda3/envs/Py10NLP/lib/python3.10/site-packages/accelerate/utils/offload.py:33: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  array=torch.tensor(weight,device="cpu").numpy()
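For context: the assign=True that the warnings mention is an argument of torch.nn.Module.load_state_dict. A minimal standalone sketch of what the warning is about (hypothetical example, not LLaMA-Factory code):

import torch
import torch.nn as nn

# Parameters created on the "meta" device have shape/dtype but no storage.
with torch.device("meta"):
    layer = nn.Linear(4, 4, bias=False)

state_dict = {"weight": torch.randn(4, 4)}

# The default copy-in is a no-op for meta parameters and emits the
# "copying from a non-meta parameter ... to a meta parameter" warning above.
layer.load_state_dict(state_dict)
print(layer.weight.device)  # meta -- the LoRA weights were never loaded

# assign=True replaces the meta parameter with the checkpoint tensor instead.
layer.load_state_dict(state_dict, assign=True)
print(layer.weight.device)  # cpu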

Traceback (most recent call last):
  File "/root/data/src/cli_demo.py", line 68, in <module>
    main()
  File "/root/data/src/cli_demo.py", line 34, in main
    chat_model = ChatModel()
  File "/root/data/src/llmtuner/chat/chat_model.py", line 23, in __init__
    self.engine: "BaseEngine" = HuggingfaceEngine(model_args, data_args, finetuning_args, generating_args)
  File "/root/data/src/llmtuner/chat/hf_engine.py", line 33, in __init__
    self.model, self.tokenizer = load_model_and_tokenizer(
  File "/root/data/src/llmtuner/model/loader.py", line 146, in load_model_and_tokenizer
    model = load_model(tokenizer, model_args, finetuning_args, is_trainable, add_valuehead)
  File "/root/data/src/llmtuner/model/loader.py", line 94, in load_model
    model = init_adapter(model, model_args, finetuning_args, is_trainable)
  File "/root/data/src/llmtuner/model/adapter.py", line 110, in init_adapter
    model: "LoraModel" = PeftModel.from_pretrained(
  File "/home/vipuser/miniconda3/envs/Py10NLP/lib/python3.10/site-packages/peft/peft_model.py", line 353, in from_pretrained
    model.load_adapter(model_id, adapter_name, is_trainable=is_trainable, **kwargs)
  File "/home/vipuser/miniconda3/envs/Py10NLP/lib/python3.10/site-packages/peft/peft_model.py", line 727, in load_adapter
    dispatch_model(
  File "/home/vipuser/miniconda3/envs/Py10NLP/lib/python3.10/site-packages/accelerate/big_modeling.py", line 384, in dispatch_model
    offload_state_dict(offload_dir, disk_state_dict)
  File "/home/vipuser/miniconda3/envs/Py10NLP/lib/python3.10/site-packages/accelerate/utils/offload.py", line 99, in offload_state_dict
    index = offload_weight(parameter, name, save_dir, index=index)
  File "/home/vipuser/miniconda3/envs/Py10NLP/lib/python3.10/site-packages/accelerate/utils/offload.py", line 33, in offload_weight
    array=torch.tensor(weight,device="cpu").numpy()
NotImplementedError: Cannot copy out of meta tensor; no data!
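The last frame is easy to reproduce in isolation: a meta tensor carries only shape and dtype, no data, so any attempt to materialize it on the CPU fails the same way. A minimal standalone sketch:

import torch

# A meta tensor has metadata (shape, dtype) but no backing storage.
weight = torch.empty(8, 8, device="meta")

# This mirrors accelerate/utils/offload.py line 33 and raises the same error:
torch.tensor(weight, device="cpu").numpy()
# NotImplementedError: Cannot copy out of meta tensor; no data!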

GPU RAM Free: 81042MB | Used: 7MB | Util 0% | Total 81920MB
CPU available memory: 1461932032 bytes | utilization: 25.4%

Expected behavior

Mixtral inference with the LoRA adapter should run successfully.

System Info


  • transformers version: 4.38.1
  • Platform: Linux-6.2.0-35-generic-x86_64-with-glibc2.35
  • Python version: 3.10.13
  • Huggingface_hub version: 0.21.4
  • Safetensors version: 0.4.2
  • Accelerate version: 0.28.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.1.2+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Others

No response

hakduwqkfh avatar Mar 23 '24 12:03 hakduwqkfh

Does the problem also occur with non-Mixtral models? We need to check whether it is caused by the MoE model type being handled differently.

codemayq avatar Mar 23 '24 13:03 codemayq

> Does the problem also occur with non-Mixtral models? We need to check whether it is caused by the MoE model type being handled differently.

The failure here is in copying the non-meta data onto the CPU.

hakduwqkfh avatar Mar 23 '24 14:03 hakduwqkfh

It looks like https://github.com/hiyouga/LLaMA-Factory/issues/2933 is a similar problem to yours; we will investigate further when we have time.

codemayq avatar Mar 23 '24 14:03 codemayq

try --low_cpu_mem_usage False

hiyouga avatar Mar 23 '24 16:03 hiyouga
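For reference, with that flag the reproduction command would look like this (assuming cli_demo.py forwards --low_cpu_mem_usage to the underlying from_pretrained call, as it does for the other model arguments above):

CUDA_VISIBLE_DEVICES=0 python data/src/cli_demo.py \
    --model_name_or_path weights/Mixtral-8x7B-Instruct-v0.1 \
    --adapter_name_or_path data/saves/Mixtral-8x7B-Chat/lora/train_2024-03-19-20/checkpoint-4000 \
    --template default \
    --finetuning_type lora \
    --low_cpu_mem_usage False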