accelerate
ValueError: weight is on the meta device, we need a `value` to put in on cpu.
System Info
Windows 10
Accelerate Version: from git (recent)
Python 3.8.0
4GB GPU
16GB RAM
Information
- [ ] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
- [ ] My own task or dataset (give details below)
Reproduction
I am using:
BASE_MODEL = "decapoda-research/llama-7b-hf"
LORA_WEIGHTS = "tloen/alpaca-lora-7b"
I get this error: ValueError: weight is on the meta device, we need a `value` to put in on cpu. It is raised in modeling.py, in the function set_module_tensor_to_device:
if old_value.device == torch.device("meta") and device not in ["meta", torch.device("meta")] and value is None:
raise ValueError(f"{tensor_name} is on the meta device, we need a `value` to put in on {device}.")
More details:
- I am trying to load the model on my 4GB GPU, so I am low on GPU resources and I suppose a lot of offloading is performed back and forth between the CPU and the GPU.
- My code originally called model.half(), but that gave me the error RuntimeError: "addmm_impl_cpu_" not implemented for 'Half', so I disabled that call in the primary script.
- After removing the half() call, my code goes further, but it then fails with the error above about the meta device.
The source code is available: here
Might be related: https://github.com/huggingface/accelerate/issues/1197
Expected behavior
No error. The model and the weights are loaded (in both CPU and GPU).
What is your version of Accelerate? Also note that decapoda-research/llama-7b-hf is not usable at all: it was converted in the middle of the PR adding Llama and is not compatible with Transformers.
You should use another model or run the conversion script after obtaining the official weights from Meta.
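For illustration, a minimal sketch of loading an already-converted Llama checkpoint; the repo id below is only an example (not an endorsement), and device_map="auto" requires Accelerate:

```python
# Hedged sketch: "huggyllama/llama-7b" is just an example of an already-converted
# checkpoint; any weights converted with transformers' convert_llama_weights_to_hf.py
# script can be loaded the same way.
from transformers import LlamaForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("huggyllama/llama-7b")
model = LlamaForCausalLM.from_pretrained("huggyllama/llama-7b", device_map="auto")
```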
I installed Accelerate yesterday from Git; I forgot to mention that. It all works fine in Google Colab: Colab has a 16 GB GPU and the model loads there without problems. I use weights not from Meta but from Stanford Alpaca. It does not work on my laptop with a 4 GB GPU when I insist on using the GPU. In CPU mode it also works on my laptop, but it takes between 20 and 40 minutes to get an answer to a prompt. So when I insist on using my 4 GB GPU, it fails somewhere in the process of moving the model back and forth between the GPU and the CPU (the two types of RAM). I do not understand the error message very well. What is a "meta" device, for example?
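For what it's worth, a hedged sketch (not a verified fix for this exact setup) of constraining GPU memory explicitly and giving Accelerate a disk offload folder; the model id, memory limits, and folder name are placeholders:

```python
# Hedged sketch for a small-GPU setup: model id, memory limits and folder name are
# placeholders. device_map="auto" lets Accelerate split the model between the GPU,
# CPU RAM and the disk offload folder instead of trying to fit everything on the GPU.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "path/or/repo-of-your-llama-checkpoint",   # placeholder
    device_map="auto",
    max_memory={0: "3GiB", "cpu": "12GiB"},    # leave headroom on a 4GB GPU
    offload_folder="offload",                  # spill whatever does not fit to disk
    torch_dtype=torch.float16,
)
```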
I am actually getting a similar error with the accelerate library. I am running the code on my local Mac (non-M1) without GPUs.
from huggingface_hub import hf_hub_download
from transformers import AutoConfig, AutoModelWithLMHead
from accelerate import init_empty_weights, load_checkpoint_and_dispatch

whisper_model = "openai/whisper-tiny"
weights_location = hf_hub_download(whisper_model, 'pytorch_model.bin')
config = AutoConfig.from_pretrained(whisper_model)
with init_empty_weights():
    model = AutoModelWithLMHead.from_config(config)
model = load_checkpoint_and_dispatch(model, weights_location, device_map='auto')
Getting this error
if tensor_name not in module._parameters and tensor_name not in module._buffers:
raise ValueError(f"{module} does not have a parameter or a buffer named {tensor_name}.")
is_buffer = tensor_name in module._buffers
old_value = getattr(module, tensor_name)
if old_value.device == torch.device("meta") and device not in ["meta", torch.device("meta")] and value is None:
> raise ValueError(f"{tensor_name} is on the meta device, we need a `value` to put in on {device}.")
E ValueError: weight is on the meta device, we need a `value` to put in on cpu.
@philip30 This is because the initialization under init_empty_weights breaks the tied weights. You need to add a model.tie_weights() to re-tie them afterwards:
whisper_model = "openai/whisper-tiny"
weights_location = hf_hub_download(whisper_model, 'pytorch_model.bin')
config = AutoConfig.from_pretrained(whisper_model)
with init_empty_weights():
    model = AutoModelWithLMHead.from_config(config)
model.tie_weights()
model = load_checkpoint_and_dispatch(model, weights_location, device_map='auto')
This is in the documentation
I ran into the error "ValueError: weight is on the meta device, we need a `value` to put in on cpu." while loading llama-7B in 8-bit as well. I was able to load the model before; the only difference is that I pip-updated transformers. I rolled back to the previous version of Transformers and it can load the model just fine.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
I'm still hitting this issue:
/home/huide/conda3/envs/vicuna/lib/python3.9/site-packages/accelerate/utils/modeling.py:13 │
│ 6 in set_module_tensor_to_device │
│ │
│ 133 │ old_value = getattr(module, tensor_name) │
│ 134 │ │
│ 135 │ if old_value.device == torch.device("meta") and device not in ["meta", torch.dev │
│ ❱ 136 │ │ raise ValueError(f"{tensor_name} is on the meta device, we need a `value` to │
│ 137 │ │
│ 138 │ if value is not None: │
│ 139 │ │ if dtype is None: │
╰────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: weight is on the meta device, we need a `value` to put in on 0.
@huide9 As for everyone else, there is nothing we can do without code reproducing the error.
@sgugger This is my code:
https://github.com/toncho11/ML_examples/blob/ee3f69147b53a7cee5d2e80d36f8eafad7ef9ef6/Transformers/HuggingFace/ChatBots/ChatBotAlpacaLoraFromStanfordConsole.py
that produced the error. Please run it more than once.
cc @younesbelkada since this is using 8bit-loading
> I do not understand the error message very well. What is a "meta" device, for example?
The meta device was introduced by PyTorch. With the help of the meta device we can load large tensors without worrying about GPU or CPU RAM, but it does not hold any data; it only gives us the shapes. https://huggingface.co/blog/accelerate-large-models#:~:text=PyTorch%201.9%20introduced%20a%20new,CPU%20(or%20GPU)%20RAM. This will give you a better understanding.
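A minimal sketch of what that looks like in practice (plain PyTorch plus Accelerate's init_empty_weights; nothing here is specific to this issue):

```python
# Hedged sketch: tensors on the "meta" device carry only shape/dtype metadata and
# no storage, so even a huge tensor "fits" regardless of available RAM.
import torch
from torch import nn
from accelerate import init_empty_weights

t = torch.empty((100_000, 100_000), device="meta")
print(t.shape, t.device)  # torch.Size([100000, 100000]) meta -- no memory allocated

with init_empty_weights():
    layer = nn.Linear(100_000, 100_000)  # parameters are created on the meta device
print(layer.weight.device)  # meta -- real values must be loaded in before use
```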
> cc @younesbelkada since this is using 8bit-loading
Sir, please help me out with this error; it would be a great help.
I am using the 🤗 accelerate tool to initialize the model, and then I load the model weights with load_checkpoint_and_dispatch.
But it gives me this error:
ValueError: offload is not a folder containing a .index.json file.
I am not able to understand what exactly the error is.
Please have a look at the screenshot, which shows the offload folder and the error.
Please have a look at my code: https://drive.google.com/file/d/1-ccrx1Q5tkLUYtZBGi5lNZGjPMyr_X9U/view?usp=sharing
Please help me in solving this error. Your inputs will be highly appreciated. Thank you!
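Without seeing the code it is hard to be sure, but that error usually means the checkpoint argument of load_checkpoint_and_dispatch points at a folder (here one named offload) that contains no *.index.json. A hedged sketch, with placeholder paths, of how the checkpoint and the disk-offload folder are passed separately:

```python
# Hedged sketch: model id and paths are placeholders. `checkpoint` must be a weights
# file (or a folder containing a *.index.json for sharded checkpoints); the folder
# used for disk offload is a separate argument, `offload_folder`.
from accelerate import init_empty_weights, load_checkpoint_and_dispatch
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("your-model-id")            # placeholder
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config)
model.tie_weights()

model = load_checkpoint_and_dispatch(
    model,
    "path/to/checkpoint/pytorch_model.bin",  # file, or folder with a *.index.json
    device_map="auto",
    offload_folder="offload",                # where offloaded weights get written
)
```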
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
@sgugger I'd like to re-open this issue. I also face it, and it goes away when enabling --low_cpu_mem_usage. I don't have a minimal example, but the full code that I used to trigger this is here.
Use this config as config.json
{
"log_on_each_node": false,
"report_to": "wandb",
"per_device_train_batch_size": 2,
"per_device_eval_batch_size": 2,
"gradient_accumulation_steps": 8,
"optim": "paged_adamw_32bit",
"learning_rate": 2e-4,
"weight_decay": 0.01,
"lr_scheduler_type": "cosine",
"warmup_ratio": 0.03,
"adam_beta1": 0.9,
"adam_beta2": 0.95,
"save_steps": 20,
"evaluation_strategy": "steps",
"eval_steps": 20,
"logging_first_step": true,
"logging_steps": 1,
"num_train_epochs": 2,
"early_stopping_patience": null,
"early_stopping_threshold": null,
"dataset_name": "BramVanroy/dutch_chat_datasets",
"group_by_length": true,
"preprocessing_num_workers": 24,
"dataset_batch_size": 1000,
"output_dir": "results",
"validation_split_percentage": 5,
"load_best_model_at_end": true,
"save_total_limit": 3,
"trust_remote_code": true,
"model_name_or_path": "BramVanroy/llama2-13b-ft-mc4_nl_cleaned_tiny",
"max_length": 4096,
"use_nested_quant": true,
"bf16": true,
"tf32": true,
"load_in_4bit": true,
"bnb_4bit_quant_type": "nf4",
"torch_dtype": "bfloat16",
"bnb_4bit_compute_dtype": "bfloat16",
"lora_alpha": 32,
"lora_r": 4,
"low_cpu_mem_usage": false,
"ddp_find_unused_parameters": true,
"use_peft": true,
"max_grad_norm": 1.0,
"use_flash_attention": true,
"do_train": true,
"do_eval": true
}
Call it as you would:
python run_chat_modeling.py config.json
In my case, I used deepspeed on 4x 3090s (the config is in the repository that I linked):
deepspeed src/llm_finetuning/run_chat_modeling.py chat_config.json --deepspeed deepspeed_configs/ds_config_zero2.json
Hope that helps to reproduce the issue?
I don't think it is possible to use load_in_4bit without at least low_cpu_mem_usage=True (and normally you need device_map="auto").
In any case, the error message being the same does not mean it is the same issue, so please open a new one.
@sgugger Sure, I made a new issue https://github.com/huggingface/accelerate/issues/1858.
On a related note, I thought device_map="auto" was for inference only (also discussed here)?
If you are not offloading anything (e.g. the device map only contains GPUs), it works for training as well.
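For reference, a minimal 4-bit loading sketch (the model id is a placeholder): passing device_map="auto" makes Transformers turn on low_cpu_mem_usage internally, which is what the workaround above relies on:

```python
# Hedged sketch: the model id is a placeholder. With a device_map set, Transformers
# enables low_cpu_mem_usage under the hood, which is required for 4-bit loading.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "your-model-id",                 # placeholder
    quantization_config=bnb_config,
    device_map="auto",
)
```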
I am trying to fine-tune the GPTQ model ("TheBloke/StableBeluga2-70B-GPTQ") with the QLoRA code in full-parameter mode, and I got the following error:
ValueError: qweight is on the meta device, we need a `value` to put in on 0.
This is my code:
max_memory_MB: int = field(
    # default=80000,
    default=29000,
    metadata={"help": "Free memory per gpu."}
)
# GPTQ-specific training configuration
from transformers import GPTQConfig
quantization_config_loading = GPTQConfig(bits=3, disable_exllama=True)
model = AutoModelForCausalLM.from_pretrained(
    args.model_name_or_path,
    revision="gptq-3bit-128g-actorder_True",
    cache_dir=args.cache_dir,
    load_in_4bit=args.bits == 4,
    load_in_8bit=args.bits == 8,
    device_map=device_map,
    max_memory=max_memory,
    quantization_config=quantization_config_loading,
)
@sgugger can you help? thanks
If I change the configuration to max_memory_MB: int = field(# default=80000, default=30000, ...), I get: CUDA out of memory. Tried to allocate 896.00 MiB (GPU 0; 31.74 GiB total capacity;
Can anyone help?
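One thing worth trying, as a hedged sketch only (not a confirmed fix): load the pre-quantized GPTQ checkpoint through quantization_config alone, without also passing the bitsandbytes load_in_4bit / load_in_8bit flags, since those drive a different quantization path:

```python
# Hedged sketch, assuming the conflict comes from combining the GPTQ
# quantization_config with the bitsandbytes load_in_4bit / load_in_8bit flags.
# Model id and revision follow the snippet above.
from transformers import AutoModelForCausalLM, GPTQConfig

quantization_config = GPTQConfig(bits=3, disable_exllama=True)
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/StableBeluga2-70B-GPTQ",
    revision="gptq-3bit-128g-actorder_True",
    device_map="auto",
    quantization_config=quantization_config,
)
```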
> I don't think it is possible to use load_in_4bit without at least low_cpu_mem_usage=True (and normally you need device_map="auto"). In any case, the error message being the same does not mean it is the same issue, so please open a new one.
Can someone confirm that low_cpu_mem_usage must be True to use load_in_4bit?