
ValueError: weight is on the meta device, we need a `value` to put in on cpu.

Open toncho11 opened this issue 2 years ago • 12 comments

System Info

Windows 10
Accelerate Version: from git (recent)
Python 3.8.0
4GB GPU
16GB RAM

Information

  • [ ] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [ ] One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
  • [ ] My own task or dataset (give details below)

Reproduction

I am using:

    BASE_MODEL = "decapoda-research/llama-7b-hf"
    LORA_WEIGHTS = "tloen/alpaca-lora-7b"
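
For context, these constants are presumably fed to the usual alpaca-lora loading pattern, along these lines (a sketch of that pattern, not the exact script):

    import torch
    from transformers import LlamaForCausalLM
    from peft import PeftModel

    BASE_MODEL = "decapoda-research/llama-7b-hf"
    LORA_WEIGHTS = "tloen/alpaca-lora-7b"

    # Load the base model split across available devices, then apply the LoRA adapter.
    model = LlamaForCausalLM.from_pretrained(BASE_MODEL, device_map="auto", torch_dtype=torch.float16)
    model = PeftModel.from_pretrained(model, LORA_WEIGHTS)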

I get this error: ValueError: weight is on the meta device, we need a `value` to put in on cpu. It is raised in modeling.py, in the function set_module_tensor_to_device:

    if old_value.device == torch.device("meta") and device not in ["meta", torch.device("meta")] and value is None:
        raise ValueError(f"{tensor_name} is on the meta device, we need a `value` to put in on {device}.")

More details:

  • I am trying to load the model on my 4GB GPU, so I am low on GPU memory and I suppose a lot of offloading happens back and forth between the CPU and the GPU (see the sketch after this list).
  • My code originally called model.half(), but that gave me RuntimeError: "addmm_impl_cpu_" not implemented for 'Half', so I disabled that line in the primary script.
  • With model.half() disabled, the code gets further, but then fails with the meta-device error above.
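
For reference, capping what Accelerate may place on the GPU looks roughly like this (a sketch; the memory budgets here are invented for illustration):

    from transformers import LlamaForCausalLM

    model = LlamaForCausalLM.from_pretrained(
        "decapoda-research/llama-7b-hf",
        device_map="auto",
        # Hypothetical budgets: leave headroom on the 4GB GPU, put the rest in CPU RAM.
        max_memory={0: "3GiB", "cpu": "12GiB"},
        offload_folder="offload",  # anything that still does not fit is spilled to disk
    )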

The source code is available: here

Might be related: https://github.com/huggingface/accelerate/issues/1197

Expected behavior

No error. The model and the weights are loaded (split across CPU and GPU).

toncho11 avatar Apr 04 '23 08:04 toncho11

What is your version of Accelerate? Also note that decapoda-research/llama-7b-hf is not usable at all: its weights were converted in the middle of the PR adding Llama, so it is not compatible with Transformers.

You should use another model or run the conversion script after obtaining the official weights from Meta.
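
For reference, the conversion script lives in the Transformers repository, and the invocation is roughly (paths are placeholders):

    python src/transformers/models/llama/convert_llama_weights_to_hf.py \
        --input_dir /path/to/downloaded/llama/weights \
        --model_size 7B \
        --output_dir llama-7b-hf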

sgugger avatar Apr 04 '23 13:04 sgugger

I installed Accelerate yesterday from git; I forgot to say. It all works OK in Google Colab: Colab has a 16GB GPU and the model loads fine there. I use weights not from Meta but from Stanford Alpaca. It does not work on my laptop with a 4GB GPU when I insist on using the GPU. In CPU mode it also works on my laptop, but it takes between 20 and 40 minutes to get an answer to a prompt. So when I insist on using my 4GB GPU, it fails somewhere in the process of moving the model back and forth between the GPU and the CPU (the two kinds of RAM). I do not understand the above error message very well. What is a "meta" device, for example?

toncho11 avatar Apr 04 '23 13:04 toncho11

I am actually getting a similar error with the accelerate library. I am running the code on my local Mac (non-M1) without GPUs.

    from huggingface_hub import hf_hub_download
    from transformers import AutoConfig, AutoModelWithLMHead
    from accelerate import init_empty_weights, load_checkpoint_and_dispatch

    whisper_model = "openai/whisper-tiny"
    weights_location = hf_hub_download(whisper_model, 'pytorch_model.bin')
    config = AutoConfig.from_pretrained(whisper_model)
    with init_empty_weights():
        model = AutoModelWithLMHead.from_config(config)
    model = load_checkpoint_and_dispatch(model, weights_location, device_map='auto')

I get this error:

        if tensor_name not in module._parameters and tensor_name not in module._buffers:
            raise ValueError(f"{module} does not have a parameter or a buffer named {tensor_name}.")
        is_buffer = tensor_name in module._buffers
        old_value = getattr(module, tensor_name)
    
        if old_value.device == torch.device("meta") and device not in ["meta", torch.device("meta")] and value is None:
>           raise ValueError(f"{tensor_name} is on the meta device, we need a `value` to put in on {device}.")
E           ValueError: weight is on the meta device, we need a `value` to put in on cpu.

philip30 avatar Apr 14 '23 03:04 philip30

@philip30 This is because the initialization under init_empty_weights breaks the tied weights. You need to call model.tie_weights() to re-tie them afterwards:

whisper_model = "openai/whisper-tiny"
weights_location = hf_hub_download(whisper_model, 'pytorch_model.bin')
config = AutoConfig.from_pretrained(whisper_model)
with init_empty_weights():
     model = AutoModelWithLMHead.from_config(config)
model.tie_weights()
model = load_checkpoint_and_dispatch(model, weights_location, device_map='auto')

This is covered in the documentation.
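
As a quick way to see what re-tying does, here is a minimal sketch (it assumes, as is the case for Whisper, that the output projection is tied to the decoder embeddings):

    from transformers import AutoConfig, AutoModelWithLMHead
    from accelerate import init_empty_weights

    config = AutoConfig.from_pretrained("openai/whisper-tiny")
    with init_empty_weights():
        model = AutoModelWithLMHead.from_config(config)

    # init_empty_weights re-registers every parameter on the meta device,
    # which silently breaks the tie between the two modules.
    print(model.get_input_embeddings().weight is model.get_output_embeddings().weight)  # False

    model.tie_weights()
    print(model.get_input_embeddings().weight is model.get_output_embeddings().weight)  # True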

sgugger avatar Apr 14 '23 11:04 sgugger

I ran into this error ("ValueError: weight is on the meta device, we need a value to put in on cpu.") while loading llama-7B in 8-bit as well. It was able to load the model before; the only difference is that I pip-updated transformers. I rolled back to the previous version of transformers and it loads the model just fine.

discoelysiumLW avatar Apr 19 '23 23:04 discoelysiumLW

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar May 14 '23 15:05 github-actions[bot]

I'm still hitting this issue:

/home/huide/conda3/envs/vicuna/lib/python3.9/site-packages/accelerate/utils/modeling.py:136
in set_module_tensor_to_device

    133   old_value = getattr(module, tensor_name)
    134
    135   if old_value.device == torch.device("meta") and device not in ["meta", torch.device("meta")] and value is None:
❱   136       raise ValueError(f"{tensor_name} is on the meta device, we need a `value` to put in on {device}.")
    137
    138   if value is not None:
    139       if dtype is None:
ValueError: weight is on the meta device, we need a `value` to put in on 0.

huide9 avatar May 22 '23 02:05 huide9

@huide9 As with everyone else, there is nothing we can do without code reproducing the error.

sgugger avatar May 22 '23 13:05 sgugger

@sgugger This is my code:

https://github.com/toncho11/ML_examples/blob/ee3f69147b53a7cee5d2e80d36f8eafad7ef9ef6/Transformers/HuggingFace/ChatBots/ChatBotAlpacaLoraFromStanfordConsole.py

That is the script that produced the error. Please run it more than once.

toncho11 avatar May 22 '23 14:05 toncho11

cc @younesbelkada since this is using 8bit-loading

sgugger avatar May 22 '23 15:05 sgugger

> I do not understand the above error message very well. What is a "meta" device, for example?

The meta device was introduced by PyTorch. With it we can instantiate huge models without worrying about CPU or GPU RAM: a meta tensor holds no data and only gives us shapes. This will give you a better understanding: https://huggingface.co/blog/accelerate-large-models#:~:text=PyTorch%201.9%20introduced%20a%20new,CPU%20(or%20GPU)%20RAM.
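
A quick illustration in plain PyTorch:

    import torch

    # A meta tensor records shape and dtype but allocates no storage.
    t = torch.empty(4096, 4096, device="meta")
    print(t.shape, t.dtype)  # torch.Size([4096, 4096]) torch.float32

    # Reading its data is impossible; for example t.cpu() fails with an error
    # along the lines of: "Cannot copy out of meta tensor; no data!"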

anujsahani01 avatar May 31 '23 22:05 anujsahani01

> cc @younesbelkada since this is using 8bit-loading

Sir, please help me out with this error; it would be a great help.

I am using the 🤗 Accelerate tool to initialize the model, then loading the model weights with load_checkpoint_and_dispatch. But it gives me this error: ValueError: offload is not a folder containing a .index.json file.

I am not able to understand what exactly the error is. Please have a look at the snip, which shows the offload folder and the error.

Please have a look at my code: https://drive.google.com/file/d/1-ccrx1Q5tkLUYtZBGi5lNZGjPMyr_X9U/view?usp=sharing

Please help me in solving this error; your input will be highly appreciated. Thank you!
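
One thing worth checking (a guess based on the error text, not a confirmed fix): when the device map sends layers to disk, load_checkpoint_and_dispatch needs an offload_folder it can write the offloaded tensors, and their .index.json, into:

    model = load_checkpoint_and_dispatch(
        model,
        checkpoint=weights_location,
        device_map="auto",
        offload_folder="offload",  # Accelerate writes offloaded weights plus their index here
    )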

anujsahani01 avatar May 31 '23 22:05 anujsahani01

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Jun 25 '23 15:06 github-actions[bot]

@sgugger I'd like to re-open this issue; I am also facing it. It goes away when enabling --low_cpu_mem_usage. I don't have a minimal example, but the full code that triggered it is here.

Use this config as config.json

{
  "log_on_each_node": false,
  "report_to": "wandb",

  "per_device_train_batch_size": 2,
  "per_device_eval_batch_size": 2,
  "gradient_accumulation_steps": 8,

  "optim": "paged_adamw_32bit",
  "learning_rate": 2e-4,
  "weight_decay": 0.01,
  "lr_scheduler_type": "cosine",
  "warmup_ratio": 0.03,
  "adam_beta1": 0.9,
  "adam_beta2": 0.95,

  "save_steps": 20,
  "evaluation_strategy": "steps",
  "eval_steps": 20,
  "logging_first_step": true,
  "logging_steps": 1,
  "num_train_epochs": 2,

  "early_stopping_patience": null,
  "early_stopping_threshold": null,

  "dataset_name": "BramVanroy/dutch_chat_datasets",
  "group_by_length": true,
  "preprocessing_num_workers": 24,
  "dataset_batch_size": 1000,
  "output_dir": "results",
  "validation_split_percentage": 5,
  "load_best_model_at_end": true,
  "save_total_limit": 3,

  "trust_remote_code": true,
  "model_name_or_path": "BramVanroy/llama2-13b-ft-mc4_nl_cleaned_tiny",
  "max_length": 4096,
  "use_nested_quant": true,
  "bf16": true,
  "tf32": true,
  "load_in_4bit": true,
  "bnb_4bit_quant_type": "nf4",
  "torch_dtype": "bfloat16",
  "bnb_4bit_compute_dtype": "bfloat16",
  "lora_alpha": 32,
  "lora_r": 4,
  "low_cpu_mem_usage": false,
  "ddp_find_unused_parameters": true,
  "use_peft": true,
  "max_grad_norm": 1.0,
  "use_flash_attention": true,

  "do_train": true,
  "do_eval": true
}

Call as you would

python run_chat_modeling.py config.json

In my case, I used deepspeed on 4x 3090s (the config is in the repository that I linked):

deepspeed src/llm_finetuning/run_chat_modeling.py chat_config.json --deepspeed deepspeed_configs/ds_config_zero2.json

Hope that helps to reproduce the issue?

BramVanroy avatar Aug 17 '23 19:08 BramVanroy

I don't think it is possible to use load_in_4bit without at least low_cpu_mem_usage=True (and normally you also need device_map="auto").

In any case, an identical error message does not mean it is the same issue, so please open a new one.
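
A minimal sketch of that combination (the model name is a placeholder):

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-2-13b-hf",  # placeholder
        device_map="auto",            # place quantized weights directly on devices
        low_cpu_mem_usage=True,       # required (and implied) for 4-bit loading
        quantization_config=BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_compute_dtype=torch.bfloat16,
        ),
    )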

sgugger avatar Aug 18 '23 05:08 sgugger

@sgugger Sure, I made a new issue https://github.com/huggingface/accelerate/issues/1858.

On a related note, I thought device_map="auto" was for inference only (also discussed here)?

BramVanroy avatar Aug 18 '23 08:08 BramVanroy

If you are not offloading anything (e.g. the device map only contains GPUs), it works for training as well.
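
A quick way to verify that nothing was offloaded before training (a sketch; hf_device_map is populated whenever a device_map is used):

    from transformers import AutoModelForCausalLM

    model_name = "meta-llama/Llama-2-7b-hf"  # placeholder
    model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

    # Training is only safe if every module landed on a GPU index,
    # i.e. no entry maps to "cpu" or "disk".
    assert all(isinstance(placement, int) for placement in model.hf_device_map.values())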

sgugger avatar Aug 18 '23 11:08 sgugger

I am trying to finetune the GPTQ model ("TheBloke/StableBeluga2-70B-GPTQ") with the qlora code in full-parameter mode, and I get the following error: ValueError: qweight is on the meta device, we need a value to put in on 0.

This is my code:

    max_memory_MB: int = field(
        # default=80000,
        default=29000,
        metadata={"help": "Free memory per gpu."}
    )

GPTQ-specific configuration for training:

from transformers import GPTQConfig
quantization_config_loading = GPTQConfig(bits=3, disable_exllama=True)


model = AutoModelForCausalLM.from_pretrained(
    args.model_name_or_path,
    revision="gptq-3bit-128g-actorder_True",
    cache_dir=args.cache_dir,
    load_in_4bit=args.bits == 4,
    load_in_8bit=args.bits == 8,
    device_map=device_map,
    max_memory=max_memory,
    quantization_config=quantization_config_loading,
)

hzgdeerHo avatar Sep 19 '23 11:09 hzgdeerHo

@sgugger can you help? thanks

hzgdeerHo avatar Sep 19 '23 11:09 hzgdeerHo

If I change the configuration to max_memory_MB: int = field(default=30000, ...), I get: CUDA out of memory. Tried to allocate 896.00 MiB (GPU 0; 31.74 GiB total capacity;

hzgdeerHo avatar Sep 19 '23 11:09 hzgdeerHo

Can anyone help?

hzgdeerHo avatar Sep 19 '23 11:09 hzgdeerHo

> I don't think it is possible to use load_in_4bit without at least low_cpu_mem_usage=True (and normally you also need device_map="auto").
>
> In any case, an identical error message does not mean it is the same issue, so please open a new one.

Can someone confirm that low_cpu_mem_usage must be True to use load_in_4bit?

noobmaster29 avatar Nov 05 '23 03:11 noobmaster29