accelerate
ValueError: weight is on the meta device, we need a `value` to put in on cpu.
System Info
Windows 10
Accelerate Version: from git (recent)
Python 3.8.0
4GB GPU
16GB RAM
Information
- [ ] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
- [ ] My own task or dataset (give details below)
Reproduction
I am using:
BASE_MODEL = "decapoda-research/llama-7b-hf"
LORA_WEIGHTS = "tloen/alpaca-lora-7b"
I get this error: ValueError: weight is on the meta device, we need a `value` to put in on cpu. It is raised in modeling.py, in the function set_module_tensor_to_device:
if old_value.device == torch.device("meta") and device not in ["meta", torch.device("meta")] and value is None:
raise ValueError(f"{tensor_name} is on the meta device, we need a `value` to put in on {device}.")
More details:
- I am trying to load the model on my 4GB GPU, so I am low on GPU resources and I suppose a lot of offloading is performed back and forth between the CPU and the GPU.
- My code originally called model.half(), but that gave me the error RuntimeError: "addmm_impl_cpu_" not implemented for 'Half', so I disabled that call in the primary script.
- After removing the half() call, my code goes further, but it then fails with the error above about the meta device.
The source code is available: here
Might be related: https://github.com/huggingface/accelerate/issues/1197
Expected behavior
No error. The model and the weights are loaded (in both CPU and GPU).
What is your version of Accelerate? Also note that decapoda-research/llama-7b-hf is not usable at all: it was converted in the middle of the PR adding Llama and is not compatible with Transformers.
You should use another model or run the conversion script after obtaining the official weights from Meta.
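For illustration, a minimal sketch of loading an already-converted Llama checkpoint; the repo id below is only an example (not an endorsement), and device_map="auto" requires Accelerate:

```python
# Hedged sketch: "huggyllama/llama-7b" is just an example of an already-converted
# checkpoint; any weights converted with transformers' convert_llama_weights_to_hf.py
# script can be loaded the same way.
from transformers import LlamaForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("huggyllama/llama-7b")
model = LlamaForCausalLM.from_pretrained("huggyllama/llama-7b", device_map="auto")
```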
I installed Accelerate yesterday from Git; I forgot to mention that. It all works fine in Google Colab: Colab has a 16 GB GPU and the model loads there without problems. I use weights not from Meta but from Stanford Alpaca. It does not work on my laptop with a 4 GB GPU when I insist on using the GPU. In CPU mode it also works on my laptop, but it takes between 20 and 40 minutes to get an answer to a prompt. So when I insist on using my 4 GB GPU, it fails somewhere in the process of moving the model back and forth between the GPU and the CPU (the two types of RAM). I do not understand the error message very well. What is a "meta" device, for example?
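For what it's worth, a hedged sketch (not a verified fix for this exact setup) of constraining GPU memory explicitly and giving Accelerate a disk offload folder; the model id, memory limits, and folder name are placeholders:

```python
# Hedged sketch for a small-GPU setup: model id, memory limits and folder name are
# placeholders. device_map="auto" lets Accelerate split the model between the GPU,
# CPU RAM and the disk offload folder instead of trying to fit everything on the GPU.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "path/or/repo-of-your-llama-checkpoint",   # placeholder
    device_map="auto",
    max_memory={0: "3GiB", "cpu": "12GiB"},    # leave headroom on a 4GB GPU
    offload_folder="offload",                  # spill whatever does not fit to disk
    torch_dtype=torch.float16,
)
```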
I am actually getting a similar error with the accelerate library. I am running the code on my local Mac (non-M1) without GPUs.
from huggingface_hub import hf_hub_download
from transformers import AutoConfig, AutoModelWithLMHead
from accelerate import init_empty_weights, load_checkpoint_and_dispatch

whisper_model = "openai/whisper-tiny"
weights_location = hf_hub_download(whisper_model, 'pytorch_model.bin')
config = AutoConfig.from_pretrained(whisper_model)
with init_empty_weights():
    model = AutoModelWithLMHead.from_config(config)
model = load_checkpoint_and_dispatch(model, weights_location, device_map='auto')
Getting this error
if tensor_name not in module._parameters and tensor_name not in module._buffers:
raise ValueError(f"{module} does not have a parameter or a buffer named {tensor_name}.")
is_buffer = tensor_name in module._buffers
old_value = getattr(module, tensor_name)
if old_value.device == torch.device("meta") and device not in ["meta", torch.device("meta")] and value is None:
> raise ValueError(f"{tensor_name} is on the meta device, we need a `value` to put in on {device}.")
E ValueError: weight is on the meta device, we need a `value` to put in on cpu.
@philip30 This is because the initialization under init_empty_weights breaks the tied weights. You need to add a model.tie_weights() to re-tie them afterwards:
whisper_model = "openai/whisper-tiny"
weights_location = hf_hub_download(whisper_model, 'pytorch_model.bin')
config = AutoConfig.from_pretrained(whisper_model)
with init_empty_weights():
    model = AutoModelWithLMHead.from_config(config)
model.tie_weights()
model = load_checkpoint_and_dispatch(model, weights_location, device_map='auto')
This is in the documentation
I ran into the error "ValueError: weight is on the meta device, we need a `value` to put in on cpu." while loading llama-7B in 8-bit as well. I was able to load the model before; the only difference is that I pip-updated transformers. I rolled back to the previous version of Transformers and it can load the model just fine.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
I'm still hitting this issue:
/home/huide/conda3/envs/vicuna/lib/python3.9/site-packages/accelerate/utils/modeling.py:13 │
│ 6 in set_module_tensor_to_device │
│ │
│ 133 │ old_value = getattr(module, tensor_name) │
│ 134 │ │
│ 135 │ if old_value.device == torch.device("meta") and device not in ["meta", torch.dev │
│ ❱ 136 │ │ raise ValueError(f"{tensor_name} is on the meta device, we need a `value` to │
│ 137 │ │
│ 138 │ if value is not None: │
│ 139 │ │ if dtype is None: │
╰────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: weight is on the meta device, we need a `value` to put in on 0.
@huide9 As for everyone else, there is nothing we can do without code reproducing the error.
@sgugger This is my code:
https://github.com/toncho11/ML_examples/blob/ee3f69147b53a7cee5d2e80d36f8eafad7ef9ef6/Transformers/HuggingFace/ChatBots/ChatBotAlpacaLoraFromStanfordConsole.py
that produced the error. Please run it more than once.
cc @younesbelkada since this is using 8bit-loading
> I do not understand the error message very well. What is a "meta" device, for example?
The meta device was introduced by PyTorch. With the help of the meta device we can load large tensors without worrying about GPU or CPU RAM, but it does not hold any data; it only gives us the shapes. https://huggingface.co/blog/accelerate-large-models#:~:text=PyTorch%201.9%20introduced%20a%20new,CPU%20(or%20GPU)%20RAM. This will give you a better understanding.
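A minimal sketch of what that looks like in practice (plain PyTorch plus Accelerate's init_empty_weights; nothing here is specific to this issue):

```python
# Hedged sketch: tensors on the "meta" device carry only shape/dtype metadata and
# no storage, so even a huge tensor "fits" regardless of available RAM.
import torch
from torch import nn
from accelerate import init_empty_weights

t = torch.empty((100_000, 100_000), device="meta")
print(t.shape, t.device)  # torch.Size([100000, 100000]) meta -- no memory allocated

with init_empty_weights():
    layer = nn.Linear(100_000, 100_000)  # parameters are created on the meta device
print(layer.weight.device)  # meta -- real values must be loaded in before use
```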
> cc @younesbelkada since this is using 8bit-loading
Sir, please help me out with this error; it would be a great help.
I am using the 🤗 accelerate tool to initialize the model, and then I load the model weights with load_checkpoint_and_dispatch.
But it gives me this error:
ValueError: offload is not a folder containing a .index.json file.
I am not able to understand what exactly the error is.
Please have a look at the screenshot, which shows the offload folder and the error.
Please have a look at my code: https://drive.google.com/file/d/1-ccrx1Q5tkLUYtZBGi5lNZGjPMyr_X9U/view?usp=sharing
Please help me in solving this error. Your inputs will be highly appreciated. Thank you!
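Without seeing the code it is hard to be sure, but that error usually means the checkpoint argument of load_checkpoint_and_dispatch points at a folder (here one named offload) that contains no *.index.json. A hedged sketch, with placeholder paths, of how the checkpoint and the disk-offload folder are passed separately:

```python
# Hedged sketch: model id and paths are placeholders. `checkpoint` must be a weights
# file (or a folder containing a *.index.json for sharded checkpoints); the folder
# used for disk offload is a separate argument, `offload_folder`.
from accelerate import init_empty_weights, load_checkpoint_and_dispatch
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("your-model-id")            # placeholder
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config)
model.tie_weights()

model = load_checkpoint_and_dispatch(
    model,
    "path/to/checkpoint/pytorch_model.bin",  # file, or folder with a *.index.json
    device_map="auto",
    offload_folder="offload",                # where offloaded weights get written
)
```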
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
@sgugger I'd like to re-open this issue. I also face it, and it goes away when enabling --low_cpu_mem_usage. I don't have a minimal example, but the full code that I used to trigger this is here.
Use this config as config.json
{
"log_on_each_node": false,
"report_to": "wandb",
"per_device_train_batch_size": 2,
"per_device_eval_batch_size": 2,
"gradient_accumulation_steps": 8,
"optim": "paged_adamw_32bit",
"learning_rate": 2e-4,
"weight_decay": 0.01,
"lr_scheduler_type": "cosine",
"warmup_ratio": 0.03,
"adam_beta1": 0.9,
"adam_beta2": 0.95,
"save_steps": 20,
"evaluation_strategy": "steps",
"eval_steps": 20,
"logging_first_step": true,
"logging_steps": 1,
"num_train_epochs": 2,
"early_stopping_patience": null,
"early_stopping_threshold": null,
"dataset_name": "BramVanroy/dutch_chat_datasets",
"group_by_length": true,
"preprocessing_num_workers": 24,
"dataset_batch_size": 1000,
"output_dir": "results",
"validation_split_percentage": 5,
"load_best_model_at_end": true,
"save_total_limit": 3,
"trust_remote_code": true,
"model_name_or_path": "BramVanroy/llama2-13b-ft-mc4_nl_cleaned_tiny",
"max_length": 4096,
"use_nested_quant": true,
"bf16": true,
"tf32": true,
"load_in_4bit": true,
"bnb_4bit_quant_type": "nf4",
"torch_dtype": "bfloat16",
"bnb_4bit_compute_dtype": "bfloat16",
"lora_alpha": 32,
"lora_r": 4,
"low_cpu_mem_usage": false,
"ddp_find_unused_parameters": true,
"use_peft": true,
"max_grad_norm": 1.0,
"use_flash_attention": true,
"do_train": true,
"do_eval": true
}
Call it as you would:
python run_chat_modeling.py config.json
In my case, I used deepspeed on 4x 3090s (the config is in the repository that I linked):
deepspeed src/llm_finetuning/run_chat_modeling.py chat_config.json --deepspeed deepspeed_configs/ds_config_zero2.json
Hope that helps to reproduce the issue?
I don't think it is possible to use load_in_4bit without at least low_cpu_mem_usage=True (and normally you need device_map="auto").
In any case, the error message being the same does not mean it is the same issue, so please open a new one.
@sgugger Sure, I made a new issue https://github.com/huggingface/accelerate/issues/1858.
On a related note, I thought device_map="auto" was for inference only (also discussed here)?
If you are not offloading anything (e.g. the device map only contains GPUs), it works for training as well.
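For reference, a minimal 4-bit loading sketch (the model id is a placeholder): passing device_map="auto" makes Transformers turn on low_cpu_mem_usage internally, which is what the workaround above relies on:

```python
# Hedged sketch: the model id is a placeholder. With a device_map set, Transformers
# enables low_cpu_mem_usage under the hood, which is required for 4-bit loading.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "your-model-id",                 # placeholder
    quantization_config=bnb_config,
    device_map="auto",
)
```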
I am trying to fine-tune the GPTQ model ("TheBloke/StableBeluga2-70B-GPTQ") with the QLoRA code in full-parameter mode, and I got the following error:
ValueError: qweight is on the meta device, we need a `value` to put in on 0.
This is my code:
max_memory_MB: int = field(
    # default=80000,
    default=29000,
    metadata={"help": "Free memory per gpu."}
)
# GPTQ-specific training configuration
from transformers import GPTQConfig
quantization_config_loading = GPTQConfig(bits=3, disable_exllama=True)
model = AutoModelForCausalLM.from_pretrained(
    args.model_name_or_path,
    revision="gptq-3bit-128g-actorder_True",
    cache_dir=args.cache_dir,
    load_in_4bit=args.bits == 4,
    load_in_8bit=args.bits == 8,
    device_map=device_map,
    max_memory=max_memory,
    quantization_config=quantization_config_loading,
)
@sgugger can you help? thanks
If I change the configuration to max_memory_MB: int = field(# default=80000, default=30000, ...), I get: CUDA out of memory. Tried to allocate 896.00 MiB (GPU 0; 31.74 GiB total capacity;
Can anyone help?
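One thing worth trying, as a hedged sketch only (not a confirmed fix): load the pre-quantized GPTQ checkpoint through quantization_config alone, without also passing the bitsandbytes load_in_4bit / load_in_8bit flags, since those drive a different quantization path:

```python
# Hedged sketch, assuming the conflict comes from combining the GPTQ
# quantization_config with the bitsandbytes load_in_4bit / load_in_8bit flags.
# Model id and revision follow the snippet above.
from transformers import AutoModelForCausalLM, GPTQConfig

quantization_config = GPTQConfig(bits=3, disable_exllama=True)
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/StableBeluga2-70B-GPTQ",
    revision="gptq-3bit-128g-actorder_True",
    device_map="auto",
    quantization_config=quantization_config,
)
```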
> I don't think it is possible to use load_in_4bit without at least low_cpu_mem_usage=True (and normally you need device_map="auto"). In any case, the error message being the same does not mean it is the same issue, so please open a new one.
Can someone confirm that low_cpu_mem_usage must be True to use load_in_4bit?