
Error while moving model to GPU `NotImplementedError: Cannot copy out of meta tensor; no data!`

Open goelayu opened this issue 9 months ago • 4 comments

System Info

  • transformers version: 4.40.0
  • Platform: Linux-5.15.0-105-generic-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • Huggingface_hub version: 0.22.2
  • Safetensors version: 0.4.1
  • Accelerate version: 0.25.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.2.2+cu121 (True)
  • Tensorflow version (GPU?): 2.16.1 (True)
  • Flax version (CPU?/GPU?/TPU?): 0.8.2 (cpu)
  • Jax version: 0.4.26
  • JaxLib version: 0.4.21

Who can help?

@ArthurZucker @sgugger since I see some implementations of this inside accelerate to skip initialization.

Reproduction

import torch
from transformers import LlamaConfig, LlamaForCausalLM

c = LlamaConfig.from_json_file(<path to config.json>)
with torch.device("meta"):
    m = LlamaForCausalLM(c)

w = torch.load(<path to weights.bin file>)
m.load_state_dict(w, assign=True)
m.to("cuda:0")  # throws error

The last line throws the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/goelayus/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2692, in to
    return super().to(*args, **kwargs)
  File "/home/goelayus/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1152, in to
    return self._apply(convert)
  File "/home/goelayus/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 802, in _apply
    module._apply(fn)
  File "/home/goelayus/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 802, in _apply
    module._apply(fn)
  File "/home/goelayus/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 802, in _apply
    module._apply(fn)
  [Previous line repeated 2 more times]
  File "/home/goelayus/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 849, in _apply
    self._buffers[key] = fn(buf)
  File "/home/goelayus/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1150, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
NotImplementedError: Cannot copy out of meta tensor; no data!

Expected behavior

The model should be copied to the GPU device.

goelayu avatar May 07 '24 22:05 goelayu

To add to the above: if I use `init_empty_weights` from accelerate, I can skip the initialization without any errors.

Wondering what the difference between the two is? Also, is it possible to achieve the same thing using the `torch.device('meta')` context manager?

goelayu avatar May 07 '24 23:05 goelayu

Mmmm could you make sure that the `map_location` is correct? This might be expected, cc @SunMarc WDYT?

ArthurZucker avatar May 09 '24 14:05 ArthurZucker

So this issue seems to be documented in the code itself, in `big_modeling.py`: it turns out you can't run `model.to` when using the meta device. I was hoping for some explanation of why that is the case (hence I tagged @sgugger, since the `big_modeling.py` file seems to be modified often by them).

Also, as I noted in my comment above, replacing `torch.device('meta')` with `init_empty_weights` from the accelerate package seems to resolve the issue.

goelayu avatar May 09 '24 17:05 goelayu

cc @muellerzr for the accelerate related stuff rather than Sylvain!

ArthurZucker avatar May 10 '24 06:05 ArthurZucker

Hi @goelayu, this is expected: `with torch.device('meta')` also puts the buffers on the meta device, and non-persistent buffers are not saved in the state_dict. So in the case of a Llama model, which does have non-persistent buffers, you get an error after loading the weights. With `init_empty_weights`, by default, we don't put the buffers on the meta device, which is why it works. Hope that is clearer!

SunMarc avatar May 13 '24 12:05 SunMarc

@SunMarc thanks for the response, that answers my question.

goelayu avatar May 17 '24 18:05 goelayu