Accelerate a non-HF model, like detectron2
System Info
- `Accelerate` version: 0.19.0
- Platform: Linux-5.10.147+-x86_64-with-glibc2.31
- Python version: 3.10.11
- Numpy version: 1.22.4
- PyTorch version (GPU?): 1.13.1+cu116 (True)
- System RAM: 12.68 GB
- GPU type: Tesla T4
- `Accelerate` default config: Not found
Information
- [ ] The official example scripts
- [X] My own modified scripts
Tasks
- [ ] One of the scripts in the examples/ folder of Accelerate or an officially supported `no_trainer` script in the `examples` folder of the `transformers` repo (such as `run_no_trainer_glue.py`)
- [X] My own task or dataset (give details below)
Reproduction
There is a model that embeds images and text, and I'd like to run it on 1 or 2 GPUs. Each GPU has less memory than a single inference requires, so I wanted to give Accelerate a try.
The model loading is not entirely controlled by me; it comes from the Detectron2 framework. There are many places where the framework calls `.to(device)`, and I wonder if that may be the source of the problem.
I am trying to run the model instantiation under `with init_empty_weights():`, but this is failing with `Cannot copy out of meta tensor; no data!`.
For reproduction, I have a Colab link, with 3 added lines relative to the official one (to switch to Accelerate): https://colab.research.google.com/drive/18UI5JTWlWkYCCKCWAlXeZqOXi46sPIFC?usp=sharing
The first run takes quite a few minutes (10+) due to the install plus the download of the weights. Note that in Colab it may run fine, as their GPUs are slightly bigger (a T4 has 16 GB; I have 2-4 x 11 GB).
Any pointers on how to run this?
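For reference, a minimal sketch of the failing pattern (the config file path is a placeholder, not the exact setup from the notebook):

```python
from accelerate import init_empty_weights
from detectron2.config import get_cfg
from detectron2.modeling import build_model

cfg = get_cfg()
cfg.merge_from_file("path/to/model_config.yaml")  # placeholder config

# Build the model on the "meta" device so no real memory is allocated.
# Detectron2 calls .to(device) internally during construction, which is
# what triggers "Cannot copy out of meta tensor; no data!".
with init_empty_weights():
    model = build_model(cfg)
```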
Expected behavior
The `init_empty_weights()` context manager should intercept all `model.to()` calls so that it works even when we don't have full control of the model initialization.
I would also appreciate a few more pointers on setting the weights of such a model, when the `torch.load()` call is not under our control. But this is extra; I think I can figure it out.
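To make the expectation concrete, here is the behaviour with a plain PyTorch module (a small sketch; this reproduces the same error the Detectron2 build hits):

```python
import torch
from accelerate import init_empty_weights

with init_empty_weights():
    layer = torch.nn.Linear(4, 4)

print(layer.weight.device)  # meta: parameters have no backing storage

# Any .to(device) on a meta module fails, which is why frameworks that
# call .to() during construction break under init_empty_weights()
# (the exact exception type varies across torch versions):
layer.to("cuda")  # Cannot copy out of meta tensor; no data!
```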
OK, it turns out that the `model.to(device)` call was indeed the problem. I removed it in the source framework, and the model was then initialised with empty weights successfully.
Now I am trying to compute a `device_map`:
device_map = infer_auto_device_map(model, max_memory={0: 5000, 1:5000})
This is failing with `AttributeError: 'Parameter' object has no attribute 'named_children'`.
Doing some debugging (a `%debug` cell below the errored one), the error comes from here: https://github.com/huggingface/accelerate/blob/ab379793d44be16d8fcac5c098a3ab9b6f5a7ec3/src/accelerate/utils/modeling.py#LL663C60-L663C60
The `module` is of type `<class 'torch.nn.parameter.Parameter'>`, which has no method `named_children()`.
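A minimal standalone repro of that failure mode:

```python
import torch

p = torch.nn.Parameter(torch.empty(3))

# Parameters are Tensors, not Modules, so they have no named_children():
p.named_children()  # AttributeError: 'Parameter' object has no attribute 'named_children'
```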
There is also a bug here: https://github.com/huggingface/accelerate/blob/ab379793d44be16d8fcac5c098a3ab9b6f5a7ec3/src/accelerate/utils/modeling.py#LL767C100-L767C116
When `verbose` is true, it can happen that the current device is `disk`, so the `current_max_size` is `None`, which generates an error.
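A hedged illustration of that second failure (simplified; not the actual Accelerate code):

```python
max_memory = {0: 5000, 1: 5000}  # per-GPU budgets; "disk" has no entry

current_device = "disk"
current_max_size = max_memory.get(current_device)  # None

# Any arithmetic or formatting on it in the verbose logging path then fails:
current_max_size - 1024
# TypeError: unsupported operand type(s) for -: 'NoneType' and 'int'
```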
I can see where the two last issues stem from and can fix them. For the first one, the best we can do is ignore all calls to `to` under the context manager, to make sure there is no error. I'm not sure if it could have other side effects, however.
The PR linked above should fix the two last issues if you want to give it a try.
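For illustration, "ignoring `to`" could look like the sketch below (a simplified illustration, not the actual Accelerate implementation):

```python
from contextlib import contextmanager

import torch.nn as nn

@contextmanager
def ignore_module_to():
    """Temporarily turn nn.Module.to into a no-op that returns the module."""
    original_to = nn.Module.to
    nn.Module.to = lambda self, *args, **kwargs: self
    try:
        yield
    finally:
        nn.Module.to = original_to
```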
Hello,
Thank you for the fast action! Indeed, the two issues are now fixed and I can correctly and verbosely compute a device map.
It outputs some negative numbers in the first steps if I pass a `max_memory`; I'm not sure why, but this is not a real problem.
I'm not sure about the side effects of ignoring `.to()` calls in the context manager, but those calls do seem counterproductive to Accelerate's way of working, so Accelerate should take control of them.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.