accelerate
[bug] infer_auto_device_map
System Info
- `Accelerate` version: 0.19.0
- Platform: Linux-5.10.16.3-microsoft-standard-WSL2-x86_64-with-glibc2.31
- Python version: 3.9.16
- Numpy version: 1.24.2
- PyTorch version (GPU?): 1.12.1+cu113 (True)
- System RAM: 3.79 GB
- GPU type: NVIDIA GeForce MX250
- `Accelerate` default config:
Not found
Information
- [ ] The official example scripts
- [X] My own modified scripts
Tasks
- [ ] One of the scripts in the examples/ folder of Accelerate or an officially supported `no_trainer` script in the `examples` folder of the `transformers` repo (such as `run_no_trainer_glue.py`)
- [X] My own task or dataset (give details below)
Reproduction
I'm trying to learn `infer_auto_device_map`, so I constructed a toy model and traced through this function to aid understanding, but I get an error.
import torch.nn as nn
from accelerate.utils.modeling import infer_auto_device_map

class ModelB(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(3, 3)
        self.layer2 = nn.Linear(3, 3)
        # tie layer2's parameters to layer1's
        self.layer2.weight = self.layer1.weight
        self.layer2.bias = self.layer1.bias

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        return out

model_b = ModelB()
infer_auto_device_map(model_b, verbose=True)
Output:
Treating module layer1.
Found the relevant tied param groups [['layer1.weight', 'layer2.weight'], ['layer1.bias', 'layer2.bias']]
So those parameters need to be taken into account ['layer2.weight', 'layer2.bias']
It looks like layer1 is going to fit on 0 but we have tied parameters to account for.
- Names ['layer2.weight', 'layer2.bias']
- Module names ['layer2', 'layer2']
Putting layer1 and ['layer2', 'layer2'] on 0.
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
Cell In[137], line 18
15 return out
16 model_b = ModelB()
---> 18 infer_auto_device_map(model_b, verbose=True)
File ~/anaconda3/envs/exp/lib/python3.9/site-packages/accelerate/utils/modeling.py:706, in infer_auto_device_map(model, max_memory, no_split_module_classes, dtype, special_dtypes, verbose)
704 device_map[name] = devices[current_device]
705 for tied_module_name in tied_module_names:
--> 706 tied_module_index = [i for i, (n, _) in enumerate(modules_to_treat) if n == tied_module_name][0]
707 modules_to_treat.pop(tied_module_index)
708 device_map[tied_module_name] = devices[current_device]
IndexError: list index out of range
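The traceback suggests the cause: both tied parameters (`layer2.weight` and `layer2.bias`) resolve to the same module name, so `tied_module_names` is `['layer2', 'layer2']`. The first iteration pops `layer2` from `modules_to_treat`; the second lookup then returns an empty list, and indexing it with `[0]` raises. A minimal pure-Python sketch of the failing loop (simplified from the traceback; the real function carries much more bookkeeping):

```python
# Sketch of the tied-module loop from modeling.py's infer_auto_device_map,
# reduced to the state relevant to this failure (names are illustrative).
modules_to_treat = [("layer2", None)]    # "layer1" itself was already placed
# Two tied parameters (weight and bias) live in the *same* module, so the
# module name appears twice:
tied_module_names = ["layer2", "layer2"]

device_map = {}
try:
    for tied_module_name in tied_module_names:
        # First iteration finds and pops "layer2"; second iteration finds
        # no match, and indexing the empty list with [0] raises IndexError.
        tied_module_index = [
            i for i, (n, _) in enumerate(modules_to_treat) if n == tied_module_name
        ][0]
        modules_to_treat.pop(tied_module_index)
        device_map[tied_module_name] = 0
except IndexError:
    print("IndexError on the duplicated module name")
```

Deduplicating the module names (or skipping names already present in `device_map`) before the loop would avoid the double pop.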
Expected behavior
I would expect `infer_auto_device_map` to handle parameters tied within the same module without raising.
I can reproduce. Will try to have a look later today or early next week. Thanks for the report!
Should be fixed by the PR linked above.