[bug] infer_auto_device_map

Open BuxianChen opened this issue 2 years ago • 1 comments

System Info

Copy-and-paste the text below in your GitHub issue

- `Accelerate` version: 0.19.0
- Platform: Linux-5.10.16.3-microsoft-standard-WSL2-x86_64-with-glibc2.31
- Python version: 3.9.16
- Numpy version: 1.24.2
- PyTorch version (GPU?): 1.12.1+cu113 (True)
- System RAM: 3.79 GB
- GPU type: NVIDIA GeForce MX250
- `Accelerate` default config:
        Not found

Information

  • [ ] The official example scripts
  • [X] My own modified scripts

Tasks

  • [ ] One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
  • [X] My own task or dataset (give details below)

Reproduction

I'm trying to learn how infer_auto_device_map works, so I built a toy model with tied parameters to trace the function, but the call raises an error.

import torch.nn as nn
from accelerate.utils.modeling import infer_auto_device_map

class ModelB(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(3, 3)
        self.layer2 = nn.Linear(3, 3)
        # Tie layer2's parameters to layer1's so both layers share the same tensors
        self.layer2.weight = self.layer1.weight
        self.layer2.bias = self.layer1.bias
        
    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        return out
model_b = ModelB()

infer_auto_device_map(model_b, verbose=True)

Output:

Treating module layer1.
  Found the relevant tied param groups [['layer1.weight', 'layer2.weight'], ['layer1.bias', 'layer2.bias']]
  So those parameters need to be taken into account ['layer2.weight', 'layer2.bias']
  It looks like layer1 is going to fit on 0 but we have tied parameters to account for.
  - Names ['layer2.weight', 'layer2.bias']
  - Module names ['layer2', 'layer2']
Putting layer1 and ['layer2', 'layer2'] on 0.
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[137], line 18
     15         return out
     16 model_b = ModelB()
---> 18 infer_auto_device_map(model_b, verbose=True)

File ~/anaconda3/envs/exp/lib/python3.9/site-packages/accelerate/utils/modeling.py:706, in infer_auto_device_map(model, max_memory, no_split_module_classes, dtype, special_dtypes, verbose)
    704 device_map[name] = devices[current_device]
    705 for tied_module_name in tied_module_names:
--> 706     tied_module_index = [i for i, (n, _) in enumerate(modules_to_treat) if n == tied_module_name][0]
    707     modules_to_treat.pop(tied_module_index)
    708     device_map[tied_module_name] = devices[current_device]

IndexError: list index out of range
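
One possible reading of the traceback, sketched in plain Python: the verbose output lists 'layer2' twice under "Module names" (once for the tied weight, once for the tied bias), so the loop at line 705 would pop layer2 on the first pass and find nothing to pop on the second. The function place_tied_modules and the list contents below are hypothetical stand-ins built from the output above, not the library's code, and the deduplication step is only an illustrative workaround, not necessarily what the linked PR does.

```python
# Plain-Python mimic of the loop shown in the traceback (modeling.py lines 705-708).
# `place_tied_modules` is a hypothetical helper, not part of Accelerate.

def place_tied_modules(modules_to_treat, tied_module_names, device_map, device):
    """Pop each tied module from the pending list and assign it to `device`."""
    for tied_module_name in tied_module_names:
        tied_module_index = [
            i for i, (n, _) in enumerate(modules_to_treat) if n == tied_module_name
        ][0]  # raises IndexError when the name is no longer in the pending list
        modules_to_treat.pop(tied_module_index)
        device_map[tied_module_name] = device
    return device_map


# Assumed state once layer1 has been taken off the pending list;
# None stands in for the actual module object.
pending = [("layer2", None)]
tied = ["layer2", "layer2"]  # one entry per tied parameter (weight and bias)

try:
    place_tied_modules(list(pending), tied, {"layer1": 0}, 0)
    raised = False
except IndexError:
    raised = True

# Illustrative workaround: drop duplicate names while preserving order.
deduped = list(dict.fromkeys(tied))
fixed_map = place_tied_modules(list(pending), deduped, {"layer1": 0}, 0)

print(raised)     # True
print(fixed_map)  # {'layer1': 0, 'layer2': 0}
```

If this reading is right, tying only one parameter (for example just the weight) would not trigger the error, since 'layer2' would then appear once in the list.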

Expected behavior

infer_auto_device_map should return a device map for this model instead of raising an IndexError.

BuxianChen, Jun 02 '23 09:06

I can reproduce. Will try to have a look later today or early next week. Thanks for the report!

sgugger, Jun 02 '23 13:06

Should be fixed by the PR linked above.

sgugger, Jun 02 '23 18:06