
infer_auto_device_map calculation bug

Open · BuxianChen opened this issue 2 years ago · 4 comments

System Info

- `Accelerate` version: 0.19.0
- Platform: Linux-5.10.16.3-microsoft-standard-WSL2-x86_64-with-glibc2.31
- Python version: 3.9.16
- Numpy version: 1.24.2
- PyTorch version (GPU?): 1.12.1+cu113 (True)
- System RAM: 3.79 GB
- GPU type: NVIDIA GeForce MX250
- `Accelerate` default config:
        Not found

Information

  • [ ] The official example scripts
  • [X] My own modified scripts

Tasks

  • [ ] One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
  • [X] My own task or dataset (give details below)

Reproduction

Hi, I have a question about device_map: is the key of device_map always a module's name? I wrote code like this:

import torch
from torch import nn
from accelerate import infer_auto_device_map

class ModelA(nn.Module):
    def __init__(self):
        super().__init__()
        self.a = nn.Parameter(torch.rand(1000, 1000))
        self.b = nn.Parameter(torch.rand(1000, 1000))
        self.layer = nn.Linear(1000, 1000)

    def forward(self, x):
        pass

device_map = infer_auto_device_map(ModelA(), max_memory={"cpu": 1000 * 1000 * 6})  # raises an error

Here are two problems:

  • First, the code doesn't work; maybe a bug?
  • Second, is it possible to adjust max_memory so that self.a goes to cpu while self.b and self.layer go to disk? (See the size sketch below.)
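
For reference, the per-module sizes can be checked with accelerate's compute_module_sizes before picking a max_memory budget (a minimal sketch; the byte counts assume the default fp32 parameters):

from accelerate.utils.modeling import compute_module_sizes

# sizes are reported in bytes; each fp32 parameter takes 4 bytes
sizes = compute_module_sizes(ModelA())
print(sizes["a"], sizes["b"], sizes["layer"])  # 4000000 4000000 4004000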

I also found another interesting case, following the example in the docstring:

from transformers import AutoTokenizer, BertGenerationDecoder, BertGenerationConfig
from accelerate import infer_auto_device_map, init_empty_weights
from accelerate.utils.modeling import compute_module_sizes

tokenizer = AutoTokenizer.from_pretrained("google/bert_for_seq_generation_L-24_bbc_encoder")
config = BertGenerationConfig.from_pretrained("google/bert_for_seq_generation_L-24_bbc_encoder")
config.is_decoder = True

with init_empty_weights():
    model = BertGenerationDecoder(config=config)

model_sizes = compute_module_sizes(model)
max_memory={"cpu": model_sizes["bert"]}
device_map = infer_auto_device_map(
    model,
    max_memory=max_memory,
    verbose=True  # when True, get error
)

Here are two problems, too:

  • First, when I set verbose=True, I get an error; maybe a bug?
  • Second, when I set verbose=False, I find that part of self.bert is still offloaded to disk; maybe also a bug? (The returned map can be inspected as sketched below.)
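
One quick way to see which submodules were offloaded is to filter the returned map (a small sketch; device_map is the result of the snippet above):

# device_map maps submodule names to "cpu", "disk", or a GPU index
disk_modules = [name for name, device in device_map.items() if device == "disk"]
print(disk_modules)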

Expected behavior

Fix bug?

BuxianChen · Jun 07 '23 07:06

I am not able to reproduce any of the bugs you mention. Can you try installing from source?

sgugger · Jun 07 '23 11:06

OK, I tried installing from source:

import torch
from torch import nn
from accelerate import infer_auto_device_map
from accelerate.utils.modeling import compute_module_sizes

class ModelA(nn.Module):
    def __init__(self):
        super().__init__()
        self.a = nn.Parameter(torch.rand(1000, 1000))
        self.b = nn.Parameter(torch.rand(1000, 1000))
        self.layer = nn.Linear(1000, 1000)

    def forward(self, x):
        pass

print(infer_auto_device_map(ModelA(), max_memory={"cpu": 1000 * 1000 * 6}))   # {'': 'disk'}
print(infer_auto_device_map(ModelA(), max_memory={"cpu": 1000 * 1000 * 10}))  # {'a': 'cpu', 'b': 'disk', 'layer': 'disk'}
print(dict(compute_module_sizes(ModelA())))
# {'': 12004000, 'a': 4000000, 'b': 4000000, 'layer': 4004000, 'layer.weight': 4000000, 'layer.bias': 4000}

These run successfully, but I think the output device_maps should be:

  • {'a': 'cpu', 'b': 'disk', 'layer': 'disk'}
  • {'a': 'cpu', 'b': 'cpu', 'layer': 'disk'}

The second example with BertGenerationDecoder also runs successfully, but I get a warning:

The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.

Part of self.bert is still offloaded to disk, but in this empty-weights case model.tie_weights() does not seem to work well. Any suggestions?

BuxianChen · Jun 07 '23 12:06

You forget that one parameter takes 4 bytes of space. With 1000*1000*6 set as the max space, you cannot fit your whole model. It also needs to make sure you will have enough space to reload layers offloaded to the disk.
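
To make the arithmetic concrete (a worked sketch based on the sizes printed above, assuming default fp32 parameters):

# each of the three submodules holds roughly 1000 * 1000 fp32 parameters
a_bytes = 1000 * 1000 * 4                # 4,000,000
b_bytes = 1000 * 1000 * 4                # 4,000,000
layer_bytes = (1000 * 1000 + 1000) * 4   # 4,004,000 (weight + bias)
total = a_bytes + b_bytes + layer_bytes  # 12,004,000, matching compute_module_sizes
# max_memory={"cpu": 1000 * 1000 * 6} allows only 6,000,000 bytes, so the whole
# model cannot fit on cpu, and infer_auto_device_map also keeps headroom to
# reload layers that were offloaded to disk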

As for the second warning, you need to tie weights before calling infer_auto_device_map so that the tied weights are set on the same device. That is what the warning is telling you (and you can tie empty weights).
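
Applied to the BertGenerationDecoder example above, that means tying before inferring the map (a minimal sketch):

with init_empty_weights():
    model = BertGenerationDecoder(config=config)

model.tie_weights()  # works on empty (meta) weights too
device_map = infer_auto_device_map(model, max_memory=max_memory)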

sgugger · Jun 07 '23 12:06

Thanks for your quick reply. I underestimated the complicated logic of infer_auto_device_map in the first example.

BuxianChen · Jun 07 '23 12:06

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] · Jul 07 '23 15:07