accelerate
infer_auto_device_map calculation bug
System Info
- `Accelerate` version: 0.19.0
- Platform: Linux-5.10.16.3-microsoft-standard-WSL2-x86_64-with-glibc2.31
- Python version: 3.9.16
- Numpy version: 1.24.2
- PyTorch version (GPU?): 1.12.1+cu113 (True)
- System RAM: 3.79 GB
- GPU type: NVIDIA GeForce MX250
- `Accelerate` default config:
Not found
Information
- [ ] The official example scripts
- [X] My own modified scripts
Tasks
- [ ] One of the scripts in the examples/ folder of Accelerate or an officially supported `no_trainer` script in the `examples` folder of the `transformers` repo (such as `run_no_trainer_glue.py`)
- [X] My own task or dataset (give details below)
Reproduction
Hi, I have a question about device_map: is the key of device_map always a module's name? I wrote code like this:
import torch
from torch import nn
from accelerate import infer_auto_device_map

class ModelA(nn.Module):
    def __init__(self):
        super().__init__()
        self.a = nn.Parameter(torch.rand(1000, 1000))
        self.b = nn.Parameter(torch.rand(1000, 1000))
        self.layer = nn.Linear(1000, 1000)

    def forward(self, x):
        pass

device_map = infer_auto_device_map(ModelA(), max_memory={"cpu": 1000*1000*6})  # got error
Here are two problems:
- Firstly, the code can't work, maybe a bug?
- Secondly, is it possible to adjust max_memory so that self.a goes to cpu while self.b and self.layer go to disk?
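For reference, here is a quick check of the names I would expect the device_map keys to be drawn from (a sketch of my own, using torch's named_parameters / named_modules rather than anything from the Accelerate docs):

import torch
from torch import nn

class ModelA(nn.Module):
    def __init__(self):
        super().__init__()
        self.a = nn.Parameter(torch.rand(1000, 1000))
        self.b = nn.Parameter(torch.rand(1000, 1000))
        self.layer = nn.Linear(1000, 1000)

    def forward(self, x):
        pass

model = ModelA()
# Dotted parameter names: ['a', 'b', 'layer.weight', 'layer.bias']
print([name for name, _ in model.named_parameters()])
# Submodule names (the root module has the empty name ''): ['', 'layer']
print([name for name, _ in model.named_modules()])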
I also found another interesting case, following the example in the docstring:
from transformers import AutoTokenizer, BertGenerationDecoder, BertGenerationConfig
from accelerate import infer_auto_device_map, init_empty_weights
from accelerate.utils.modeling import compute_module_sizes

tokenizer = AutoTokenizer.from_pretrained("google/bert_for_seq_generation_L-24_bbc_encoder")
config = BertGenerationConfig.from_pretrained("google/bert_for_seq_generation_L-24_bbc_encoder")
config.is_decoder = True

with init_empty_weights():
    model = BertGenerationDecoder(config=config)

model_sizes = compute_module_sizes(model)
max_memory = {"cpu": model_sizes["bert"]}
device_map = infer_auto_device_map(
    model,
    max_memory=max_memory,
    verbose=True,  # when True, get error
)
Here are two problems, too:
- Firstly, when I set verbose=True, I get an error, maybe a bug?
- Secondly, when I set verbose=False, I find that some part of self.bert is still offloaded to disk, maybe also a bug? (A quick check is sketched after this list.)
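To see which entries end up on disk, I inspect the resulting map roughly like this (reusing the device_map from the snippet above, with verbose=False):

# Rough check: list the submodules/parameters that the map sends to disk
disk_entries = [name for name, device in device_map.items() if device == "disk"]
print(disk_entries)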
Expected behavior
The bugs described above should be fixed, or the expected behavior clarified.
I am not able to reproduce any of the bugs you mention. Can you try installing from source?
OK, I tried installing from source:
import torch
from torch import nn
from accelerate import infer_auto_device_map
from accelerate.utils.modeling import compute_module_sizes

class ModelA(nn.Module):
    def __init__(self):
        super().__init__()
        self.a = nn.Parameter(torch.rand(1000, 1000))
        self.b = nn.Parameter(torch.rand(1000, 1000))
        self.layer = nn.Linear(1000, 1000)

    def forward(self, x):
        pass

print(infer_auto_device_map(ModelA(), max_memory={"cpu": 1000*1000*6}))   # {'': 'disk'}
print(infer_auto_device_map(ModelA(), max_memory={"cpu": 1000*1000*10}))  # {'a': 'cpu', 'b': 'disk', 'layer': 'disk'}
print(dict(compute_module_sizes(ModelA())))
# {'': 12004000, 'a': 4000000, 'b': 4000000, 'layer': 4004000, 'layer.weight': 4000000, 'layer.bias': 4000}
It runs successfully, but I think the device_map from the second call should be {'a': 'cpu', 'b': 'cpu', 'layer': 'disk'} rather than {'a': 'cpu', 'b': 'disk', 'layer': 'disk'}.
The second example, with BertGenerationDecoder, also runs successfully, but I get a warning:
The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.
Some part of self.bert is still offloaded to disk, but in this empty-weights case model.tie_weights() does not seem to work well. Any suggestions?
You are forgetting that one parameter takes 4 bytes of space. With 1000*1000*6 set as the max space, you cannot fit your whole model. It also needs to make sure there will be enough space to reload the layers offloaded to disk.
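Concretely, here is a rough sketch of the accounting (assuming float32 parameters, which matches the compute_module_sizes output above); the headroom comments reflect my reading of the placement logic rather than exact internals:

# Each float32 parameter takes 4 bytes.
a_bytes = 1000 * 1000 * 4                      # 4_000_000 bytes for `a`
b_bytes = 1000 * 1000 * 4                      # 4_000_000 bytes for `b`
layer_bytes = 1000 * 1000 * 4 + 1000 * 4       # 4_004_000 bytes for `layer` (weight + bias)
total_bytes = a_bytes + b_bytes + layer_bytes  # 12_004_000 bytes, as reported by compute_module_sizes

# With max_memory={"cpu": 6_000_000}: placing `a` (4 MB) would leave ~2 MB free,
# which is less than the ~4 MB needed to reload a disk-offloaded layer, so the
# whole model goes to disk ({'': 'disk'}).
# With max_memory={"cpu": 10_000_000}: `a` fits, but adding `b` (8 MB used) would
# leave less than ~4 MB of reload headroom, so `b` and `layer` end up on disk.
print(total_bytes)  # 12004000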
As for the second warning, you need to tie weights before calling infer_auto_device_map so that the tied weights are set on the same device. That is what the warning is telling you (and you can tie empty weights).
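Something along these lines should get rid of the warning (a sketch; the 2 GiB CPU budget is just an illustrative assumption):

from transformers import BertGenerationConfig, BertGenerationDecoder
from accelerate import infer_auto_device_map, init_empty_weights

config = BertGenerationConfig.from_pretrained("google/bert_for_seq_generation_L-24_bbc_encoder")
config.is_decoder = True

with init_empty_weights():
    model = BertGenerationDecoder(config=config)

# Tie the weights first (this works on the meta-device model as well) so that
# tied parameters are grouped and assigned to the same device.
model.tie_weights()

device_map = infer_auto_device_map(model, max_memory={"cpu": 2 * 1024**3})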
Thanks for your quick reply. I underestimated the complexity of infer_auto_device_map's logic in the first example.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.