Calling parallelize() on T5ForConditionalGeneration for ByT5 results in device_map error
System Info
transformers version: 4.25.1
Who can help?
@ArthurZucker @younesbelkada
Information
- [ ] The official example scripts
- [X] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [X] My own task or dataset (give details below)
Reproduction
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("google/byt5-xl")
model.parallelize()
Results in:
The device_map contains more attention blocks than this model has. Remove these from the device_map: {...}
Expected behavior
parallelize() should split the model's attention blocks across devices correctly. ByT5's encoder is three times deeper than its decoder, so a single device_map cannot be applied to both stacks.
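To illustrate why a shared map fails and what a per-stack split could look like, here is a minimal sketch of a balanced layer-to-device assignment built separately for each stack. This is not the transformers implementation, and the ByT5-XL layer counts used below (36 encoder blocks, 12 decoder blocks) are assumptions for illustration:

```python
# Hypothetical helper: build a balanced layer -> device assignment per stack.
# A single map sized for the 36-block encoder would reference block indices
# the 12-block decoder does not have, which is the kind of mismatch behind
# the "more attention blocks than this model has" error.

def balanced_device_map(num_layers, device_ids):
    """Split block indices 0..num_layers-1 as evenly as possible across devices."""
    per_device = num_layers // len(device_ids)
    remainder = num_layers % len(device_ids)
    device_map, start = {}, 0
    for i, device in enumerate(device_ids):
        # Earlier devices absorb the remainder, one extra block each.
        count = per_device + (1 if i < remainder else 0)
        device_map[device] = list(range(start, start + count))
        start += count
    return device_map

# Assumed ByT5-XL-like shape: deep encoder, shallow decoder.
encoder_map = balanced_device_map(36, [0, 1, 2, 3])  # 9 blocks per device
decoder_map = balanced_device_map(12, [0, 1, 2, 3])  # 3 blocks per device
```

Building the two maps independently keeps each one consistent with its own stack's depth.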
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Note that the parallelize API is going to be deprecated soon. You should load your model like this to use Accelerate instead:

from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("google/byt5-xl", device_map="balanced")