
Calling parallelize() on T5ForConditionalGeneration for ByT5 results in device_map error

yunyu opened this issue on Dec 23, 2022 · 2 comments

System Info

transformers version: 4.25.1

Who can help?

@ArthurZucker @younesbelkada

Information

  • [ ] The official example scripts
  • [X] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [X] My own task or dataset (give details below)

Reproduction

from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("google/byt5-xl")
model.parallelize()

Results in:

The device_map contains more attention blocks than this model has. Remove these from the device_map: {...}

Expected behavior

parallelize() should split the attention blocks across devices correctly. It currently cannot, because ByT5 has an encoder roughly three times deeper than its decoder, so the single device_map that parallelize() builds for the encoder cannot also be applied to the decoder; see the sketch below.
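For context, a minimal sketch of where the mismatch comes from, as I understand the v4.25 internals. It only reads the config (no GPUs or weights needed) and uses the get_device_map/assert_device_map helpers that parallelize() relies on; the two-device count is just an illustration:

from transformers import AutoConfig
from transformers.utils.model_parallel_utils import assert_device_map, get_device_map

config = AutoConfig.from_pretrained("google/byt5-xl")
print(config.num_layers, config.num_decoder_layers)  # encoder is ~3x deeper than decoder

# parallelize() builds one map sized for the encoder and applies it to both stacks:
device_map = get_device_map(config.num_layers, range(2))  # pretend two GPUs
assert_device_map(device_map, config.num_layers)          # fine for the encoder
assert_device_map(device_map, config.num_decoder_layers)  # raises the device_map error above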

yunyu · Dec 23, 2022

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] · Jan 22, 2023

Note that the parallelize API is going to be deprecated soon. You should load your model like this to use Accelerate instead:

model = T5ForConditionalGeneration.from_pretrained("google/byt5-xl", device_map="balanced")
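For completeness, a fuller sketch of that Accelerate path (it assumes accelerate is installed and the checkpoint fits across your visible GPUs; the prompt and generation settings are just illustrations):

from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/byt5-xl")
model = T5ForConditionalGeneration.from_pretrained(
    "google/byt5-xl",
    device_map="balanced",  # Accelerate spreads the layers evenly across available GPUs
)

# Inputs go on the first device; Accelerate moves activations between shards.
inputs = tokenizer("Life is like a box of chocolates.", return_tensors="pt").to("cuda:0")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))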

sgugger · Jan 23, 2023
