transformers icon indicating copy to clipboard operation
transformers copied to clipboard

[deepspeed zero3] need `generate(synced_gpus=True, ...)`

Open stas00 opened this issue 2 years ago • 7 comments

As discussed in https://github.com/huggingface/transformers/issues/22231 generate under deepspeed zero3 using different input streams on different gpus may hang. It's documented here and in the API docs, that synced_gpus=True is required but who reads the docs.

So this PR will automatically turn this flag on under ZeRO Stage-3, so everything works out of box and it'll warn the user once for their awareness.

Fixes: https://github.com/huggingface/transformers/issues/22231

stas00 avatar Mar 18 '23 05:03 stas00

The documentation is not available anymore as the PR was closed or merged.

So a potential issue I see is that deepspeed is potentially not enabled on all modele, & this would enable synced_gpus even for models where it's not enabled? depends on how is_deepspeed_zero3_enabled works, which you likely know better than me

JulesGM avatar Mar 18 '23 21:03 JulesGM

  • For Accelerate and HF Trainer everything is done automatically for you.
  • If you build your own trainer and follow the instructions it'll work as well.

stas00 avatar Mar 19 '23 01:03 stas00

  1. Are you proposing:
def generate(..., synced_gpus=None)
[...]
if synced_gpus == None:
    if is_deepspeed_zero3_enabled() and dist.world_size() > 1:
        synced_gpus = True
    else:
        synced_gpus = False

which would preserve BC wrt current synced_gpus=False in the function definition.

yes?

  1. and no warning needed then? or still keeping it?

  2. now docs will be mismatching so will need to adapt those to say that by default with multi-gpu it'll be set to True, but the user can choose to set it to False if they want to.

stas00 avatar Mar 22 '23 16:03 stas00

Yes, your code is exactly what I'm suggesting. I think it would be a better API since the user wouldn't have to look for warnings (no need for a warning indeed in this case) and would preserve backward compatibility as you mention.

sgugger avatar Mar 22 '23 16:03 sgugger

That sounds good. Thank you for proposing it, Sylvain.

So no warning needed, right? As this logic is really about dynamic default setting and it'll be documented as such.

stas00 avatar Mar 22 '23 17:03 stas00

Yup!

sgugger avatar Mar 22 '23 17:03 sgugger

Thank you for suggesting a more elegant solution than my initial one, Sylvain.

stas00 avatar Mar 22 '23 19:03 stas00

thanks folks, this is great

JulesGM avatar Mar 22 '23 19:03 JulesGM