Add the transformer layer class to wrap for FSDP
What does this PR do?
This PR defines the fsdp_transformer_layer_cls_to_wrap value in the Mistral config. This way, users can easily load the config to figure out which value to use for FSDP, e.g.
```python
from transformers import AutoConfig

c = AutoConfig.from_pretrained("mistralai/Mistral-7B-v0.1")
c.fsdp_transformer_layer_cls_to_wrap
# [out]: MistralDecoderLayer
```
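For context, here is a minimal sketch (not part of this PR) of how the looked-up class name could then be fed into the Trainer's FSDP setup. The `fsdp_config` key name is an assumption and has changed between transformers versions, so treat it as illustrative only:

```python
from transformers import AutoConfig, TrainingArguments

# Read the wrapping hint straight from the hub config; no model weights are loaded.
config = AutoConfig.from_pretrained("mistralai/Mistral-7B-v0.1")
layer_cls = config.fsdp_transformer_layer_cls_to_wrap  # e.g. "MistralDecoderLayer"

# Hypothetical usage: forward the class name to the Trainer's FSDP config.
# The key name ("fsdp_transformer_layer_cls_to_wrap" vs. "transformer_layer_cls_to_wrap")
# differs across transformers versions.
args = TrainingArguments(
    output_dir="out",
    fsdp="full_shard auto_wrap",
    fsdp_config={"fsdp_transformer_layer_cls_to_wrap": layer_cls},
)
```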
Context: Users have been asking which layers to wrap; there shouldn't be a need to load the model and dig through the state_dict or model summary to figure it out.
Fixes: https://discuss.huggingface.co/t/accelerate-fsdp-config-prompts/21262/3
Currently, this information is also available through https://github.com/huggingface/transformers/blob/main/src/transformers/models/mistral/modeling_mistral.py#L809
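Outside the Trainer, the same class is what a plain PyTorch FSDP auto-wrap policy expects. A rough sketch, assuming the process group has already been initialized (e.g. launched via torchrun):

```python
import functools

from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy
from transformers import AutoModelForCausalLM
from transformers.models.mistral.modeling_mistral import MistralDecoderLayer

# Wrap every MistralDecoderLayer instance in its own FSDP unit.
policy = functools.partial(
    transformer_auto_wrap_policy,
    transformer_layer_cls={MistralDecoderLayer},
)

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
# Requires torch.distributed to be initialized (e.g. via torchrun).
fsdp_model = FSDP(model, auto_wrap_policy=policy)
```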
Who can review?
Models:
- text models: @ArthurZucker and @younesbelkada
Integrations:
- deepspeed: HF Trainer/Accelerate: @pacman100
Documentation: @stevhliu and @MKhalusova