transformers
feat: Add data class for fsdp config and use it along with argument parser
Feature request
Like the trainer arguments data class (https://github.com/huggingface/transformers/blob/2a002d073a337051bdc3fbdc95ff1bc0399ae2bb/src/transformers/training_args.py#L167), it would be good to have a data class for the FSDP config, as it allows validation before training even starts when ambiguous types are used in the config.
Motivation
For instance, the fsdp config json file might have been written as
```json
{
    ...
    "xla": "False"  // or "false"
    ...
}
```
This check (https://github.com/huggingface/transformers/blob/745bbfe4bb2b61491dedd56e1e8ee4af8ef1a9ec/src/transformers/training_args.py#L1768) might still pass. The resulting failure can be hard to debug even though the root cause is a simple type bug. Having an FSDP arguments data class with a parser would help identify such bugs early.
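To illustrate why this matters: in Python, the non-empty string `"False"` is truthy, so a truthiness check on a mis-typed config value silently takes the wrong branch. A minimal sketch of the kind of validation a data class could provide is below; the `FsdpConfig` name and its fields are hypothetical and only illustrate the pattern, they are not the actual transformers API:

```python
from dataclasses import dataclass


@dataclass
class FsdpConfig:
    # Hypothetical fields for illustration only; not the real transformers config.
    xla: bool = False
    min_num_params: int = 0

    def __post_init__(self):
        # Reject string-typed booleans like "False" or "false" up front,
        # instead of letting a truthiness check misinterpret them later.
        if not isinstance(self.xla, bool):
            raise TypeError(
                f"`xla` must be a bool, got {type(self.xla).__name__}: {self.xla!r}"
            )
        if not isinstance(self.min_num_params, int):
            raise TypeError(
                f"`min_num_params` must be an int, got {type(self.min_num_params).__name__}"
            )


# The string "False" is truthy, so `if config["xla"]: ...` on a raw dict
# would silently enable the XLA path; the data class fails fast instead.
try:
    FsdpConfig(xla="False")
except TypeError as err:
    print(f"caught early: {err}")
```

Parsing the JSON config into such a data class would surface the type bug at construction time rather than deep inside the training loop.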
Your contribution
I would be glad to raise a PR based on the discussion on this issue.
FYI @muellerzr
@ArthurZucker @muellerzr Any thoughts?
I suppose we can do so. I think it might be better to have users use the raw Accelerate FSDP plugin here; let me think on this!
A dataclass similar to the AcceleratorConfig would be the way to go here. @kmehant would you like to take a stab at it? :)
Sure @muellerzr !
https://github.com/huggingface/transformers/issues/29476#issuecomment-2304563418
@muellerzr just trying to understand: is there any reason for not simply importing the FSDP plugin data class from accelerate (like you mentioned here - https://github.com/huggingface/transformers/issues/29476#issuecomment-2304553034) and using it directly? Creating an AcceleratorConfig-style dataclass would duplicate work across the transformers and accelerate libraries; are we okay with that? Thanks
I suppose there actually wouldn't be any harm in just importing the FSDP one. My only fear is discoverability: it adds yet another item from the accelerate library that transformers users need to know about when pulling up docs or reading the source. But if we allow the FSDP config and keep its docs aligned with the accelerate ones (plus the few extras that are specific to the Trainer, like the XLA bits), we should be fine.
That specific reason is why I suggested a different dataclass, since they are not 1:1.
Sure, got it @muellerzr, will be back with a PR. Thanks.
@muellerzr Have raised a PR for your review - https://github.com/huggingface/transformers/pull/33278
Have added code comments as well.