transformers
feat: Add data class for fsdp config and use it along with argument parser
Feature request
Like the trainer arguments data class (https://github.com/huggingface/transformers/blob/2a002d073a337051bdc3fbdc95ff1bc0399ae2bb/src/transformers/training_args.py#L167), it would be good to have a data class for the FSDP config, as it allows validation before training even starts when ambiguous types are used in the config.
Motivation
For instance, the fsdp config json file might have been written as
```json
{
    ...
    "xla": "False"  // or "false"
    ...
}
```
This check (https://github.com/huggingface/transformers/blob/745bbfe4bb2b61491dedd56e1e8ee4af8ef1a9ec/src/transformers/training_args.py#L1768) might still pass. The resulting failure can be hard to debug even though the root cause is a simple type bug. Having an FSDP arguments data class with a parser would help identify such bugs early.
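To illustrate why this matters: in Python, the non-empty string `"False"` is truthy, so a truthiness check on a mis-typed config value silently takes the wrong branch. A minimal sketch of the kind of validation a data class could provide is below; the `FsdpConfig` name and its fields are hypothetical and only illustrate the pattern, they are not the actual transformers API:

```python
from dataclasses import dataclass


@dataclass
class FsdpConfig:
    # Hypothetical fields for illustration only; not the real transformers config.
    xla: bool = False
    min_num_params: int = 0

    def __post_init__(self):
        # Reject string-typed booleans like "False" or "false" up front,
        # instead of letting a truthiness check misinterpret them later.
        if not isinstance(self.xla, bool):
            raise TypeError(
                f"`xla` must be a bool, got {type(self.xla).__name__}: {self.xla!r}"
            )
        if not isinstance(self.min_num_params, int):
            raise TypeError(
                f"`min_num_params` must be an int, got {type(self.min_num_params).__name__}"
            )


# The string "False" is truthy, so `if config["xla"]: ...` on a raw dict
# would silently enable the XLA path; the data class fails fast instead.
try:
    FsdpConfig(xla="False")
except TypeError as err:
    print(f"caught early: {err}")
```

Parsing the JSON config into such a data class would surface the type bug at construction time rather than deep inside the training loop.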
Your contribution
I would be glad to raise a PR based on the discussion on this issue.
FYI @muellerzr
@ArthurZucker @muellerzr Any thoughts?
I suppose we can do so. I think it might be better to have users use the raw Accelerate FSDP plugin here; let me think on this!
A dataclass similar to the AcceleratorConfig would be the way to go here. @kmehant would you like to take a stab at it? :)
Sure @muellerzr !
https://github.com/huggingface/transformers/issues/29476#issuecomment-2304563418
@muellerzr just trying to understand: is there any reason for not simply importing the FSDP plugin data class from accelerate (like you mentioned here - https://github.com/huggingface/transformers/issues/29476#issuecomment-2304553034) and using it directly? Creating an AcceleratorConfig-style dataclass would duplicate work across the transformers and accelerate libraries; are we okay with that? Thanks
I suppose there actually wouldn't be any harm in just importing the FSDP one. My only fear is discoverability: it adds yet another item from the accelerate library that transformers users need to know about when pulling up docs or reading the source. But if we allow the FSDP config and keep its docs aligned with the accelerate ones (plus the few extras that are specific to the Trainer, like the XLA bits), we should be fine.
That specific reason is why I suggested a different dataclass, since they are not 1:1.
Sure, got it @muellerzr, will be back with a PR. Thanks.
@muellerzr Have raised a PR for your review - https://github.com/huggingface/transformers/pull/33278
Have added code comments as well.