
feat: Add data class for fsdp config and use it along with argument parser

Open kmehant opened this issue 1 year ago • 5 comments

Feature request

Like the trainer arguments data class https://github.com/huggingface/transformers/blob/2a002d073a337051bdc3fbdc95ff1bc0399ae2bb/src/transformers/training_args.py#L167

It would be good to have a data class for the FSDP config, since it allows validation before training starts when ambiguous types are used in the config.

Motivation

For instance, the fsdp config json file might have been written as

{
...
"xla": "False" // or "false"
...
}

This check (https://github.com/huggingface/transformers/blob/745bbfe4bb2b61491dedd56e1e8ee4af8ef1a9ec/src/transformers/training_args.py#L1768) might pass even though the value is a string rather than a boolean. The resulting failure can be hard to debug, though the root cause is a simple type bug. Having an FSDP arguments data class with a parser would help catch such bugs early.
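To illustrate the failure mode, here is a minimal sketch (the class name `FSDPConfig` and its single field are hypothetical, not the proposed API): in Python, `bool("False")` is `True`, so a string slips through truthiness checks silently, whereas a validating dataclass rejects it up front.

```python
from dataclasses import dataclass


@dataclass
class FSDPConfig:
    """Hypothetical sketch of a validating FSDP config dataclass."""

    xla: bool = False

    def __post_init__(self):
        # Reject strings like "False"/"false": bool("False") is True,
        # which is exactly the ambiguity described above.
        if not isinstance(self.xla, bool):
            raise TypeError(
                f"`xla` must be a bool, got {type(self.xla).__name__}: {self.xla!r}"
            )


# A config loaded from JSON with "xla": "False" now fails fast
# instead of silently enabling XLA:
try:
    FSDPConfig(xla="False")
except TypeError as err:
    print(err)
```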

Your contribution

I would be glad to raise a PR based on the discussion on this issue.

kmehant avatar Mar 06 '24 06:03 kmehant

FYI @muellerzr

ArthurZucker avatar Mar 07 '24 10:03 ArthurZucker

@ArthurZucker @muellerzr Any thoughts?

kmehant avatar Aug 22 '24 04:08 kmehant

I suppose we can do so. I think it might be better to have users use the raw accelerate FSDP plugin here, let me think on this!

muellerzr avatar Aug 22 '24 12:08 muellerzr

A dataclass similar to the AcceleratorConfig would be the way to go here. @kmehant would you like to take a stab at it? :)

muellerzr avatar Aug 22 '24 12:08 muellerzr

Sure @muellerzr !

kmehant avatar Aug 22 '24 13:08 kmehant

https://github.com/huggingface/transformers/issues/29476#issuecomment-2304563418

@muellerzr just trying to understand, is there any reason for not simply importing the FSDP plugin data class from accelerate (like you mentioned here - https://github.com/huggingface/transformers/issues/29476#issuecomment-2304553034) and using it here? Creating a dataclass similar to AcceleratorConfig would duplicate work across the transformers and accelerate libraries; are we okay with that? Thanks

kmehant avatar Sep 02 '24 10:09 kmehant

I suppose there actually wouldn't be a harm in just importing the FSDP one. My only fear would be how easy it is to pull up the docs/get to the source by adding yet another item that's in the accelerate lib users in transformers need to know about. But if we allow the fsdp config and keep its docs aligned with the accelerate ones (plus the few extra that are specific to trainer like the xla bits) we should be fine.

For that specific reason is why I suggested a different dataclass, since they are not 1:1

muellerzr avatar Sep 02 '24 11:09 muellerzr
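The "not 1:1" point above can be sketched as follows. This is an illustrative layout only (the class name `TrainerFSDPConfig`, its fields, and the helper method are assumptions, not the actual PR): a trainer-side dataclass mirrors a subset of accelerate's FSDP options but also carries trainer-specific keys such as the XLA bits, which must be stripped before handing the rest to accelerate.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class TrainerFSDPConfig:
    """Illustrative sketch: trainer FSDP config != accelerate's plugin."""

    # Fields mirrored from accelerate's FSDP plugin (illustrative subset)
    min_num_params: int = 0
    backward_prefetch: Optional[str] = None

    # Trainer-specific extras that accelerate's plugin does not carry
    xla: bool = False
    xla_fsdp_v2: bool = False

    def to_accelerate_kwargs(self) -> dict:
        # Drop trainer-only fields before constructing the accelerate plugin.
        kwargs = asdict(self)
        for trainer_only in ("xla", "xla_fsdp_v2"):
            kwargs.pop(trainer_only)
        return kwargs
```

Keeping the shared fields' docs aligned with accelerate's, while documenting the extras on the trainer side, matches the approach suggested above.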

Sure, got it @muellerzr, will be back with a PR. Thanks.

kmehant avatar Sep 02 '24 11:09 kmehant

@muellerzr Have raised a PR for your review - https://github.com/huggingface/transformers/pull/33278

Have added code comments as well.

kmehant avatar Sep 03 '24 13:09 kmehant