
[Train/AIR] Training large models with HuggingfaceTrainer

amogkam opened this issue 2 years ago · 4 comments

Hugging Face has built-in support for sharded data parallel training, allowing developers to train large models.

We should look into enabling sharded data parallelism in our Hugging Face + Ray AIR integration, supporting either FSDP or DeepSpeed via the HuggingfaceTrainer.

amogkam · Aug 30 '22 17:08
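For context, a minimal sketch of the integration this issue targets, assuming the Ray AIR 2.x API (`HuggingFaceTrainer`, `ScalingConfig`, and a `trainer_init_per_worker` callback; exact signatures may differ across versions):

```python
import ray
from ray.air import ScalingConfig
from ray.train.huggingface import HuggingFaceTrainer
from transformers import (
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

def trainer_init_per_worker(train_dataset, eval_dataset=None, **config):
    # Build a vanilla transformers.Trainer on each Ray worker; Ray sets up
    # the distributed process group and shards the data across workers.
    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
    args = TrainingArguments(
        output_dir="/tmp/hf_out",
        per_device_train_batch_size=8,
        num_train_epochs=1,
    )
    return Trainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
    )

trainer = HuggingFaceTrainer(
    trainer_init_per_worker=trainer_init_per_worker,
    scaling_config=ScalingConfig(num_workers=4, use_gpu=True),
    # Placeholder dataset; in practice this is a preprocessed Ray Dataset.
    datasets={"train": ray.data.from_items([{"text": "hi", "label": 0}])},
)
result = trainer.fit()
```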

FSDP is already supported out of the box. Unsure if we need to support DeepSpeed?

Yard1 · Sep 07 '22 22:09
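For reference, FSDP is toggled entirely through stock `transformers.TrainingArguments`, which is why it works through the integration with no Ray-side changes. A sketch, with illustrative flag values (option names vary across transformers versions):

```python
from transformers import TrainingArguments

# FSDP is enabled purely via TrainingArguments, so any integration that
# builds a transformers.Trainer picks it up for free. Values illustrative.
args = TrainingArguments(
    output_dir="/tmp/hf_out",
    per_device_train_batch_size=8,
    fsdp="full_shard auto_wrap",  # shard params/grads/optimizer state, auto-wrap layers
)
```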

> FSDP is already supported out of the box. Unsure if we need to support DeepSpeed?

We need DeepSpeed for sure.

dumpmemory · Sep 21 '22 10:09


@dumpmemory Can you elaborate? What's your use case?

Yard1 · Sep 21 '22 10:09

> @dumpmemory Can you elaborate? What's your use case?

Usually we use DeepSpeed's ZeRO stage 2 or 3 to train large models, and for smaller ones we also use DeepSpeed's fp16 support.

dumpmemory · Sep 26 '22 02:09
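To make that concrete: the HF Trainer accepts a DeepSpeed config directly through `TrainingArguments(deepspeed=...)`, as a dict or a path to a JSON file. A minimal ZeRO stage 2 + fp16 sketch, with illustrative values:

```python
from transformers import TrainingArguments

# Standard DeepSpeed config keys; "auto" defers the value to TrainingArguments.
ds_config = {
    "zero_optimization": {
        "stage": 2,          # shard optimizer state and gradients across workers
        "overlap_comm": True,
    },
    "fp16": {"enabled": True},  # mixed precision, per the comment above
    "train_micro_batch_size_per_gpu": "auto",
}

args = TrainingArguments(
    output_dir="/tmp/hf_out",
    per_device_train_batch_size=8,
    fp16=True,
    deepspeed=ds_config,  # dict or path to a DeepSpeed JSON config
)
```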

@dumpmemory What are the advantages of DeepSpeed vs. FSDP?

amogkam · Oct 31 '22 17:10

> @dumpmemory What are the advantages of DeepSpeed vs. FSDP?

DeepSpeed supports more parallelism options.

dumpmemory · Dec 28 '22 09:12
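As one example of an option FSDP lacked at the time: ZeRO stage 3 can additionally shard parameters and offload them, along with optimizer state, to CPU. A sketch using standard DeepSpeed config keys, values illustrative:

```python
# ZeRO stage 3 with CPU offload -- one of the DeepSpeed knobs beyond plain
# data-parallel sharding. Standard DeepSpeed config keys; values illustrative.
ds_config_zero3 = {
    "zero_optimization": {
        "stage": 3,  # shard parameters as well as gradients/optimizer state
        "offload_param": {"device": "cpu", "pin_memory": True},
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
    "train_micro_batch_size_per_gpu": "auto",
}
```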