[Train/AIR] Training large models with HuggingfaceTrainer
Hugging Face has built-in support for sharded data-parallel training, allowing developers to train large models.
We should look into enabling sharded data parallelism in our Hugging Face + Ray AIR integration, supporting either FSDP or DeepSpeed via the HuggingfaceTrainer.
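For context, a minimal sketch of what the DeepSpeed path could look like: `transformers.TrainingArguments` already accepts a DeepSpeed config dict, so it could be passed through `trainer_init_per_worker`. The model name, config values, and dataset below are placeholders, and a real run would pre-tokenize the data into `input_ids`/`labels` columns:

```python
import ray
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments
from ray.air.config import ScalingConfig
from ray.train.huggingface import HuggingFaceTrainer

# Placeholder dataset; a real run needs tokenized features, not raw text.
train_ds = ray.data.from_items([{"text": "hello world"}] * 32)

def trainer_init_per_worker(train_dataset, eval_dataset=None, **config):
    # DeepSpeed ZeRO stage 3 config; values here are illustrative.
    ds_config = {
        "zero_optimization": {"stage": 3},
        "fp16": {"enabled": True},
        "train_batch_size": "auto",
        "train_micro_batch_size_per_gpu": "auto",
    }
    args = TrainingArguments(
        output_dir="out",
        per_device_train_batch_size=1,
        fp16=True,
        deepspeed=ds_config,  # hand parameter/optimizer sharding to DeepSpeed
    )
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    return Trainer(model=model, args=args, train_dataset=train_dataset)

trainer = HuggingFaceTrainer(
    trainer_init_per_worker=trainer_init_per_worker,
    scaling_config=ScalingConfig(num_workers=4, use_gpu=True),
    datasets={"train": train_ds},
)
result = trainer.fit()
```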
FSDP is already supported out of the box. Unsure if we need to support DeepSpeed?
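For reference, a sketch of that out-of-the-box path: the stock Hugging Face Trainer enables FSDP through the `fsdp` argument of `TrainingArguments` (the exact option strings depend on the transformers version):

```python
from transformers import TrainingArguments

# "full_shard" shards parameters, gradients, and optimizer state across
# workers; "auto_wrap" lets transformers choose the module wrapping policy.
args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,
    fsdp="full_shard auto_wrap",
)
```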
We need DeepSpeed for sure.
@dumpmemory Can you elaborate? What's your use case?
Usually we use DeepSpeed's ZeRO stage 2 or 3 to train large models, and for smaller ones we also use DeepSpeed's fp16 training.
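As a concrete example of that setup, a typical DeepSpeed config for ZeRO stage 2 with fp16 looks roughly like this; the field values are illustrative, and the `"auto"` placeholders are resolved by the Hugging Face Trainer integration:

```python
# Illustrative DeepSpeed config: ZeRO stage 2 with fp16 and optimizer-state
# offload to CPU; passed to TrainingArguments(deepspeed=...).
zero2_fp16_config = {
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu"},
    },
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}
```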
@dumpmemory What are the advantages of DeepSpeed vs. FSDP?
DeepSpeed supports more parallelism options (e.g., ZeRO-Offload and pipeline parallelism).