[Train/AIR] Training large models with HuggingfaceTrainer
Hugging Face has built-in support for sharded data-parallel training, allowing developers to train large models.
We should look into enabling sharded data parallelism in our Hugging Face + Ray AIR integration, supporting either FSDP or DeepSpeed via the HuggingfaceTrainer.
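For context, a minimal sketch of what the DeepSpeed path could look like: `transformers.TrainingArguments` already accepts a DeepSpeed config dict, so it could be passed through `trainer_init_per_worker`. The model name, config values, and dataset below are placeholders, and a real run would pre-tokenize the data into `input_ids`/`labels` columns:

```python
import ray
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments
from ray.air.config import ScalingConfig
from ray.train.huggingface import HuggingFaceTrainer

# Placeholder dataset; a real run needs tokenized features, not raw text.
train_ds = ray.data.from_items([{"text": "hello world"}] * 32)

def trainer_init_per_worker(train_dataset, eval_dataset=None, **config):
    # DeepSpeed ZeRO stage 3 config; values here are illustrative.
    ds_config = {
        "zero_optimization": {"stage": 3},
        "fp16": {"enabled": True},
        "train_batch_size": "auto",
        "train_micro_batch_size_per_gpu": "auto",
    }
    args = TrainingArguments(
        output_dir="out",
        per_device_train_batch_size=1,
        fp16=True,
        deepspeed=ds_config,  # hand parameter/optimizer sharding to DeepSpeed
    )
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    return Trainer(model=model, args=args, train_dataset=train_dataset)

trainer = HuggingFaceTrainer(
    trainer_init_per_worker=trainer_init_per_worker,
    scaling_config=ScalingConfig(num_workers=4, use_gpu=True),
    datasets={"train": train_ds},
)
result = trainer.fit()
```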
FSDP is already supported out of the box. Unsure if we need to support DeepSpeed?
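For reference, a sketch of that out-of-the-box path: the stock Hugging Face Trainer enables FSDP through the `fsdp` argument of `TrainingArguments` (the exact option strings depend on the transformers version):

```python
from transformers import TrainingArguments

# "full_shard" shards parameters, gradients, and optimizer state across
# workers; "auto_wrap" lets transformers choose the module wrapping policy.
args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,
    fsdp="full_shard auto_wrap",
)
```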
We need DeepSpeed for sure.
@dumpmemory Can you elaborate? What's your use case?
Usually we use DeepSpeed's ZeRO stage 2 or 3 to train large models, and for smaller ones we also use DeepSpeed's fp16 training.
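As a concrete example of that setup, a typical DeepSpeed config for ZeRO stage 2 with fp16 looks roughly like this; the field values are illustrative, and the `"auto"` placeholders are resolved by the Hugging Face Trainer integration:

```python
# Illustrative DeepSpeed config: ZeRO stage 2 with fp16 and optimizer-state
# offload to CPU; passed to TrainingArguments(deepspeed=...).
zero2_fp16_config = {
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu"},
    },
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}
```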
@dumpmemory What are the advantages of DeepSpeed vs. FSDP?
DeepSpeed supports more parallelism options (e.g., ZeRO-Offload and pipeline parallelism).