Has anyone tried utilizing FSDP (Fully Sharded Data Parallel) for Vim?

Open chokevin8 opened this issue 7 months ago • 0 comments

Has anyone tried implementing FSDP for Vim? It would help train larger Vim models on larger datasets, since FSDP shards the model's parameters (along with gradients and optimizer state) across nodes/GPUs, while DDP keeps a full replica on every GPU. I am aware that FSDP is specifically optimized for Transformers, so I was wondering if anyone has an implementation for Vim or knows of one. Thanks!
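To put rough numbers on the DDP vs. FSDP difference, here is a back-of-the-envelope sketch. It assumes fp32 training with an Adam-style optimizer (two extra states per parameter) and full sharding, and it ignores activations, communication buffers, and the temporarily gathered parameters FSDP needs during forward/backward. The function name `per_gpu_state_bytes` and the 1B-parameter model size are hypothetical, chosen only for illustration; they are not Vim's actual figures.

```python
def per_gpu_state_bytes(num_params, world_size, sharded,
                        bytes_per_value=4, optimizer_states_per_param=2):
    """Rough persistent memory per GPU for params, grads, and optimizer state.

    Assumptions (illustrative only): every value is stored at
    `bytes_per_value` bytes, and the optimizer keeps
    `optimizer_states_per_param` extra values per parameter (2 for Adam).
    """
    # One parameter copy + one gradient + the optimizer states.
    values_per_param = 2 + optimizer_states_per_param
    total = num_params * values_per_param * bytes_per_value
    # DDP replicates everything on each GPU; full sharding splits it evenly.
    return total / world_size if sharded else total


if __name__ == "__main__":
    num_params = 1_000_000_000  # hypothetical model size, not a real Vim config
    gpus = 8
    ddp = per_gpu_state_bytes(num_params, gpus, sharded=False)
    fsdp = per_gpu_state_bytes(num_params, gpus, sharded=True)
    print(f"DDP : {ddp / 2**30:.1f} GiB per GPU")
    print(f"FSDP: {fsdp / 2**30:.1f} GiB per GPU")
```

Under these assumptions the per-GPU footprint for model state shrinks roughly in proportion to the number of GPUs, which is why I think FSDP would matter for scaling Vim up.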

chokevin8 • Jul 01 '24 02:07