lightning-thunder

[fsdp] Add option of AllGather rate limiting (mainly for ZERO2)

Open crcrpar opened this issue 1 year ago • 0 comments

This adds a new argument, apply_rate_limting, to thunder.distributed.fsdp so that we can try rate limiting of AllGather, especially when ZERO2 is used.

The major changes this PR brings are options to (a) try rate limiting with ZERO2 and (b) turn it off for ZERO3.
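For readers unfamiliar with the idea, here is a toy, pure-Python sketch of what rate limiting AllGather means conceptually: capping how many all-gathers may be issued ahead of the compute that consumes them. It is not thunder's implementation and does not touch any thunder or PyTorch API; `schedule`, `peak_outstanding`, and `max_outstanding` are hypothetical names invented for this illustration.

```python
from collections import deque


def schedule(num_layers, max_outstanding):
    """Toy event order for a forward pass (illustrative only, not thunder's code).

    At most `max_outstanding` all-gathers may be issued before the compute
    that consumes them has run.
    """
    if max_outstanding < 1:
        raise ValueError("max_outstanding must be >= 1")
    events = []
    outstanding = deque()  # layers gathered but not yet consumed by compute
    next_to_gather = 0
    for layer in range(num_layers):
        # Prefetch as far ahead as the limit allows.
        while next_to_gather < num_layers and len(outstanding) < max_outstanding:
            events.append(("all_gather", next_to_gather))
            outstanding.append(next_to_gather)
            next_to_gather += 1
        # The oldest outstanding all-gather is always the layer about to run.
        assert outstanding[0] == layer
        outstanding.popleft()
        events.append(("compute", layer))
    return events


def peak_outstanding(events):
    """Largest number of all-gathers that were pending at the same time."""
    count = peak = 0
    for kind, _ in events:
        count += 1 if kind == "all_gather" else -1
        peak = max(peak, count)
    return peak


if __name__ == "__main__":
    for limit in (1, 2, 4):
        ev = schedule(4, limit)
        print(limit, peak_outstanding(ev), ev)
```

A small limit keeps fewer gathered buffers pending at once at the cost of less communication/compute overlap; a limit at or above the layer count is equivalent to no rate limiting in this toy model.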

I previously tried this with llama-2-7b-hf-like models using ZERO2 and did not see a perf gain. Still, I think it makes sense to have an option to switch it on and off.

crcrpar, Apr 24 '24 02:04