lightllm
Add fake balance for EP mode
Add fake balance for EP mode, controlled by the --enable_ep_fake_balance option. Cost: with EP8, batch 128, input 64 (40+ different seqlens), it takes about 5 seconds in total. Benefit: prefill throughput increases by 35%, decoding throughput increases by 15%, and the overhead becomes stable.
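
The description above does not say how the balanced assignment is produced. As a rough illustration only, the Python sketch below assumes "fake balance" means replacing the router's expert choices with an evenly spread assignment, so that every expert (and therefore every EP rank) receives the same number of tokens. The function names, the round-robin scheme, and the sizes (256 experts, top-8 routing) are hypothetical and are not taken from the lightllm code; only the batch size of 128 comes from the description.

```python
from collections import Counter


def balanced_topk_ids(num_tokens: int, topk: int, num_experts: int) -> list[list[int]]:
    """Hypothetical helper: give each token `topk` distinct expert ids,
    walking the experts in round-robin order so the load stays even."""
    if topk > num_experts:
        raise ValueError("topk must not exceed num_experts")
    ids = []
    slot = 0  # next expert id to hand out
    for _ in range(num_tokens):
        # Consecutive ids modulo num_experts are distinct because topk <= num_experts.
        ids.append([(slot + k) % num_experts for k in range(topk)])
        slot = (slot + topk) % num_experts
    return ids


def expert_load(ids: list[list[int]]) -> Counter:
    """Count how many token slots each expert receives."""
    return Counter(expert for row in ids for expert in row)


if __name__ == "__main__":
    # Assumed sizes for illustration: 128 tokens, top-8 routing, 256 experts.
    ids = balanced_topk_ids(num_tokens=128, topk=8, num_experts=256)
    load = expert_load(ids)
    print(min(load.values()), max(load.values()))
```

Because the ids are handed out in one continuous round-robin sequence, the per-expert counts never differ by more than one. An assignment like this depends only on the token count, so it could be computed once per distinct sequence length and reused, which may be what the one-off cost for 40+ seqlens refers to; that reading is a guess, not something the description confirms.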