Kate Cheng
/bot run --disable-fail-fast
@euronymous-aithal I've implemented Ray Compiled Graph and tested it on the SFT algorithm. The original overhead was ~4s for 32 nodes, seqlen 48k, TP4 CP4, with the Qwen2.5-14B model. With Ray Compiled Graph, the...