liuqianchao
Is there an ETA for finishing this PR? Sequence parallelism is quite important for many long-context LLM training tasks.
> The training code has been released. @hongyanz Can you help update the `cnets.py` code to make it compatible with the Qwen model?
@aoxy Hi, any update on the merge work? We've recently run into low training efficiency when doing RL training with gpt-oss because sink attention support is inconsistent between training and...
What is the progress on gpt-oss support?