liuqianchao

Results: 4 comments of liuqianchao

Is there any ETA for finishing this PR? Sequence parallelism is quite important for many long-context LLM training tasks.

> The training code has been released. @hongyanz Can you help update the `cnets.py` code to make it compatible with the Qwen model?

@aoxy Hi, any update on the merge work? We've recently run into low training efficiency when doing RL training with gpt-oss, because sink attention support is inconsistent between training and...

How is the progress on gpt-oss support?