Xuekai Zhu

Results: 8 issues by Xuekai Zhu

Hi! If possible, could you please send me a copy of the dataset? I am working on the same topic as this project. Thank you very much! Email: [email protected]

### 🐛 Describe the bug [E ProcessGroupNCCL.cpp:737] [Rank 0] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=3424, OpType=ALLGATHER, Timeout(ms)=1800000) ran for 1803741 milliseconds before timing out. [E ProcessGroupNCCL.cpp:414] Some NCCL operations have...

bug
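The watchdog line above records both the configured timeout and the time the collective actually ran: `Timeout(ms)=1800000` is 30 minutes, and the ALLGATHER was killed about 3.7 s after that limit. The snippet below is a minimal sketch that pulls those fields out of such a line. The regex is an assumption fitted only to the one log format shown here, and `summarize` is a hypothetical helper, not part of PyTorch. The timeout itself is set through the `timeout` argument of `torch.distributed.init_process_group`, but raising it only helps if the collective is slow rather than stuck on a hung or desynchronized rank.

```python
import re

# The watchdog message quoted in the issue (up to the truncation point).
LOG = (
    "[E ProcessGroupNCCL.cpp:737] [Rank 0] Watchdog caught collective operation "
    "timeout: WorkNCCL(SeqNum=3424, OpType=ALLGATHER, Timeout(ms)=1800000) "
    "ran for 1803741 milliseconds before timing out."
)

# Assumed format: fields exactly as they appear in the message above.
PATTERN = re.compile(
    r"SeqNum=(?P<seq>\d+), OpType=(?P<op>\w+), Timeout\(ms\)=(?P<timeout>\d+)\)"
    r" ran for (?P<elapsed>\d+) milliseconds"
)


def summarize(line: str) -> dict:
    """Extract the sequence number, op type, timeout and overrun from a watchdog line."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a watchdog timeout line")
    timeout_ms = int(m["timeout"])
    elapsed_ms = int(m["elapsed"])
    return {
        "seq": int(m["seq"]),
        "op": m["op"],
        "timeout_min": timeout_ms / 60_000,  # 1800000 ms -> 30.0 minutes
        "overrun_ms": elapsed_ms - timeout_ms,  # how far past the limit it ran
    }


print(summarize(LOG))
# {'seq': 3424, 'op': 'ALLGATHER', 'timeout_min': 30.0, 'overrun_ms': 3741}
```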

### 🐛 Describe the bug File: OLMo/olmo/train.py In the following training loop, will pre-training stop after only 1 epoch? ``` @property def max_epochs(self) -> int: if isinstance(self.cfg.max_duration,...

type/bug
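The `max_epochs` snippet in the question is cut off, so the following is only a hypothetical sketch, not OLMo's actual code. It assumes `max_duration` can be either an int step count or a string, and that epoch-based durations are strings ending in `ep` (e.g. `"3ep"`). Under those assumptions it shows how a property written with an `isinstance(..., str)` check would fall back to a single epoch whenever the duration is not expressed in epochs, which is the behavior the question asks about.

```python
from typing import Union


def max_epochs(max_duration: Union[int, str]) -> int:
    """Hypothetical stand-in for a `max_epochs` property.

    Assumption: epoch durations are strings ending in "ep" (e.g. "3ep").
    Anything else (an int step count, a token-budget string) falls back to 1.
    """
    if isinstance(max_duration, str) and max_duration.endswith("ep"):
        return int(max_duration[:-2].strip())
    # Non-epoch durations: the epoch loop would be capped at one pass over the data,
    # and training would stop early only if the step/token budget is hit first.
    return 1


print(max_epochs("3ep"))    # 3
print(max_epochs(10_000))   # 1
print(max_epochs("2e12T"))  # 1
```

If the real property follows this pattern, a run configured by steps or tokens would only see one epoch of data, so the budget should be checked against the dataset size.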

Hello! Where can I find the relationship between "A_agrees" and "rot-agree" in the CSV header and "Reply Alignment" and "Global Consensus" in your paper? > I found some examples in Table 5 of your paper labeled...

### 🐛 Describe the bug There is a significant discrepancy in the initial loss values across different versions of olmo, and depending on whether the step-738020 checkpoint is present. This...

type/bug

Hi maintainers, I would like to contribute **FlowRL**, a new RL algorithm for LLM reasoning that uses **distribution matching** instead of **reward maximization**. ### Key idea - Uses distribution matching...

This PR adds FP16 (float16 precision) training support to verl. The implementation includes:

| Component | Precision |
|-----------|-----------|
| **Training (Actor)** | float16 |
| **Training (Ref)** | float16... |
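Training in float16 (largest finite value about 65504, with a much narrower exponent range than bfloat16) is commonly paired with dynamic loss scaling so that small gradients do not underflow and overflows are caught. Whether this PR uses loss scaling is not visible in the truncated description; the sketch below is a generic, framework-free illustration of the technique, not verl's code. `DynamicLossScaler` and its methods are hypothetical names; the default values mirror those of PyTorch's `GradScaler` (initial scale 2**16, growth 2.0, backoff 0.5, growth interval 2000).

```python
import math
from typing import List, Optional


class DynamicLossScaler:
    """Generic dynamic loss scaler for FP16 training (illustrative sketch)."""

    def __init__(self, init_scale: float = 2.0**16, growth_factor: float = 2.0,
                 backoff_factor: float = 0.5, growth_interval: int = 2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0

    def update(self, scaled_grads: List[float]) -> Optional[List[float]]:
        """Unscale gradients computed from (loss * scale).

        Returns the unscaled gradients, or None if the optimizer step should be skipped.
        """
        unscaled = [g / self.scale for g in scaled_grads]
        if any(math.isinf(g) or math.isnan(g) for g in unscaled):
            # Overflow in the scaled backward pass: shrink the scale and skip this step.
            self.scale *= self.backoff_factor
            self._good_steps = 0
            return None
        self._good_steps += 1
        if self._good_steps == self.growth_interval:
            # A run of overflow-free steps: try a larger scale to preserve small gradients.
            self.scale *= self.growth_factor
            self._good_steps = 0
        return unscaled


scaler = DynamicLossScaler(init_scale=8.0, growth_interval=2)
print(scaler.update([16.0, 8.0]))     # [2.0, 1.0]
print(scaler.update([float("inf")]))  # None (scale drops to 4.0)
```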

## Summary This PR refactors the FlowRL actor implementation by removing CISPO-specific features and simplifying to a pure FlowRL trajectory balance objective with importance weight clipping. ## Changes ### Removed...