Xuekai Zhu issues

Results: 8 issues of Xuekai Zhu

Data is not available

Hi! If possible, could you please send me a copy of the dataset? I am working on the same topic as this project. Thank you very much! Email: [email protected]

### 🐛 Describe the bug [E ProcessGroupNCCL.cpp:737] [Rank 0] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=3424, OpType=ALLGATHER, Timeout(ms)=1800000) ran for 1803741 milliseconds before timing out. [E ProcessGroupNCCL.cpp:414] Some NCCL operations have...

bug

Training breaks after 1 epoch ("Training epoch complete"); can't pretrain beyond 1 epoch?

### 🐛 Describe the bug File: OLMo/olmo/train.py. In the following training loop, will pre-training stop after only 1 epoch? ``` @property def max_epochs(self) -> int: if isinstance(self.cfg.max_duration,...

type/bug

Wrong labels in "A_agrees", "rot-agree"

Hello! Where can I find the relationship between "A_agrees", "rot-agree" in the CSV header and "Reply Alignment", "Global Consensus" in your paper? > I found some examples in Table 5 in your paper labeled...

Initial loss increased from 10 (v0.3.0) to 60 (v0.4.0)!

### 🐛 Describe the bug There is a significant discrepancy in the initial loss values between different versions of olmo and the presence or absence of the step-738020 checkpoint. This...

type/bug

Can I contribute FlowRL - a new RL algorithm for LLM reasoning?

Hi maintainers, I would like to contribute **FlowRL**, a new RL algorithm for LLM reasoning that uses **distribution matching** instead of **reward maximization**. ### Key idea - Uses distribution matching...

[worker, trainer, recipe] feat: add FP16 training and inference support

This PR adds FP16 (float16 precision) training support to verl. The implementation includes: | Component | Precision | |-----------|-----------| | **Training (Actor)** | float16 | | **Training (Ref)** | float16...

[recipe] Fix FlowRL actor to pure implementation

## Summary This PR refactors the FlowRL actor implementation by removing CISPO-specific features and simplifying to a pure FlowRL trajectory balance objective with importance weight clipping. ## Changes ### Removed...