Xia Yu
Results: 3 issues of Xia Yu
### Feature request

Qwen2.5-Math and Qwen2.5-Code are two state-of-the-art models that have recently integrated GRPO (Group Relative Policy Optimization).

### Motivation

https://qwenlm.github.io/blog/qwen2.5-math/
https://arxiv.org/pdf/2402.03300

### Your contribution

This is a request-only...
✨ enhancement
We've observed that the UI-TARS-1.5-7B agent frequently gets stuck in endless retry loops when interacting with web interfaces. The agent repeatedly attempts the same ineffective action, unable to adapt or learn...