
Results 85 comments of Yifeng Wang(正经人王同学)

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? https://arxiv.org/pdf/2504.13837 It seems that distillation and more sampling outperform RLVR once k in pass@k gets large.
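For reference when comparing at large k, pass@k is usually computed with the unbiased estimator from the HumanEval paper (Chen et al., 2021); a minimal sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    # Unbiased pass@k estimator: probability that at least one of k
    # samples is correct, given n total samples of which c are correct.
    if n - c < k:
        return 1.0  # fewer incorrect samples than k, so success is certain
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 1 correct answer out of 2 samples, k=1 -> 0.5
print(pass_at_k(2, 1, 1))
```

Sampling more (larger n) tightens this estimate; the paper's point is that at large k the base model's pass@k can catch up to or exceed the RLVR model's.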

TTRL: Test-Time Reinforcement Learning https://arxiv.org/pdf/2504.16084

Right Question is Already Half the Answer: Fully Unsupervised LLM Reasoning Incentivization https://arxiv.org/pdf/2504.05812

Similar simple core principle: ReTool: https://arxiv.org/abs/2504.11536 ReSearch: https://arxiv.org/abs/2503.19470 ReCall: https://github.com/Agent-RL/ReCall "Less is More for Reward Design" is currently a major trend in rewards: internalize the tool call and drive it with reinforcement learning...
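A hedged sketch of what a reward in this "less is more" spirit might look like: a single outcome signal, no shaping of intermediate tool calls (the function names and the exact-match check are illustrative assumptions, not any of these papers' actual implementations):

```python
def outcome_reward(predicted: str, gold: str) -> float:
    # Minimal outcome-only reward: 1.0 if the final answer matches the
    # reference, 0.0 otherwise. Intermediate tool calls are left to the
    # policy to discover under RL rather than being rewarded directly.
    return 1.0 if predicted.strip() == gold.strip() else 0.0

print(outcome_reward(" 42 ", "42"))  # correct answer, whitespace ignored
```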

reward design insight: https://www.bespokelabs.ai/blog/improving-multi-turn-tool-use-with-reinforcement-learning

combines learning (imitation) with exploration: https://github.com/ElliottYan/LUFFY

OTC: Optimal Tool Calls via Reinforcement Learning https://arxiv.org/pdf/2504.14870

For example, with an agent plus a toolkit, we could let the user send the API docs and then get the MCP code back.

Hello @fengju0213, for example, when we do a research task, we can let several search agents query Google/Bing/Baidu at the same time.
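A minimal sketch of fanning one query out to several engines concurrently with asyncio (the `search_engine` stub and the engine list are hypothetical placeholders, not a real search API):

```python
import asyncio

async def search_engine(engine: str, query: str) -> list[str]:
    # Hypothetical per-engine stub; a real agent would call the
    # engine's API here. Returns a list of result strings.
    await asyncio.sleep(0)  # stand-in for network latency
    return [f"{engine} result for {query!r}"]

async def multi_search(query: str) -> dict[str, list[str]]:
    engines = ["google", "bing", "baidu"]
    # Fan the query out to all engines concurrently, then collect.
    results = await asyncio.gather(
        *(search_engine(e, query) for e in engines)
    )
    return dict(zip(engines, results))

print(asyncio.run(multi_search("test-time RL")))
```

Each engine's results come back keyed by engine name, so a downstream aggregation agent can merge or deduplicate them.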