[Rollout timeout] Rollout lost while training
Error traceback:
File "/home/tiger/.pyenv/versions/3.11.2/lib/python3.11/site-packages/agentlightning/verl/entrypoint.py", line 152, in run
trainer.fit()
File "/home/tiger/.pyenv/versions/3.11.2/lib/python3.11/site-packages/agentlightning/verl/trainer.py", line 318, in fit
metrics = self._train_step(batch_dict)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tiger/.pyenv/versions/3.11.2/lib/python3.11/site-packages/agentlightning/verl/trainer.py", line 95, in _train_step
batch, agent_metrics = self.agent_mode_daemon.get_train_data_batch(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tiger/.pyenv/versions/3.11.2/lib/python3.11/site-packages/agentlightning/verl/daemon.py", line 379, in get_train_data_batch
original_sample = self._task_id_to_original_sample[rollout_id]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
Training log:
(Process-11615 agentlightning.server) Requeuing task rollout-85c3e463-cf45-4dae-a765-c5bf6cc59284 after timeout (attempt 1)
(Process-11615 agentlightning.server) Task rollout-85c3e463-cf45-4dae-a765-c5bf6cc59284 timed out after 600.0s, requeued (attempt 1)
(Process-11615 agentlightning.server) Task rollout-85c3e463-cf45-4dae-a765-c5bf6cc59284 re-claimed (attempt 2)
(Process-11615 agentlightning.server) Rollout received and stored: rollout-85c3e463-cf45-4dae-a765-c5bf6cc59284
Agent log:
[Task 10133 Received] ID: rollout-85c3e463-cf45-4dae-a765-c5bf6cc59284
[Task 10190 Received] ID: rollout-85c3e463-cf45-4dae-a765-c5bf6cc59284
2025-10-27 02:44:52,426 [INFO] (Process-1116 __main__) [Rollout rollout-85c3e463-cf45-4dae-a765-c5bf6cc59284] Message length details:
2025-10-27 02:44:52,426 [INFO] (Process-1116 __main__) Message 0: 2633 characters
2025-10-27 02:44:52,426 [INFO] (Process-1116 __main__) Message 1: 3002 characters
2025-10-27 02:44:52,426 [INFO] (Process-1116 __main__) Message 2: 176 characters
2025-10-27 02:44:52,426 [INFO] (Process-1116 __main__) Message 3: 3013 characters
2025-10-27 02:44:52,426 [INFO] (Process-1116 __main__) Message 4: 323 characters
2025-10-27 02:44:52,426 [INFO] (Process-1116 __main__) Message 5: 4113 characters
2025-10-27 02:44:52,426 [INFO] (Process-1116 __main__) Total: 6 messages, 13260 characters
(Process-1116 agentlightning.runner) [Worker 3 | Rollout rollout-85c3e463-cf45-4dae-a765-c5bf6cc59284] Completed in 25.88s. Triplet length: 4. Reward: 0.0
2025-10-27 02:59:33,022 [INFO] (Process-1113 __main__) [Rollout rollout-85c3e463-cf45-4dae-a765-c5bf6cc59284] Message length details:
2025-10-27 02:59:33,022 [INFO] (Process-1113 __main__) Message 0: 2633 characters
2025-10-27 02:59:33,022 [INFO] (Process-1113 __main__) Message 1: 4985 characters
2025-10-27 02:59:33,022 [INFO] (Process-1113 __main__) Message 2: 265 characters
2025-10-27 02:59:33,022 [INFO] (Process-1113 __main__) Message 3: 3013 characters
2025-10-27 02:59:33,022 [INFO] (Process-1113 __main__) Message 4: 412 characters
2025-10-27 02:59:33,022 [INFO] (Process-1113 __main__) Message 5: 4444 characters
2025-10-27 02:59:33,022 [INFO] (Process-1113 __main__) Total: 6 messages, 15752 characters
(Process-1113 agentlightning.runner) [Worker 0 | Rollout rollout-85c3e463-cf45-4dae-a765-c5bf6cc59284] Completed in 1505.21s. Triplet length: 4. Reward: 0.0
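To make the pattern in the agent log easier to spot, here is a minimal sketch that scans runner lines of the form shown above and reports rollout IDs that completed more than once or took longer than the timeout. The function name `find_suspect_rollouts` and the default of 600 seconds (taken from the training log) are assumptions for illustration, not part of Agent-lightning.

```python
import re
from collections import defaultdict

# Matches the runner's completion lines as they appear in the agent log.
COMPLETED = re.compile(
    r"\[Worker (?P<worker>\d+) \| Rollout (?P<rollout_id>rollout-[0-9a-f-]+)\] "
    r"Completed in (?P<seconds>\d+(?:\.\d+)?)s"
)


def find_suspect_rollouts(log_lines, timeout_seconds=600.0):
    """Return {rollout_id: [durations]} for IDs that completed more than once
    or had at least one attempt running longer than the timeout."""
    durations = defaultdict(list)
    for line in log_lines:
        match = COMPLETED.search(line)
        if match:
            durations[match["rollout_id"]].append(float(match["seconds"]))
    return {
        rollout_id: times
        for rollout_id, times in durations.items()
        if len(times) > 1 or max(times) > timeout_seconds
    }


# The two completion lines from the agent log in this report.
sample_lines = [
    "(Process-1116 agentlightning.runner) [Worker 3 | Rollout "
    "rollout-85c3e463-cf45-4dae-a765-c5bf6cc59284] Completed in 25.88s. "
    "Triplet length: 4. Reward: 0.0",
    "(Process-1113 agentlightning.runner) [Worker 0 | Rollout "
    "rollout-85c3e463-cf45-4dae-a765-c5bf6cc59284] Completed in 1505.21s. "
    "Triplet length: 4. Reward: 0.0",
]
suspects = find_suspect_rollouts(sample_lines)
```

On these two lines, the same rollout ID is reported with both durations. One attempt ran well past the 600-second timeout, and both attempts delivered a result.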
I suspect the server raises a timeout error because the agent takes too long to finish the task. I suggest that when a rollout times out, it should simply be ignored.
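The suggestion of ignoring such rollouts could look roughly like the sketch below. It assumes the failing lookup happens because a result arrives for a rollout ID that is no longer tracked, for example a late result from an attempt that had already timed out. `RolloutCollector` and its methods are hypothetical stand-ins, not Agent-lightning's actual classes.

```python
import logging
from dataclasses import dataclass, field
from typing import Any

logger = logging.getLogger(__name__)


@dataclass
class RolloutCollector:
    """Hypothetical bookkeeping that pairs finished rollouts with their samples."""

    pending: dict[str, Any] = field(default_factory=dict)  # rollout_id -> original sample
    skipped: list[str] = field(default_factory=list)  # IDs ignored as stale or duplicate

    def register(self, rollout_id: str, sample: Any) -> None:
        self.pending[rollout_id] = sample

    def collect(self, finished: list[tuple[str, Any]]) -> list[tuple[Any, Any]]:
        """Build a batch, skipping results whose ID is unknown instead of raising."""
        batch = []
        for rollout_id, result in finished:
            if rollout_id not in self.pending:
                # Late or duplicate result: log it and move on.
                logger.warning("Ignoring result for untracked rollout %s", rollout_id)
                self.skipped.append(rollout_id)
                continue
            sample = self.pending.pop(rollout_id)
            batch.append((sample, result))
        return batch


collector = RolloutCollector()
collector.register("rollout-a", {"question": "q1"})
# The same rollout ID is delivered twice, as in the agent log of this report.
finished = [("rollout-a", {"reward": 0.0}), ("rollout-a", {"reward": 0.0})]
batch = collector.collect(finished)
```

The first result is used for training and the second is recorded in `skipped`, so a single slow rollout does not abort the whole training step.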
BTW, is there a WeChat group or RedNote group?
In v0.2, there is a RolloutConfig controlling that behavior.
You can join the Discord group, which is linked on the front page of this project.
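Could I set up an unofficial WeChat or Xiaohongshu (RedNote) group? I'm not really used to Discord. I posted an article discussing Agent-lightning on Xiaohongshu that has over 2K views and more than 500 likes and saves, and I'd like everyone to be able to discuss it more conveniently on platforms in China.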
I'll ask the team if anyone is willing to maintain an official group. Maintaining and refreshing an invitation QR code for a WeChat group would take more effort than maintaining a Discord group, and it's not friendly to non-Chinese individuals.
Thanks a lot for your efforts in promoting Agent-lightning to a broader community. Please feel free to initiate any unofficial discussion group you feel passionate about.
#236 It's all set up!