Server response errors/exceptions not propagating to the top-level client
Describe the bug
For example, suppose that during a NeMo RL + Gym run, an LLM-as-judge resources server is making calls to a remote judge-model endpoint, and that endpoint starts returning rate-limiting responses (HTTP 429 Too Many Requests).
Currently, these HTTP 429 errors are completely invisible in the NeMo RL + Gym driver log; the observed behavior is simply that the training run gets stuck in the rollout phase.
However, after adding extra stdout/stderr debug logging to the vLLM responses API model server (https://github.com/NVIDIA-NeMo/Gym/pull/311/files), the HTTP 429 errors become visible:
```
Hit a 429 trying to query an OpenAI endpoint (try 1). Sleeping 0.5s. Error message: {"status":429,"title":"Too Many Requests"}
🚨 Caught an exception printed above in judge_model (VLLMModel). If you expect this to be fed back into this model, the exception repr i.e. `repr(e)` is returned to the model. However, please make sure this exception is caught in your server and returned to the model as appropriate. See https://fastapi.tiangolo.com/tutorial/handling-errors/#use-httpexception
INFO: 127.0.0.1:18682 - "POST /v1/responses HTTP/1.1" 500 Internal Server Error
```
So, the problem is that leaf-level errors and exceptions are not propagated up to the top-level server client: the intermediate server converts them into generic 500 Internal Server Error responses, and the original 429 status is lost.
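The failure mode can be sketched in pure Python (all names below are illustrative, not actual Gym APIs): an intermediate handler that catches every exception and returns a generic 500 loses the upstream 429, while one that preserves the upstream status lets the top-level client see, and react to, the rate limit.

```python
class UpstreamHTTPError(Exception):
    """Hypothetical error carrying the status code of a failed upstream call."""

    def __init__(self, status: int, message: str):
        super().__init__(message)
        self.status = status


def call_upstream() -> str:
    # Simulate the remote judge endpoint returning HTTP 429.
    raise UpstreamHTTPError(429, "Too Many Requests")


def handle_request_lossy() -> tuple[int, str]:
    # Behavior consistent with the logs above: the intermediate server
    # catches the exception and returns a generic 500, losing the status.
    try:
        return 200, call_upstream()
    except Exception as e:
        return 500, repr(e)


def handle_request_propagating() -> tuple[int, str]:
    # Desired behavior: preserve the upstream status code so the
    # top-level client can distinguish a rate limit from a server bug.
    try:
        return 200, call_upstream()
    except UpstreamHTTPError as e:
        return e.status, str(e)


print(handle_request_lossy()[0])        # 500: the client cannot tell this was a rate limit
print(handle_request_propagating()[0])  # 429: the original error reaches the client
```

With the lossy handler, the driver can only see an opaque 500 (or nothing at all), which matches the stuck-rollout symptom described above.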
Steps/Code to reproduce bug
Expected behavior
Configs
Additional context