Server response errors/exceptions not propagating to the top-level client
Describe the bug
For example, suppose that during a NeMo RL + Gym run, an LLM-as-judge resources server is making calls to a remote judge-model endpoint, and that endpoint starts returning rate-limiting responses (HTTP 429 Too Many Requests).
Currently, these HTTP 429 errors are completely invisible in the NeMo RL + Gym driver log; the observed behavior is simply that the training run gets stuck in the rollout phase.
However, after adding extra stdout/stderr debug logging to the vLLM responses API model server (https://github.com/NVIDIA-NeMo/Gym/pull/311/files), the HTTP 429 errors become visible:
```
Hit a 429 trying to query an OpenAI endpoint (try 1). Sleeping 0.5s. Error message: {"status":429,"title":"Too Many Requests"}
🚨 Caught an exception printed above in judge_model (VLLMModel). If you expect this to be fed back into this model, the exception repr i.e. `repr(e)` is returned to the model. However, please make sure this exception is caught in your server and returned to the model as appropriate. See https://fastapi.tiangolo.com/tutorial/handling-errors/#use-httpexception
INFO: 127.0.0.1:18682 - "POST /v1/responses HTTP/1.1" 500 Internal Server Error
```
So, the problem is that leaf-level errors and exceptions are not propagated up to the top-level server client: the intermediate server converts them into generic 500 Internal Server Error responses, and the original 429 status is lost.
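The failure mode can be sketched in pure Python (all names below are illustrative, not actual Gym APIs): an intermediate handler that catches every exception and returns a generic 500 loses the upstream 429, while one that preserves the upstream status lets the top-level client see, and react to, the rate limit.

```python
class UpstreamHTTPError(Exception):
    """Hypothetical error carrying the status code of a failed upstream call."""

    def __init__(self, status: int, message: str):
        super().__init__(message)
        self.status = status


def call_upstream() -> str:
    # Simulate the remote judge endpoint returning HTTP 429.
    raise UpstreamHTTPError(429, "Too Many Requests")


def handle_request_lossy() -> tuple[int, str]:
    # Behavior consistent with the logs above: the intermediate server
    # catches the exception and returns a generic 500, losing the status.
    try:
        return 200, call_upstream()
    except Exception as e:
        return 500, repr(e)


def handle_request_propagating() -> tuple[int, str]:
    # Desired behavior: preserve the upstream status code so the
    # top-level client can distinguish a rate limit from a server bug.
    try:
        return 200, call_upstream()
    except UpstreamHTTPError as e:
        return e.status, str(e)


print(handle_request_lossy()[0])        # 500: the client cannot tell this was a rate limit
print(handle_request_propagating()[0])  # 429: the original error reaches the client
```

With the lossy handler, the driver can only see an opaque 500 (or nothing at all), which matches the stuck-rollout symptom described above.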
Steps/Code to reproduce bug
Expected behavior
Configs
Additional context