NVIDIA Safety Provider Calling Wrong Guardrails Endpoint

Open Hadar301 opened this issue 1 month ago • 0 comments

System Info

LlamaStack Version: 0.4.0.dev0
Distribution: nvidia
Provider: remote::nvidia safety provider
Guardrails Service: NeMo Guardrails 0.10.x

Information

[x] The official example scripts
[ ] My own modified scripts

🐛 Describe the bug

The NVIDIA safety provider in LlamaStack calls the wrong endpoint when communicating with the NeMo Guardrails service, so safety/shield requests fail with a 500 Internal Server Error. According to the traceback under "Error logs", the failing request is a POST to /v1/guardrail/checks.

Steps to Reproduce

Configure LlamaStack with the nvidia safety provider:

providers:
  safety:
    - provider_id: nvidia
      provider_type: remote::nvidia
      config:
        guardrails_service_url: http://nemoguardrails-sample:8000
        config_id: demo-self-check-input-output
        model: meta/llama-3.2-1b-instruct

Register the shield with LlamaStack:

curl -X POST http://localhost:8321/v1/shields \
  -H "Content-Type: application/json" \
  -d '{
    "shield_id": "demo-self-check-input-output",
    "provider_id": "nvidia",
    "provider_shield_id": "demo-self-check-input-output",
    "params": {"model": "meta/llama-3.2-1b-instruct"}
  }'

Create the guardrails config in the NeMo Guardrails service:

curl -X POST http://guardrails-service:8000/v1/guardrail/configs \
  -H "Content-Type: application/json" \
  -d '{
    "name": "demo-self-check-input-output",
    "namespace": "default",
    "data": {
      "prompts": [...],
      "rails": {...}
    }
  }'

Run the shield via the LlamaStack API:

curl -X POST http://localhost:8321/v1/safety/run-shield \
  -H "Content-Type: application/json" \
  -d '{
    "shield_id": "demo-self-check-input-output",
    "messages": [{"role": "user", "content": "You are stupid"}]
  }'
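
For anyone who wants to script the reproduction, the two LlamaStack calls above can be sketched in Python with only the standard library. This is a minimal sketch, not part of the original report: the helper names (`json_post`, `register_shield_request`, `run_shield_request`, `send`) are hypothetical, and the URL, shield ID, and model are copied from the curl commands.

```python
import json
import urllib.request

LLAMA_STACK_URL = "http://localhost:8321"  # host and port from the curl commands
SHIELD_ID = "demo-self-check-input-output"
MODEL = "meta/llama-3.2-1b-instruct"


def json_post(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request against LlamaStack."""
    return urllib.request.Request(
        url=f"{LLAMA_STACK_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def register_shield_request() -> urllib.request.Request:
    """Equivalent of the POST /v1/shields curl command."""
    return json_post("/v1/shields", {
        "shield_id": SHIELD_ID,
        "provider_id": "nvidia",
        "provider_shield_id": SHIELD_ID,
        "params": {"model": MODEL},
    })


def run_shield_request(text: str) -> urllib.request.Request:
    """Equivalent of the POST /v1/safety/run-shield curl command."""
    return json_post("/v1/safety/run-shield", {
        "shield_id": SHIELD_ID,
        "messages": [{"role": "user", "content": text}],
    })


def send(request: urllib.request.Request) -> str:
    """Send a request; a 4xx/5xx reply raises urllib.error.HTTPError."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")
```

Calling `send(register_shield_request())` and then `send(run_shield_request("You are stupid"))` against a running stack should exercise the same code path as the curl commands.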

Error logs

ERROR 2025-11-19 15:18:45,931 llama_stack.core.server.server:285 core::server: Error executing endpoint
route='/v1/safety/run-shield' method='post'
Traceback (most recent call last)

/workspace/src/llama_stack/core/server/server.py:275 in route_handler

  272                     return StreamingResponse(gen, media_type="text/event-stream")
  273                 else:
  274                     value = func(**kwargs)
❱ 275                     result = await maybe_await(value)
  276                     if isinstance(result, PaginatedResponse) and result.url is None:
  277                         result.url = route
  278

/workspace/src/llama_stack/core/server/server.py:197 in maybe_await

  194
  195 async def maybe_await(value):
  196     if inspect.iscoroutine(value):
❱ 197         return await value
  198     return value
  199
  200

/workspace/src/llama_stack/core/telemetry/trace_protocol.py:103 in async_wrapper

  100
  101             with tracing.span(f"{class_name}.{method_name}", span_attributes) as span:
  102                 try:
❱ 103                     result = await method(self, *args, **kwargs)
  104                     span.set_attribute("output", serialize_value(result))
  105                     return result
  106                 except Exception as e:

/workspace/src/llama_stack/core/routers/safety.py:60 in run_shield

   57     ) -> RunShieldResponse:
   58         logger.debug(f"SafetyRouter.run_shield: {shield_id}")
   59         provider = await self.routing_table.get_provider_impl(shield_id)
❱  60         return await provider.run_shield(
   61             shield_id=shield_id,
   62             messages=messages,
   63             params=params,

/workspace/src/llama_stack/core/telemetry/trace_protocol.py:103 in async_wrapper

  100
  101             with tracing.span(f"{class_name}.{method_name}", span_attributes) as span:
  102                 try:
❱ 103                     result = await method(self, *args, **kwargs)
  104                     span.set_attribute("output", serialize_value(result))
  105                     return result
  106                 except Exception as e:

/workspace/src/llama_stack/providers/remote/safety/nvidia/nvidia.py:67 in run_shield

   64             raise ValueError(f"Shield {shield_id} not found")
   65
   66         self.shield = NeMoGuardrails(self.config, shield.shield_id)
❱  67         return await self.shield.run(messages)
   68
   69     async def run_moderation(self, input: str | list[str], model: str | None = None) -> ModerationObject:
   70         raise NotImplementedError("NVIDIA safety provider currently does not implement run_moderation")

/workspace/src/llama_stack/providers/remote/safety/nvidia/nvidia.py:147 in run

  144                 "config_id": self.config_id,
  145             },
  146         }
❱ 147         response = await self._guardrails_post(path="/v1/guardrail/checks", data=request_data)
  148
  149         if response["status"] == "blocked":
  150             user_message = "Sorry I cannot do this."

/workspace/src/llama_stack/providers/remote/safety/nvidia/nvidia.py:117 in _guardrails_post

  114             "Accept": "application/json",
  115         }
  116         response = requests.post(url=f"{self.guardrails_service_url}{path}", headers=headers, json=data)
❱ 117         response.raise_for_status()
  118         return response.json()
  119
  120     async def run(self, messages: list[OpenAIMessageParam]) -> RunShieldResponse:

/usr/local/lib/python3.12/site-packages/requests/models.py:1026 in raise_for_status

  1023             )
  1024
  1025         if http_error_msg:
❱ 1026             raise HTTPError(http_error_msg, response=self)
  1027
  1028     def close(self):
  1029         """Releases the connection back to the pool. Once this method has been

HTTPError: 500 Server Error: Internal Server Error for url: http://nemoguardrails-sample.hacohen-nemo.svc.cluster.local:8000/v1/guardrail/checks
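
The traceback ends at `raise_for_status()`, which reports only the status line, so the body of the 500 response never reaches the log. A hedged sketch of how the guardrails service could be called directly, keeping the response body, is shown here. `post_json` is a hypothetical helper, not the provider's code; it only mirrors the `f"{guardrails_service_url}{path}"` concatenation visible in the traceback. The request payload is left to the caller because the full shape of `request_data` is not shown in the log.

```python
import json
import urllib.error
import urllib.request


def post_json(base_url: str, path: str, payload: dict, timeout: float = 30.0) -> tuple[int, str]:
    """POST JSON to base_url + path; return (status, body) instead of raising on 4xx/5xx."""
    request = urllib.request.Request(
        url=f"{base_url}{path}",  # same concatenation as in the traceback
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # Keep the error body: it may contain the server-side reason for the failure.
        return err.code, err.read().decode("utf-8", errors="replace")
```

Running this against the guardrails URL from the provider config, with the same path and payload the provider sends, would show whether the 500 reproduces outside LlamaStack and what the service reports as the cause.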

Expected behavior

The safety provider should successfully communicate with the NeMo Guardrails service and return a safety response indicating whether the content should be blocked.
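
As an illustration of the expected outcome, a client could reduce the run-shield reply to a simple verdict. This is a sketch under assumptions not confirmed by this issue: it assumes the parsed JSON reply has a `violation` field that is null when the content passes and carries a `user_message` when it is blocked. `summarize_run_shield` is a hypothetical helper; the sample message in the usage note is the one visible in the traceback.

```python
def summarize_run_shield(response: dict) -> str:
    """Return a one-line verdict for a parsed run-shield JSON response."""
    violation = response.get("violation")  # assumed field name, see lead-in
    if violation is None:
        return "allowed"
    message = violation.get("user_message") or "no message"
    return f"blocked: {message}"
```

Under those assumptions, `summarize_run_shield({"violation": {"user_message": "Sorry I cannot do this."}})` yields a "blocked" verdict, while a reply with no violation yields "allowed".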

Nov 19 '25 15:11 Hadar301