
bug: Comments do not act as bot instructions or alter response in any way

Open · icsy7867 opened this issue 4 months ago • 0 comments

Did you check docs and existing issues?

  • [x] I have read all the NeMo-Guardrails docs
  • [x] I have updated the package to the latest version before submitting this issue
  • [x] (optional) I have used the develop branch
  • [x] I have searched the existing issues of NeMo-Guardrails

Python version (python --version)

Python 3.12

Operating system/version

RHEL9 - Kubernetes Cluster

NeMo-Guardrails version (if you must use a specific version and not the latest)

0.14.1

Describe the bug

I am pretty sure this is user error, but I was not sure where else to ask or post. I apologize if this is not the correct avenue.

In an attempt to learn how to use the tool, I am trying some of the simpler examples. In general things seem to work, but I cannot for the life of me get the functionality to work where adding a comment above a bot statement passes additional instructions to the LLM (https://docs.nvidia.com/nemo/guardrails/latest/user-guides/advanced/bot-message-instructions.html).

So I have a config.yml:

models:
  - type: main
    engine: vllm_openai
    reasoning_config:
      remove_reasoning_traces: True
      start_token: "<think>"
      end_token: "</think>"
    parameters:
      openai_api_base: "https://litellm.company.co/v1"
      model_name: "Qwen3-235B-A22B-FP8-dynamic"
      openai_api_key: "XXXXXXXXXXXXXXXXXX"
  - type: content_safety
    engine: vllm_openai
    parameters:
      openai_api_base: "https://litellm.company.co/v1"
      model_name: "meta-llama/Llama-4-Scout-17B-16E"
      openai_api_key: "XXXXXXXXXXXXXXXXXXXXXXXXXXX"
rails:
  input:
    flows:
      - content safety check input $model=content_safety
  output:
    flows:
      - content safety check output $model=content_safety

And a prompt.yml I stole from one of the examples:

# These are the default prompts released by Meta, except for policy O7, which was added to address direct insults.
prompts:
  - task: content_safety_check_input $model=content_safety
    content: |
      Task: Check if there is unsafe content in the user or agent messages in the below conversation in accordance with our safety policy with the below categories. Provide a 'safe' or 'unsafe' rating for both the user message and the agent response. If there is unsafe content, please also provide a list of violated categories according to our safety policy below.

      <BEGIN UNSAFE CONTENT CATEGORIES>
      S1: Violence.
      S2: Sexual.
      S3: Criminal Planning/Confessions.
      S4: Guns and Illegal Weapons.
      S5: Controlled/Regulated Substances.
      S6: Suicide and Self Harm.
      S7: Sexual (minor).
      S8: Hate/Identity Hate.
      S9: PII/Privacy.
      S10: Harassment.
      S11: Threat.
      S12: Profanity.
      S13: Needs Caution.
      S14: Other.
      S15: Manipulation.
      S16: Fraud/Deception.
      S17: Malware.
      S18: High Risk Gov Decision Making.
      S19: Political/Misinformation/Conspiracy.
      S20: Copyright/Trademark/Plagiarism.
      S21: Unauthorized Advice.
      S22: Illegal Activity.
      S23: Immoral/Unethical.
      <END UNSAFE CONTENT CATEGORIES>

      <BEGIN CONVERSATION>

      user: {{ user_input }}

      <END CONVERSATION>

      Provide your safety assessment for the above conversation in the below output JSON format:
      {"User Safety": safe or unsafe rating for the user message, "Response Safety": safe or unsafe rating for the agent response. Omit if no agent response present. "Safety Categories": a comma-separated list of applicable safety categories from the provided taxonomy. Omit if all safe.}

      Do not include anything other than the output JSON in your response.

  - task: content_safety_check_output $model=content_safety
    content: |
      Task: Check if there is unsafe content in the user or agent messages in the below conversation in accordance with our safety policy with the below categories. Provide a 'safe' or 'unsafe' rating for both the user message and the agent response. If there is unsafe content, please also provide a list of violated categories according to our safety policy below.

      <BEGIN UNSAFE CONTENT CATEGORIES>
      S1: Violence.
      S2: Sexual.
      S3: Criminal Planning/Confessions.
      S4: Guns and Illegal Weapons.
      S5: Controlled/Regulated Substances.
      S6: Suicide and Self Harm.
      S7: Sexual (minor).
      S8: Hate/Identity Hate.
      S9: PII/Privacy.
      S10: Harassment.
      S11: Threat.
      S12: Profanity.
      S13: Needs Caution.
      S14: Other.
      S15: Manipulation.
      S16: Fraud/Deception.
      S17: Malware.
      S18: High Risk Gov Decision Making.
      S19: Political/Misinformation/Conspiracy.
      S20: Copyright/Trademark/Plagiarism.
      S21: Unauthorized Advice.
      S22: Illegal Activity.
      S23: Immoral/Unethical.
      <END UNSAFE CONTENT CATEGORIES>

      <BEGIN CONVERSATION>

      user: {{ user_input }}

      response: agent: {{ bot_response }}

      <END CONVERSATION>

      Provide your safety assessment for the above conversation in the below output JSON format:
      {"User Safety": safe or unsafe rating for the user message, "Response Safety": safe or unsafe rating for the agent response. Omit if no agent response present. "Safety Categories": a comma-separated list of applicable safety categories from the provided taxonomy. Omit if all safe.}

      Do not include anything other than the output JSON in your response.
      Output JSON:

And lastly my rails.co file:

define user express greeting
  "Hello"
  "Hi"

define bot express greeting
  "Hello world! How are you?"
  
define flow
  user express greeting
  # Respond in a very formal way and introduce yourself.
  bot express greeting
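
For completeness, a local reproduction outside the container should look roughly like this (just a sketch, assuming the RailsConfig / LLMRails Python API from the getting-started docs; "./config" is simply the folder holding the three files above):

from nemoguardrails import RailsConfig, LLMRails

# Load the config.yml / prompt.yml / rails.co shown above.
config = RailsConfig.from_path("./config")
rails = LLMRails(config)

# Same interaction as in the chat UI.
response = rails.generate(messages=[{"role": "user", "content": "Hi"}])
print(response["content"])

# Show which LLM calls were made and with what prompts, to check whether the
# "# Respond in a very formal way..." comment ever reaches the model.
info = rails.explain()
info.print_llm_calls_summary()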

Steps To Reproduce

  1. Build the NeMo Guardrails container as the Docker instructions specify.
  2. Deploy it to a docker/podman/Kubernetes cluster with a proxy/ingress.
  3. Add the above files to the config.
  4. Go to the guardrails chat instance and say "Hi" (or send the equivalent request sketched below).
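
For reference, the equivalent request against the deployed server would look roughly like this (a sketch: the URL and config_id are placeholders for my environment, and I am assuming the server's /v1/chat/completions endpoint from the server guide):

import requests

# Placeholder URL and config name for my deployment; adjust as needed.
GUARDRAILS_URL = "http://guardrails.company.co/v1/chat/completions"

payload = {
    "config_id": "my_config",  # the config folder served by the container
    "messages": [{"role": "user", "content": "Hi"}],
}

resp = requests.post(GUARDRAILS_URL, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json())  # expecting a more formal greeting per the comment in rails.co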

Expected Behavior

"Hello world! How are you?" but in a more formal way and have the bot introduce itself.

Actual Behavior

I only get back: "Hello world! How are you?"

icsy7867 · Aug 02 '25 01:08