NeMo-Guardrails
Base Mistral: Guardrail does not respond to prompts whose content is very similar to the examples outlined in the config
I set up a custom off-topic guardrail and ran the full Mistral model through it. The off-topic examples in the config below include a pizza dough recipe and a weather forecast for Tokyo. When I prompted the model with "What's the weather like in NY next week?" and "How do I make sourdough?", it did not follow the outlined response. The model's response is shown in the screenshot below.
Config:
import os
from nemoguardrails import LLMRails, RailsConfig
from torch.cuda import device_count
from functools import lru_cache
from torch import float16
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer, pipeline
from nemoguardrails.llm.helpers import get_llm_instance_wrapper
from nemoguardrails.llm.providers import (
    HuggingFacePipelineCompatible,
    register_llm_provider,
)
import torch
import warnings
warnings.filterwarnings("ignore")
yaml_content = """
models:
  - type: main
    engine: hf_pipeline_mistral_topic
"""
colang_content = """
define user express greeting
  "hello"
  "hi"
  "what's up"
  "hey"
  "yo"

define bot express greeting
  "Hello and welcome to X LLM. How may I assist you with insights on leadership, trend analysis, and financial acumen drawn from nearly a century of expertise?"

define flow greeting
  user express greeting
  bot express greeting

define user ask off-topic
  "What's the best recipe for homemade pizza dough?"
  "Can you provide the latest weather forecast for Tokyo?"
  "How do you fix a leaking faucet in the bathroom?"

define bot answer off-topic
  "I'm primarily designed to offer insights and information related to finance, business, and world events in line with X media's content."

define bot offer help
  "Is there anything else I can help you with?"

define flow off-topic
  user ask off-topic
  bot answer off-topic
  bot offer help
"""
@lru_cache
def _load_model(model_name, device, num_gpus, debug=False):
"""Helper function to load the model."""
if device == "cpu":
kwargs = {}
elif device == "cuda":
kwargs = {"torch_dtype": float16}
if num_gpus == "auto":
kwargs["device_map"] = "auto"
else:
num_gpus = int(num_gpus)
if num_gpus != 1:
kwargs.update(
{
"device_map": "auto",
"max_memory": {i: "13GiB" for i in range(num_gpus)},
}
)
elif device == "mps":
kwargs = {"torch_dtype": float16}
# Avoid bugs in mps backend by not using in-place operations.
print("mps not supported")
else:
raise ValueError(f"Invalid device: {device}")
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
model_name, low_cpu_mem_usage=True, **kwargs
)
if device == "cuda" and num_gpus == 1:
model.to(device)
if debug:
print(model)
return model, tokenizer
def get_mistral_7b_llm():
repo_id = "mistralai/Mistral-7B-Instruct-v0.2"
# Adjust parameters as needed for Mistral 7B
params = {"temperature": 0.5, "max_length": 8000, "do_sample": True}
# Using the first GPU
device = 2
# Assuming HuggingFacePipelineCompatible or a similar interface is used for Mistral models
llm = HuggingFacePipelineCompatible.from_model_id(
model_id=repo_id,
device=device,
task="text-generation",
model_kwargs=params
)
return llm
def get_mistral_from_path(model_path):
device = "cuda"
num_gpus = 2 # making sure GPU-GPU are NVlinked, GPUs-GPUS with NVSwitch
model, tokenizer = _load_model(model_path, device, num_gpus, debug=False)
pipe = pipeline(
"text-generation",
model=model,
tokenizer=tokenizer,
torch_dtype=torch.bfloat16,
device_map="auto",
max_length=8000,
do_sample=True
)
llm = HuggingFacePipelineCompatible(pipeline=pipe)
return llm
HFPipelineMistral = get_llm_instance_wrapper(
    llm_instance=get_mistral_7b_llm(), llm_type="hf_pipeline_mistral_topic"
)
register_llm_provider("hf_pipeline_mistral_topic", HFPipelineMistral)
#config = RailsConfig.from_path("./config")
config = RailsConfig.from_content(
    yaml_content=yaml_content,
    colang_content=colang_content
)
rails = LLMRails(config)
promptList = ['What is the weather like in New York next week?', 'How do I make sourdough?', 'What do you think of the president?', 'Would you consider yourself a Liberal?']
for p in promptList:
    res = rails.generate(prompt=p)
    print('PROMPT: ', p)
    print('LLM: ', res)
Hi @sunotsue, I have been working with Mistral (Mistral 7B Instruct) and NeMo-Guardrails and faced issues similar to the one you describe. I found that the prompts may have to be tweaked a bit more to get the desired outputs. For example, this is my config.yaml:
instructions:
  - type: general
    content: |
      <your instructions>

sample_conversation: |
  <your sample conversation>

<rails>

prompts:
  - task: general
    content: |-
      [INST]
      {{ general_instructions }}

      {{ history | user_assistant_sequence }}
      Assistant:
      [/INST]

  # Prompt for detecting the user message canonical form.
  - task: generate_user_intent
    content: |-
      [INST]
      {{ general_instructions }}

      Your task is to generate a short summary called user intent for a user message in a conversation.

      # This is how the user talks, use these examples to generate the user intent:
      {{ examples | verbose_v1 }}

      # This is the current conversation between the user and the bot, use these examples to generate the user intent:
      {{ sample_conversation | first_turns(2) | verbose_v1 }}
      {{ history | colang | verbose_v1 }}
      [/INST]
    output_parser: "verbose_v1"

  # Prompt for generating the next steps.
  - task: generate_next_steps
    content: |-
      [INST]
      {{ general_instructions }}

      # This is how the bot thinks, use these examples to generate the bot canonical form:
      {{ examples | remove_text_messages | verbose_v1 }}

      # This is the current conversation between the user and the bot:
      {{ sample_conversation | first_turns(2) | remove_text_messages | verbose_v1 }}
      {{ history | colang | remove_text_messages | verbose_v1 }}
      [/INST]
    output_parser: "verbose_v1"

  # Prompt for generating the bot message from a canonical form.
  - task: generate_bot_message
    content: |-
      [INST]
      {{ general_instructions }}

      {% if relevant_chunks %}
      # This is some additional context:
      ```markdown
      {{ relevant_chunks }}
      ```
      {% endif %}

      # This is how the bot talks, use these examples to generate the bot message:
      {{ examples | verbose_v1 }}

      # This is the current conversation between the user and the bot:
      {{ sample_conversation | first_turns(2) | verbose_v1 }}
      {{ history | colang | verbose_v1 }}
      [/INST]
    output_parser: "verbose_v1"

  # Prompt for generating the value of a context variable.
  - task: generate_value
    content: |-
      [INST]
      {{ general_instructions }}

      # This is how the bot thinks:
      {{ examples | verbose_v1 }}

      # This is the current conversation between the user and the bot:
      {{ sample_conversation | first_turns(2) | verbose_v1 }}
      {{ history | colang | verbose_v1 }}

      # {{ instructions }}
      ${{ var_name }} =
      [/INST]
    output_parser: "verbose_v1"
Feel free to fill this file with your configuration and let me know if it works. I added a fair amount of Colang syntax on my side, but I believe yours will be more than enough. If that still does not work, I would recommend experimenting further with the prompts and the Colang definitions. The guardrails system is not perfect, but with your configuration the questions you were testing should work fine.
Hope it helps.
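For reference, here is a minimal sketch of how such a config folder could be loaded, assuming the YAML above is saved as ./config/config.yml and the Colang flows as ./config/rails.co (the path and file names are just placeholders for illustration):

from nemoguardrails import LLMRails, RailsConfig

# Assumed layout: ./config/config.yml holds the YAML above and
# ./config/rails.co holds the Colang flows from the original post.
config = RailsConfig.from_path("./config")
rails = LLMRails(config)

# Quick check that the off-topic rail now returns the canned answer.
print(rails.generate(prompt="What's the weather like in NY next week?"))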
Hi, is it necessary to have a YAML file? Can't we use delimiters to guardrail the input and output of the model?