Example is underwhelming
Describe the issue as clearly as possible:
I used the example from README.md and changed the review to "Review: This restaurant is bad!" (instead of "good"), or to a similar negative phrasing. The answer returned is almost always "Positive".
I understand this is the model's behaviour, but maybe either:
- this model isn't up to this task
- the prompt is too hard for this example.
Suggestion: We could consider a more illustrative example.
Steps/code to reproduce the bug:
import outlines
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
prompt = """You are a sentiment-labelling assistant.
Is the following review positive or negative?
Review: This restaurant is bad!
"""
generator = outlines.generate.choice(model, ["Positive", "Negative"])
answer = generator(prompt)
print(f'{answer=}')
Expected result:
"Negative"
Error message:
"Positive"
Outlines/Python version information:
outlines v 0.1.11 Python 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:51:49) [Clang 16.0.6 ]
Context for the issue:
First-time user trying out the example.
What happens without outlines?
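One way to check is to send the same prompt through plain transformers, with no constrained decoding, and read the raw continuation. A minimal sketch, assuming the standard `text-generation` pipeline can load this model on your machine; `build_prompt` and `generate_unconstrained` are hypothetical helper names, not part of outlines or transformers:

```python
# Same prompt text as in the report, with the review left as a placeholder.
PROMPT_TEMPLATE = """You are a sentiment-labelling assistant.
Is the following review positive or negative?
Review: {review}
"""


def build_prompt(review: str) -> str:
    """Fill the review into the prompt used in the README example."""
    return PROMPT_TEMPLATE.format(review=review)


def generate_unconstrained(
    prompt: str,
    model_name: str = "microsoft/Phi-3-mini-4k-instruct",
    max_new_tokens: int = 10,
) -> str:
    """Greedy, unconstrained continuation of `prompt` (downloads the model)."""
    # Imported lazily so that defining the helpers does not require transformers.
    from transformers import pipeline

    pipe = pipeline("text-generation", model=model_name)
    out = pipe(
        prompt,
        max_new_tokens=max_new_tokens,
        do_sample=False,          # greedy decoding, so runs are repeatable
        return_full_text=False,   # return only the newly generated text
    )
    return out[0]["generated_text"]


# Usage (slow, needs the model weights):
# print(generate_unconstrained(build_prompt("This restaurant is bad!")))
```

If the free-form continuation does not start with a clear label either, the problem is more likely the prompt format than the constrained choice.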
This might be a result of not using the proper chat templating in the example. This works better:
import outlines
from transformers import AutoTokenizer
model_name = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = outlines.models.transformers(model_name)
def template(prompt: str) -> str:
    return tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        tokenize=False,
        add_bos=True,
        add_generation_prompt=True,
    )
prompt = """You are a sentiment-labelling assistant.
Is the following review positive or negative?
Review: This restaurant is bad!
"""
generator = outlines.generate.choice(model, ["Positive", "Negative"])
answer = generator(template(prompt))
print(f'{answer=}')
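To judge whether the templated prompt really helps, rather than going by a single review, both variants can be scored on a handful of labelled reviews. A rough sketch: `label_accuracy`, `EXAMPLES`, and `always_positive` are made up for illustration, and `classify` stands for any callable that maps a review to a label, for example a small wrapper around `generator(template(...))`.

```python
from typing import Callable, Iterable, Tuple


def label_accuracy(
    classify: Callable[[str], str],
    examples: Iterable[Tuple[str, str]],
) -> float:
    """Return the fraction of (review, expected_label) pairs classified correctly."""
    examples = list(examples)
    if not examples:
        raise ValueError("need at least one labelled example")
    correct = sum(1 for review, expected in examples if classify(review) == expected)
    return correct / len(examples)


# Hypothetical sample reviews; extend with whatever phrasing you want to probe.
EXAMPLES = [
    ("This restaurant is bad!", "Negative"),
    ("This restaurant is good!", "Positive"),
    ("The food was cold and the service was slow.", "Negative"),
    ("Best meal I have had in years.", "Positive"),
]


def always_positive(review: str) -> str:
    """Stand-in classifier that mimics the reported behaviour."""
    return "Positive"


# A classifier that always answers "Positive" gets half of this balanced set right.
print(label_accuracy(always_positive, EXAMPLES))  # 0.5
```

Swapping `always_positive` for a function that calls the generator with and without the chat template gives two numbers that can be compared directly.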
This is a general issue with the documentation, and with how outlines is used more broadly. We have a few outstanding issues for this, like #987 and #756.
There's a PR #1019 which might address this, though currently it seems to be in limbo.