
Add better few-shot examples for the response consistency eval

Open • sourabhagr opened this issue 4 months ago • 1 comment

The few-shot example should include:

  1. An argument justifying why the given answer is appropriate for the given question.
  2. A score between 0 and 1 indicating how logical the argument is.
  3. An explanation for the score.
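A minimal sketch of how the three requested fields could be held together and checked before being rendered into a prompt. The `FewShotExample` class, its field names, and the `[Label]: value` output format are hypothetical, chosen for illustration; they are not uptrain's actual API.

```python
from dataclasses import dataclass


@dataclass
class FewShotExample:
    """Hypothetical container for one response-consistency few-shot example."""

    question: str
    response: str
    argument: str     # why the response is appropriate for the question
    score: float      # how logical the argument is, expected in [0, 1]
    explanation: str  # why this score was given

    def __post_init__(self) -> None:
        # Reject scores outside the requested 0-to-1 range.
        if not 0.0 <= self.score <= 1.0:
            raise ValueError(f"score must be between 0 and 1, got {self.score}")

    def render(self) -> str:
        # Emit one "[Label]: value" line per field, in a fixed order.
        lines = [
            f"[Question]: {self.question}",
            f"[Response]: {self.response}",
            f"[Argument]: {self.argument}",
            f"[Score]: {self.score}",
            f"[Explanation]: {self.explanation}",
        ]
        return "\n".join(lines)


example = FewShotExample(
    question="What is the capital of France?",
    response="Paris is the capital of France.",
    argument="The response directly names the capital the question asks about.",
    score=1.0,
    explanation="The argument is short but fully supports the response.",
)
print(example.render())
```

Validating the score at construction time means a malformed example fails loudly when the prompt module is loaded, rather than silently skewing the model's calibration.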

The relevant code is in "uptrain/operators/language/prompts/few_shots.py", in the variable RESPONSE_CONSISTENCY_FEW_SHOT__COT.

sourabhagr, Feb 18 '24 21:02

@sourabhagr Something like this? (I generated it using ChatGPT.)

RESPONSE_CONSISTENCY_FEW_SHOT__COT = """
[Question]: Which Alex is being referred to in the last line?
[Context]: In a story, Alex is a renowned chef famous for their culinary skills, especially in Italian cuisine. They've recently been experimenting with French recipes, trying to fuse them with Italian dishes to create something unique. Alex's restaurant, which used to serve exclusively Italian dishes, now offers a hybrid menu that's gaining popularity. However, Alex has a twin named Alex, who is not involved in the culinary world but is an artist in the local community. The artist Alex's paintings are not good. But her food is also delicious and tasty.
[Response]: In the last line, it is referring to the renowned chef Alex, whose food is delicious and tasty.
[Argument]: The LLM's response identifies the renowned chef Alex as the subject of the last line, focusing on the established narrative that this Alex is known for their culinary expertise. This interpretation maintains consistency with the broader story arc, where chef Alex's skills and experimentation with cuisine are central themes.
[Score]: 0.8
[Explanation]: The response correctly identifies the renowned chef Alex as the subject of the last line based on the established narrative about culinary skills. However, it overlooks the possibility of the last line introducing a twist regarding the artist Alex's cooking abilities. The score of 0.8 reflects the response's strong alignment with the main storyline but acknowledges a slight deviation from addressing the potential new aspect introduced in the last line.
"""
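Because each field in the example above sits on a single `[Label]: value` line, a quick sanity check can parse such a block and confirm that the argument, score, and explanation requested in the issue are present and that the score lies in [0, 1]. `parse_few_shot` and `check_example` are hypothetical helpers, not part of uptrain, and the sample text is a shortened version of the example above.

```python
import re

# Matches lines of the form "[Label]: value", capturing the label and the value.
FIELD_PATTERN = re.compile(r"^\[(?P<label>[^\]]+)\]:\s*(?P<value>.*)$", re.MULTILINE)


def parse_few_shot(text: str) -> dict[str, str]:
    """Split a few-shot block into a {label: value} dict (hypothetical helper)."""
    return {m["label"]: m["value"].strip() for m in FIELD_PATTERN.finditer(text)}


def check_example(text: str) -> float:
    """Return the [Score] value after checking the fields the issue asks for."""
    fields = parse_few_shot(text)
    # The issue requests an argument, a score, and an explanation.
    missing = {"Argument", "Score", "Explanation"} - set(fields)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    score = float(fields["Score"])
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    return score


sample = """
[Question]: Which Alex is being referred to in the last line?
[Response]: In the last line, it is referring to the renowned chef Alex.
[Argument]: The response follows the narrative about chef Alex's cooking.
[Score]: 0.8
[Explanation]: It overlooks that the last line may refer to the artist Alex.
"""
print(check_example(sample))  # 0.8
```

The same check could be run over the full RESPONSE_CONSISTENCY_FEW_SHOT__COT string, since its [Context] line is simply parsed as an extra field and ignored by the check.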

sky-2002, Apr 11 '24 13:04