prompttools
Robustness evaluation
🚀 The feature
Request from a potential user: "There are two main aspects: 1) adjusting prompts so that changing semantically equivalent words does not trigger hallucination, and 2) making the prompt itself such that the LLM doesn't slip away from the instruction."
Idea: for (1), use prompt templates to substitute words, then run evals to check the semantic similarity of all results. For (2), use auto-evaluation: given the instruction, prompt, and response, determine whether the LLM followed the instruction.
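To make the two ideas concrete, here is a minimal, self-contained sketch. It does not use the prompttools API; the model call (`fake_llm`), the judge (`fake_judge`), the bag-of-words similarity metric, and the 0.8 threshold are all stand-in assumptions — a real setup would call an actual LLM and an embedding-based similarity function.

```python
import math
from collections import Counter
from itertools import combinations
from string import Template


def bow_cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity (a stand-in for an embedding model)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0


def fake_llm(prompt: str) -> str:
    """Stub model: swap in a real completion call."""
    return "Paris is the capital of France."


# (1) Substitution robustness: fill a template with near-synonyms and
# check that the responses stay semantically similar to each other.
template = Template("Name the $adjective city of France.")
prompts = [template.substitute(adjective=w) for w in ("capital", "principal", "main")]
responses = [fake_llm(p) for p in prompts]
pair_scores = [bow_cosine(a, b) for a, b in combinations(responses, 2)]
is_robust = min(pair_scores) >= 0.8  # threshold is an arbitrary example


# (2) Auto-evaluation: ask a judge model whether the response followed
# the instruction, given (instruction, prompt, response).
def build_judge_prompt(instruction: str, prompt: str, response: str) -> str:
    return (
        "Did the response follow the instruction? Answer YES or NO.\n"
        f"Instruction: {instruction}\nPrompt: {prompt}\nResponse: {response}"
    )


def fake_judge(judge_prompt: str) -> str:
    """Stub judge: a real setup would call a second LLM here."""
    return "YES"


verdict = fake_judge(build_judge_prompt("Answer in one sentence.", prompts[0], responses[0]))
followed = verdict.strip().upper().startswith("YES")
```

With a real model, part (1) flags prompts whose paraphrases produce divergent answers, and part (2) turns the judge's YES/NO into a boolean metric that can be aggregated across a test set.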
Motivation, pitch
We received this request from a potential user; robustness is also a common concern in LLM evaluation.
Alternatives
No response
Additional context
No response
Hey, I am working on this issue and have written two sample scripts: one for prompt substitution and one for auto-evaluation. I am using Promptbench in both. Could you guide me on how to integrate these scripts into your experiments? I am joining your Discord server so we can discuss this issue in detail.