promptfoo
Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command...
**Is your feature request related to a problem? Please describe.** Some of the common math-based evaluation metrics for NLP/LLM include ROUGE (already supported), BLEU, METEOR, GLEU, and some others. See...
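For context on what such a metric computes, here is a minimal pure-Python sketch of a BLEU-style score (modified n-gram precision with a brevity penalty). This is an illustration of the idea only, not the full BLEU specification and not promptfoo's implementation:

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(ref, hyp, n):
    """Clipped n-gram precision: hypothesis n-grams credited at most
    as many times as they appear in the reference."""
    hyp_counts = Counter(ngrams(hyp, n))
    ref_counts = Counter(ngrams(ref, n))
    if not hyp_counts:
        return 0.0
    clipped = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
    return clipped / sum(hyp_counts.values())


def bleu(ref, hyp, max_n=2):
    """Geometric mean of 1..max_n precisions times a brevity penalty."""
    precisions = [modified_precision(ref, hyp, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty: punish hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * math.exp(log_avg)
```

An exact match scores 1.0; a hypothesis that is a correct but short prefix of the reference is discounted by the brevity penalty.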
**Is your feature request related to a problem? Please describe.** No **Describe the solution you'd like** I'd like an option to see the output texts on the eval web...
**Is your feature request related to a problem? Please describe.** When using providers, we need to export env variables in our environment to make them work. In...
**Describe the bug** When using the following SQL assert, something fails. **To Reproduce** 1. Add the following assertion to your test suite: ``` - description: "Specific SQL Assertion" vars:...
**Is your feature request related to a problem? Please describe.** I am abusing Promptfoo to perform benchmarking. If the model FAILs, I most often don't want to retry the prompt....
Hello, I deployed promptfoo on a VM using Docker Compose and the pre-built image available at ghcr.io/promptfoo/promptfoo:main However, if I run several evaluations and then explore...
**Is your feature request related to a problem? Please describe.** I am testing a large number of test cases using a specified python metric set. In the case of some...
**Describe the bug** The `context-recall` [prompt](https://github.com/promptfoo/promptfoo/blob/b08099b2c6a7bb32866d63ad9cde7d79f37423ae/src/prompts/external/ragas.ts#L11-L25) is expected by [matchesContextRecall](https://github.com/promptfoo/promptfoo/blob/b08099b2c6a7bb32866d63ad9cde7d79f37423ae/src/matchers.ts#L632-L680) to produce a list of *single*-sentence/line statements, each followed by an [Attributed|NotAttributed] marker, as illustrated in...
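To see why single-line statements matter here, this is a rough sketch of the kind of per-line parsing the matcher assumes (a simplified illustration, not promptfoo's actual code): each output line is expected to end in a marker, and recall is the fraction of statements marked `[Attributed]`. A statement that spans multiple lines leaves its continuation lines without markers, so they are silently dropped from the count:

```python
import re

# Hypothetical marker pattern matching the [Attributed|NotAttributed]
# convention described in the issue.
MARKER = re.compile(r"\[(Attributed|NotAttributed)\]\s*$")


def context_recall_score(output: str) -> float:
    """Fraction of marker-bearing lines labeled Attributed.
    Lines without a trailing marker are ignored entirely."""
    labels = []
    for line in output.splitlines():
        m = MARKER.search(line.strip())
        if m:
            labels.append(m.group(1))
    if not labels:
        return 0.0
    return labels.count("Attributed") / len(labels)
```

With one attributed and one non-attributed statement the score is 0.5; if a statement wraps onto a second line, only the line carrying the marker is counted.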
Add the ability to permalink directly to a result row in the eval view
**Describe the bug** Setting `--grader` on the command line causes a test failure. **To Reproduce** 1. promptfooconfig.yaml: ``` description: 'minimal repo' prompts: - 'test prompt {{input}}' providers: - "openai:chat:gpt-4o" defaultTest: assert: -...