promptest
CLI tool to test prompt effectiveness easily against AI models
promptest is a command-line tool to test prompts with large language models (LLMs).
Installation
To install promptest, run the following command:
pip install promptest
Usage
Make the OpenAI API key available as an environment variable:
export OPENAI_API_KEY=sk-...
To use promptest, run the following command:
promptest --prompt /path/to/prompt.yaml --inputs /path/to/inputs.yaml
Here, /path/to/prompt.yaml should be the path to a YAML file containing the prompt template for your LLM, and /path/to/inputs.yaml should be the path to a YAML file containing the inputs for your tests.
For more information on the command-line arguments that promptest supports, run:
promptest --help
Example
Here are a few examples of how to use promptest to test LLM prompts.
Test output with exact match
- Create a YAML file prompt.yaml containing the template for your LLM:

template: |
  Provide the longest word in terms of characters in an input piece of text.
  Only provide the word, no other text.
  Provide the word exactly as written in the text, do not modify it, capitalize it, etc.
  The text is:
  {text}
input_variables:
  - text
output_key: longest_word
- Create a YAML file tests.yaml containing the inputs for your tests:

model_names:
  - text-davinci-003
temperature: 0
max_tokens: 30
tests:
  - variables:
      text: "the longest word in this text is longest"
    expected_output:
      type: "string"
      value: "longest"
  - variables:
      text: "the longest word in this text is confusing"
    expected_output:
      type: "string"
      value: "confusing"
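Putting the two files together, a single exact-match test amounts to filling the template with the test's variables and comparing the model's answer to the expected value. A minimal Python sketch of that flow (assuming format-style placeholder substitution and an exact string comparison for the "string" type; the real promptest internals may differ):

```python
# Hypothetical sketch of one exact-match test run. The placeholder
# substitution and comparison rules are assumptions, not confirmed
# promptest behavior.
template = (
    "Provide the longest word in terms of characters in an input piece "
    "of text. Only provide the word, no other text. The text is: {text}"
)

# Fill the template with the test case's input variables.
prompt = template.format(text="the longest word in this text is longest")

# Pretend the model answered; promptest would call the model API here.
model_output = "longest"

expected = {"type": "string", "value": "longest"}
passed = model_output == expected["value"]
print(passed)  # -> True
```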
Test output with another prompt
- Create a prompt.yaml file containing the template for your prompt:
template: |
  Generate a short story in the {genre} genre. It should be at most 150 words long.
input_variables:
  - genre
output_key: story
- Create a tests.yaml file containing the inputs for your tests and the conditions the output should meet:
model_names: ['text-davinci-003']
temperature: 0.9
max_tokens: 500
tests:
  - variables:
      genre: "sci-fi"
    expected_output:
      type: prompt
      model: 'text-davinci-003'
      value:
        conditions:
          - "contains at least one futuristic element"
          - "story has a clear beginning, middle, and end"
  - variables:
      genre: "mystery"
    expected_output:
      type: prompt
      model: 'text-davinci-003'
      value:
        conditions:
          - "contains a mystery or puzzle to be solved"
          - "story has a clear beginning, middle, and end"
In this example, the tests use another prompt to check whether the generated story contains specific genre elements and has a clear structure with a beginning, middle, and end.
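One way such a prompt-based check could work is to build a verification prompt that asks a second model whether the output meets every listed condition. A hypothetical sketch (the wording and structure of the verification prompt are assumptions, not promptest's actual implementation):

```python
# Hypothetical sketch: assemble a yes/no verification prompt from the
# model output and the test's conditions. promptest's real check
# prompt may be worded differently.
def build_check_prompt(output_text: str, conditions: list[str]) -> str:
    lines = ["Does the following text satisfy ALL of these conditions?"]
    lines += [f"- {c}" for c in conditions]
    lines += ["Answer only yes or no.", "", output_text]
    return "\n".join(lines)

check = build_check_prompt(
    "Once upon a time on Mars, a rover woke up with a plan.",
    [
        "contains at least one futuristic element",
        "story has a clear beginning, middle, and end",
    ],
)
print(check)
```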
Running the tests
Run the following command to test the prompt templates:
promptest --prompt prompt.yaml --tests tests.yaml
Supported models
The following models are supported:
- OpenAI: "gpt-4", "gpt-4-0314", "gpt-4-32k", "gpt-4-32k-0314", "gpt-3.5-turbo", "gpt-3.5-turbo-0301", "text-davinci-003", "text-davinci-002", "text-curie-001", "text-babbage-001", "text-ada-001"
- gpt4all: specify the path to the model as a string, e.g. "./model/gpt4all.bin"
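For example, to point a test run at a local gpt4all model instead of an OpenAI one, the model path goes in model_names (the path below is illustrative):

```yaml
model_names:
  - "./model/gpt4all.bin"
temperature: 0
max_tokens: 30
```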
Output
The tests output detailed results for each model, including the number of test cases passed and the pass ratio.
When you run promptest, the tool executes the tests for each LLM model specified in the tests.yaml file. For each model, promptest runs the tests and prints the results to the console. In addition to the console output, promptest also writes a file containing the detailed test results.
The file is named model_name_test_result.yml, where model_name is the name of the LLM model being tested (e.g., text-davinci-002 or text-ada-001). It is written inside a directory named prompt_test_results in the current working directory, in a subdirectory named with a timestamp and the name of the template file being used, in the following format:
prompt_test_results/YYYY_MM_DD_HH_MM_template_name/
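As a hypothetical sketch of that naming scheme (the exact implementation inside promptest is an assumption), the result path for a given template, model, and run time could be derived like this:

```python
# Hypothetical sketch of the documented results layout:
# prompt_test_results/YYYY_MM_DD_HH_MM_template_name/model_name_test_result.yml
from datetime import datetime
from pathlib import Path

def result_file(template_path: str, model_name: str, now: datetime) -> Path:
    # "prompt" for prompt.yaml; timestamp is zero-padded per the pattern above.
    run_dir = f"{now:%Y_%m_%d_%H_%M}_{Path(template_path).stem}"
    return Path("prompt_test_results") / run_dir / f"{model_name}_test_result.yml"

path = result_file("prompt.yaml", "text-davinci-003", datetime(2023, 5, 1, 9, 30))
print(path.as_posix())
```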
The YAML file will contain the following information:
max_tokens: 30
model: text-davinci-003
prompt_template: |
  Provide the longest word in terms of characters in an input piece of text.
  Only provide the word, no other text.
  Provide the word exactly as written in the text, do not modify it, capitalize it, etc.
  The text is:
  {text}
results:
  - comparison_result: true
    inputs: {text: the longest word in this text is longest}
    output: longest
  - comparison_result: true
    inputs: {text: the longest word in this text is confusing}
    output: confusing
temperature: 0
totals: {passes: 2, total: 2}
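The pass ratio reported in the console output follows directly from the totals section of a result file like the one above:

```python
# Computing the pass ratio from the "totals" section shown above.
totals = {"passes": 2, "total": 2}
ratio = totals["passes"] / totals["total"]
summary = f"{totals['passes']}/{totals['total']} tests passed ({ratio:.0%})"
print(summary)  # -> 2/2 tests passed (100%)
```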
License
promptest is released under the MIT License.