Dataset generation not working for custom python provider
I have a SQL Agent (as a provider) built with LangChain and Azure OpenAI models. When I run the dataset generation feature, I get this error:
Why am I receiving this error regarding OpenAI API keys when I am already using Azure OpenAI keys?
Steps to reproduce the behavior:
- Run:
npx promptfoo generate dataset -c promptfooconfig.yaml -o tests.yaml
- Promptfoo config file:
description: "SQL Agent Eval"
prompts:
  - id: /path/to/file/prompts.py
    label: prompt_instructions
providers:
  - id: python:provider.py
    label: SQL Agent RAG
tests:
  - vars:
      input:
        - Add_input_message_1.
        - Add_input_message_2.
    assert:
      - type: latency
        threshold: 180000
- Promptfoo version: 0.91.3
Hi @oanastrut, try running:
npx promptfoo generate dataset -c promptfooconfig.yaml -o tests.yaml --provider file://provider.py
We use the grading provider (not the providers listed in promptfooconfig.yaml) to generate datasets. You can check where the provider is loaded here.
You can also run:
npx promptfoo generate dataset --help
to see other available options for dataset generation.
We'd love to hear any suggestions for product improvements around dataset generation. I'll update the docs to clarify this. Thanks again for reporting the issue! Let me know if this works!
Unfortunately, it doesn't work for me. If I run it like this:
npx promptfoo generate dataset -c promptfooconfig.yaml -o tests.yaml --provider file://provider.py
I receive this error:
I have tried all kinds of variations. For example, if I use the Azure OpenAI GPT-4o model directly and define it only in the config file:
providers:
  - id: azureopenai:chat:deployment_name
    config:
      apiHost: 'xxxxxxx.openai.azure.com/'
      apiKey: xxxxxxxxxxx
I get this error:
And if I pass the provider like this:
npx promptfoo generate dataset -c promptfooconfig.yaml -o tests.yaml --provider azureopenai:chat:deployment_name
I get this error:
@oanastrut feel free to grab a time on my calendar and I'll look at this with you: cal.com/michael-dangelo/15min.
Did this get resolved? I'm running into a similar issue when calling synthesizeFromTestSuite() using the JS package.
I was able to find a workaround:
// Workaround: temporarily set the env vars the grading provider expects,
// then restore the original environment afterwards.
const originalEnv = { ...process.env };
try {
  Object.assign(process.env, {
    AZURE_API_KEY: provider.config.apiKey,
    AZURE_API_HOST: provider.config.apiHost,
  });
  // ...call synthesizeFromTestSuite()
} catch (error) {
  // ...handle error
} finally {
  process.env = originalEnv;
}
Thanks for reporting this and for the follow-up from @bennibbelink.
The --provider flag specifies the LLM used to generate test cases - it's separate from the providers in your config file (which are the targets being tested). By default, dataset generation looks for OPENAI_API_KEY in your environment.
For Azure OpenAI, you have a few options:
Option 1: Environment variables
export AZURE_OPENAI_API_KEY=your-key
export AZURE_API_HOST=your-host.openai.azure.com
export AZURE_OPENAI_DEPLOYMENT_NAME=your-deployment
promptfoo generate dataset -c promptfooconfig.yaml
Option 2: Provider config file
Create synthesis-provider.yaml:
id: azureopenai:chat:your-deployment-name
config:
  apiHost: your-host.openai.azure.com
  apiKey: ${AZURE_OPENAI_API_KEY}
Then run:
promptfoo generate dataset --provider file://synthesis-provider.yaml
Option 3: Python provider
You can also use a custom Python provider for full control:
promptfoo generate dataset --provider file://synthesis-provider.py
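For reference, a custom Python synthesis provider uses the same interface as any promptfoo Python provider: a call_api(prompt, options, context) function that returns a dict with an output key. Here is a minimal sketch; the Azure OpenAI call is stubbed out and would need to be replaced with a real SDK call using your own credentials:

```python
# Hypothetical synthesis-provider.py: minimal sketch of a promptfoo custom
# Python provider. promptfoo invokes call_api(prompt, options, context) and
# expects a dict containing an "output" key.

def call_api(prompt, options, context):
    # Provider config (e.g. an API key) arrives under options["config"].
    config = (options or {}).get("config", {})

    # A real provider would call an Azure OpenAI deployment here,
    # authenticating with values from config (e.g. config.get("apiKey")).
    # Stubbed response for illustration only:
    response_text = f"stubbed completion for prompt: {prompt}"

    return {"output": response_text}
```

This gives you full control over how synthesis prompts are answered, since the function body can do anything Python can.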
I've updated the documentation in #6673 to clarify this. Closing as the feature works as designed, but the docs were lacking. Thanks again for the report!