# Empirical
Empirical is the fastest way to test different LLMs and model configurations across all the scenarios that matter for your application.
With Empirical, you can:
- Run your test datasets locally against off-the-shelf or custom models
- Compare model outputs on a web UI, and test changes quickly
- Score your outputs with scoring functions
- Run tests on CI/CD
https://github.com/empirical-run/empirical/assets/284612/65d96ecc-12a2-474d-a81e-bbddb71106b6
## Usage
Empirical bundles together a test runner and a web app. These can be used through the CLI in your terminal window.

Empirical relies on a configuration file, typically located at `empiricalrc.js`, which describes the tests to run.
### Start with a basic example
In this example, we will ask an LLM to extract entities from user messages and give us a structured JSON output. For example, "I'm Alice from Maryland" will become `{name: 'Alice', location: 'Maryland'}`.

Our test will succeed if the model outputs valid JSON.
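"Valid JSON" here simply means the output parses. A minimal sketch of such a check (an illustration, not Empirical's built-in scorer) could look like:

```javascript
// Returns true if the model's raw output string parses as JSON.
function isValidJson(output) {
  try {
    JSON.parse(output);
    return true;
  } catch {
    return false;
  }
}
```

A plain-text model response would fail this check, while a well-formed JSON object would pass.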
1. Use the CLI to create a sample configuration file called `empiricalrc.js`.

   ```shell
   npm init empiricalrun

   # For TypeScript
   npm init empiricalrun -- --using-ts
   ```
2. Run the example dataset against the selected models.

   ```shell
   npx empiricalrun
   ```

   This step requires the `OPENAI_API_KEY` environment variable to authenticate with OpenAI. This execution will cost $0.0026, based on the selected models.
3. Use the `ui` command to open the reporter web app and see side-by-side results.

   ```shell
   npx empiricalrun ui
   ```
### Make it yours
Edit the `empiricalrc.js` file to make Empirical work for your use case:
- Configure which models to use
- Configure your test dataset
- Configure scoring functions to grade output quality
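Pulling those three pieces together, a configuration might look roughly like the sketch below. The field names (`runs`, `dataset`, `scorers`) and their shapes are illustrative assumptions, not a verified schema; consult the Empirical documentation for the actual format.

```javascript
// empiricalrc.js -- illustrative sketch only; field names are assumptions,
// see the Empirical docs for the real configuration schema.
module.exports = {
  // Which models to test against
  runs: [
    { provider: "openai", model: "gpt-3.5-turbo" },
    { provider: "openai", model: "gpt-4" },
  ],
  // The test dataset: user messages to extract entities from
  dataset: {
    samples: [
      { inputs: { user_message: "I'm Alice from Maryland" } },
    ],
  },
  // Scoring functions to grade output quality, e.g. a JSON-validity check
  scorers: [{ type: "is-json" }],
};
```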
## Contribution guide
See the development docs.