llm-answer-engine
Proposal: report-only CI check for recorded answers (fixtures-only, PromptProof)
I’d like to add a small, report-only CI check that gives reviewers a one-glance HTML report showing whether a recorded answer still matches the expected shape and passes basic guardrails. It makes no live model calls and does not block merges.
Why this helps here
- Keeps example responses from silently drifting during refactors
- Makes drive-by PRs safer: reviewers see the schema and PII checks plus a cost summary in one artifact
- Deterministic by default (seed + runs=3) to avoid flakes; no secrets required
Files to add
- .github/workflows/promptproof.yml
name: PromptProof
on:
  pull_request:
    paths:
      - ".github/workflows/promptproof.yml"
      - "promptproof.yaml"
      - "fixtures/promptproof/**"
jobs:
  proof:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: geminimir/promptproof-action@v0
        with:
          config: promptproof.yaml
          runs: 3
          seed: 1337
          max-run-cost: 0.75
          report-artifact: promptproof-report
          mode: report-only
- promptproof.yaml
mode: fail
format: html
fixtures:
  - path: fixtures/promptproof/answer_engine.json
checks:
  - id: answer_schema
    type: schema
    json_schema:
      type: object
      properties:
        output:
          type: object
          properties:
            answer: { type: string, minLength: 1 }
            citations: { type: array, items: { type: string }, nullable: true }
            latency_ms: { type: number, minimum: 0 }
          required: [answer]
      required: [output]
  - id: forbid_emails
    type: regex_forbid
    pattern: "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}"
budgets:
  max_run_cost: 0.75
stability:
  runs: 3
  seed: 1337
- fixtures/promptproof/answer_engine.json
{
  "record_id": "ae-hello-001",
  "input": { "question": "What is PromptProof?" },
  "output": {
    "answer": "A CI gate for LLM outputs.",
    "citations": ["https://example.com/doc"],
    "latency_ms": 42
  }
}
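For reviewers who want to see what the two checks assert before adopting the action, here is a rough Python sketch that applies the same rules to the fixture above. It is not PromptProof's implementation: the function names `check_answer_schema` and `check_forbid_emails` are hypothetical, the schema check is hand-rolled rather than a full JSON Schema validator, and scanning the serialized `output` object for emails is an assumption about what `regex_forbid` inspects.

```python
import json
import re

# The fixture from this proposal, inlined so the sketch runs on its own.
FIXTURE = json.loads("""
{
  "record_id": "ae-hello-001",
  "input": { "question": "What is PromptProof?" },
  "output": {
    "answer": "A CI gate for LLM outputs.",
    "citations": ["https://example.com/doc"],
    "latency_ms": 42
  }
}
""")

# Same pattern as the forbid_emails check in promptproof.yaml.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def check_answer_schema(record):
    """Approximate the answer_schema check; return a list of problems (empty means pass)."""
    if not isinstance(record, dict):
        return ["record is not an object"]
    output = record.get("output")
    if not isinstance(output, dict):
        return ["missing or non-object 'output'"]

    problems = []
    answer = output.get("answer")
    if not isinstance(answer, str) or len(answer) < 1:
        problems.append("'output.answer' must be a non-empty string")

    # citations is optional and may be null; if present it must be a list of strings.
    citations = output.get("citations")
    if citations is not None:
        if not isinstance(citations, list) or not all(isinstance(c, str) for c in citations):
            problems.append("'output.citations' must be an array of strings or null")

    # latency_ms is optional; if present it must be a non-negative number (bool excluded).
    latency = output.get("latency_ms")
    if latency is not None:
        is_number = isinstance(latency, (int, float)) and not isinstance(latency, bool)
        if not is_number or latency < 0:
            problems.append("'output.latency_ms' must be a number >= 0")
    return problems


def check_forbid_emails(record):
    """Return every email-like string found in the serialized output (assumed scan target)."""
    text = json.dumps(record.get("output", {}))
    return EMAIL_RE.findall(text)


if __name__ == "__main__":
    print("schema problems:", check_answer_schema(FIXTURE))
    print("emails found:", check_forbid_emails(FIXTURE))
```

Both functions return an empty list for the recorded fixture, which corresponds to a passing report. An empty `answer`, a negative `latency_ms`, or an email address anywhere in the output would each show up as a finding.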
What maintainers get
- A single HTML report artifact per PR (schema/regex/cost summary).
- Zero live calls; easy to delete if unwanted.
References
- Sample report: https://geminimir.github.io/promptproof-action/reports/before.html
- Marketplace: https://github.com/marketplace/actions/promptproof-eval
- Demo project: https://github.com/geminimir/promptproof-demo-project
If this sounds okay, I’ll open a 3-file PR and can tweak the checks/paths to your preference.