Proposal: report-only CI check for recorded answers (fixtures-only, PromptProof)

Open geminimir opened this issue 6 months ago • 0 comments

I’d like to add a tiny, report-only CI check that gives reviewers a one-glance HTML report showing whether a recorded answer still matches the expected shape (and passes basic guardrails), without any live model calls and without blocking merges.

Why this helps here

Keeps example responses from silently drifting during refactors
Makes drive-by PRs safer (reviewers see schema/PII checks & cost summary in one artifact)
Deterministic by default (seed + runs=3) to avoid flakes; no secrets required

Files to add

.github/workflows/promptproof.yml

name: PromptProof
on:
  pull_request:
    paths:
      - ".github/workflows/promptproof.yml"
      - "promptproof.yaml"
      - "fixtures/promptproof/**"
jobs:
  proof:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: geminimir/promptproof-action@v0
        with:
          config: promptproof.yaml
          runs: 3
          seed: 1337
          max-run-cost: 0.75
          report-artifact: promptproof-report
          mode: report-only

promptproof.yaml

mode: fail
format: html
fixtures:
  - path: fixtures/promptproof/answer_engine.json
checks:
  - id: answer_schema
    type: schema
    json_schema:
      type: object
      properties:
        output:
          type: object
          properties:
            answer: { type: string, minLength: 1 }
            citations: { type: array, items: { type: string }, nullable: true }
            latency_ms: { type: number, minimum: 0 }
          required: [answer]
      required: [output]
  - id: forbid_emails
    type: regex_forbid
    pattern: "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}"
budgets:
  max_run_cost: 0.75
stability:
  runs: 3
  seed: 1337

fixtures/promptproof/answer_engine.json

{
  "record_id": "ae-hello-001",
  "input": { "question": "What is PromptProof?" },
  "output": {
    "answer": "A CI gate for LLM outputs.",
    "citations": ["https://example.com/doc"],
    "latency_ms": 42
  }
}
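
As a rough local sanity check, the fixture above can be compared against the shape described by `answer_schema` with a few lines of standard-library Python. This is a hand-rolled approximation under stated assumptions, not how PromptProof evaluates schemas; the function name `schema_errors` and its error strings are hypothetical.

```python
import json

# The fixture from this proposal, inlined so the sketch runs on its own.
FIXTURE = json.loads("""
{
  "record_id": "ae-hello-001",
  "input": { "question": "What is PromptProof?" },
  "output": {
    "answer": "A CI gate for LLM outputs.",
    "citations": ["https://example.com/doc"],
    "latency_ms": 42
  }
}
""")


def _is_number(value) -> bool:
    # JSON Schema "number" excludes booleans, which Python treats as ints.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def schema_errors(record) -> list[str]:
    """Approximate the answer_schema check by hand (hypothetical helper)."""
    if not isinstance(record, dict):
        return ["record: expected object"]
    if "output" not in record:
        return ["output: required"]
    output = record["output"]
    if not isinstance(output, dict):
        return ["output: expected object"]

    errors = []
    if "answer" not in output:
        errors.append("output.answer: required")
    elif not isinstance(output["answer"], str) or len(output["answer"]) < 1:
        errors.append("output.answer: expected non-empty string")

    # citations is optional and may be null.
    citations = output.get("citations")
    if citations is not None:
        if not isinstance(citations, list) or not all(
            isinstance(item, str) for item in citations
        ):
            errors.append("output.citations: expected array of strings or null")

    # latency_ms is optional but must be a non-negative number when present.
    if "latency_ms" in output:
        latency = output["latency_ms"]
        if not _is_number(latency) or latency < 0:
            errors.append("output.latency_ms: expected number >= 0")
    return errors


if __name__ == "__main__":
    print(schema_errors(FIXTURE))
```

An empty list means the recorded answer still has the expected shape; a refactor that drops or empties `answer` would show up as a non-empty list.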

What maintainers get

A single HTML report artifact per PR (schema/regex/cost summary).
Zero live calls; easy to delete if unwanted.

References

Sample report: https://geminimir.github.io/promptproof-action/reports/before.html

If this sounds okay, I’ll open a 3-file PR and can tweak the checks/paths to your preference.

Marketplace: https://github.com/marketplace/actions/promptproof-eval
Demo project: https://github.com/geminimir/promptproof-demo-project

Aug 15 '25 03:08 geminimir