Proposal: non-blocking CI report for recorded RAG chat output (PromptProof)

Open geminimir opened this issue 6 months ago • 0 comments

I’d like to add a tiny, report-only CI check that produces a one-glance HTML report showing whether a recorded RAG chat response still matches the expected shape, without any live model calls and without blocking merges.

Why this helps

Prevents silent drift in the component’s output contract (e.g., { message: string, sources?: string[] }) when making changes or refactoring.
Gives maintainers and contributors a clear, visual artifact on each PR that triggers the workflow: schema validation, basic safety checks, and a cost/latency summary in one place.
Acts as lightweight regression testing without requiring a dedicated backend or API keys.
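Non-blocking and deterministic: it replays a recorded fixture with a fixed seed over 3 runs, so results are not flaky.
Keeps CI narrowly scoped: the workflow's paths filter triggers it only when the three new files change, so it is safe for a library repo and does not trigger unrelated builds.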

Files to add

.github/workflows/promptproof.yml

name: PromptProof
on:
  pull_request:
    paths:
      - ".github/workflows/promptproof.yml"
      - "promptproof.yaml"
      - "fixtures/promptproof/**"
jobs:
  proof:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: geminimir/promptproof-action@v0
        with:
          config: promptproof.yaml
          runs: 3
          seed: 1337
          max-run-cost: 0.50
          report-artifact: promptproof-report
          mode: report-only

promptproof.yaml

mode: fail  # note: conflicts with the report-only intent unless the workflow's mode input overrides it
format: html
fixtures:
  - path: fixtures/promptproof/rag_chat.json
checks:
  - id: rag_message_schema
    type: schema
    json_schema:
      type: object
      properties:
        output:
          type: object
          properties:
            message: { type: string, minLength: 1 }
            sources: { type: array, items: { type: string } }
          required: [message]
      required: [output]
budgets:
  max_run_cost: 0.50
stability:
  runs: 3
  seed: 1337

fixtures/promptproof/rag_chat.json

{
  "record_id": "upstash-rag-001",
  "input": { "query": "What is vector search?" },
  "output": { "message": "Sample deterministic blurb.", "sources": ["https://example.com/source"] }
}
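
To make the intended contract concrete, here is a minimal Python sketch of what the schema check is meant to assert about the recorded fixture. It is a hand-rolled stand-in, not PromptProof's implementation: the function name `contract_violations` and the violation messages are hypothetical, and `sources` is treated as optional (may be absent, but must be an array of strings when present), following the `{ message: string, sources?: string[] }` contract above.

```python
import json

# Recorded fixture, mirroring fixtures/promptproof/rag_chat.json from this proposal.
FIXTURE = json.loads("""
{
  "record_id": "upstash-rag-001",
  "input": { "query": "What is vector search?" },
  "output": { "message": "Sample deterministic blurb.", "sources": ["https://example.com/source"] }
}
""")


def contract_violations(record):
    """Return a list of human-readable violations; an empty list means the shape matches.

    Hypothetical helper for illustration only, not part of PromptProof.
    """
    output = record.get("output")
    if not isinstance(output, dict):
        return ["output: expected an object"]

    problems = []

    # message is required and must be a non-empty string (minLength: 1).
    message = output.get("message")
    if not isinstance(message, str) or len(message) < 1:
        problems.append("output.message: expected a non-empty string")

    # sources is optional, but when present it must be a list of strings.
    if "sources" in output:
        sources = output["sources"]
        if not isinstance(sources, list) or not all(isinstance(s, str) for s in sources):
            problems.append("output.sources: expected an array of strings")

    return problems


if __name__ == "__main__":
    print(contract_violations(FIXTURE))  # prints []
```

Because the check runs against a recorded JSON file, it needs no API keys or network access, which is what makes the result repeatable across runs.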

What maintainers get

A single HTML report artifact per PR (schema/regex/cost summary).
Zero live calls; easy to delete if unwanted.
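
As a rough illustration of the report-only behaviour described above, the sketch below turns check results into a small HTML table and always returns exit code 0 in report-only mode. This is an assumption about how such a mode could work, not PromptProof's actual code: `render_report`, `exit_code`, and the result tuple layout are all hypothetical names chosen for this example.

```python
import html


def render_report(results):
    """Build a minimal HTML table from (check_id, passed, detail) tuples."""
    rows = "".join(
        "<tr><td>{}</td><td>{}</td><td>{}</td></tr>".format(
            html.escape(check_id),
            "PASS" if passed else "FAIL",
            html.escape(detail),
        )
        for check_id, passed, detail in results
    )
    header = "<tr><th>Check</th><th>Status</th><th>Detail</th></tr>"
    return "<table>" + header + rows + "</table>"


def exit_code(results, mode):
    """Report-only never fails the job; any other mode fails if any check failed."""
    if mode == "report-only":
        return 0
    return 0 if all(passed for _, passed, _ in results) else 1


if __name__ == "__main__":
    # Hypothetical result for the rag_message_schema check defined in promptproof.yaml.
    results = [("rag_message_schema", False, "output.message: expected a non-empty string")]
    print(render_report(results))
    print(exit_code(results, "report-only"))  # prints 0
```

The design point is that a failing check still shows up as FAIL in the artifact, while the job's exit status stays green so merges are not blocked.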

References

Sample report: https://geminimir.github.io/promptproof-action/reports/before.html

If this sounds okay, I’ll open a 3-file PR and can tweak the checks/paths to your preference.

Marketplace: https://github.com/marketplace/actions/promptproof-eval
Demo project: https://github.com/geminimir/promptproof-demo-project

Aug 15 '25 03:08 geminimir