feat: notebook tester + CLI + report summary

Open andycandy opened this issue 3 months ago • 1 comments

Overview

This PR introduces automated testing of Jupyter notebooks within the repository. It adds a shell entrypoint (cookbook) and a Python test runner (test_nbclient.py) that:

Automatically discovers and executes notebooks.
Patches Colab-specific code for local execution.
Summarizes cell outputs and errors.
Optionally compares notebook outputs using Gemini AI for regression detection.
Generates detailed JSON reports for each notebook test run.
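
As a rough illustration of the summarization step, the sketch below reduces a code cell's outputs to short records that are easy to diff. It is not the code from test_nbclient.py: the function name summarize_outputs and the max_chars parameter are assumptions, and only the standard nbformat v4 output fields (output_type, text, ename, evalue, data) are relied on.

```python
def summarize_outputs(outputs, max_chars=200):
    """Reduce a code cell's nbformat v4 outputs to short, comparable records."""
    summary = []
    for out in outputs:
        kind = out.get("output_type")
        if kind == "stream":
            text = out.get("text", "")
            if isinstance(text, list):  # nbformat allows multi-line strings as lists
                text = "".join(text)
            summary.append(
                {"type": "stream", "name": out.get("name"), "text": text[:max_chars]}
            )
        elif kind == "error":
            # Keep the exception name and message; drop the long traceback.
            summary.append(
                {"type": "error", "ename": out.get("ename"), "evalue": out.get("evalue")}
            )
        elif kind in ("execute_result", "display_data"):
            # Record only which MIME types were produced, not the payloads.
            summary.append({"type": kind, "mime_types": sorted(out.get("data", {}))})
    return summary
```

Recording MIME types instead of raw payloads keeps image-heavy cells small in the summary, at the cost of not detecting a changed image.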

Features

Notebook Discovery: Automatically finds all .ipynb files in the repo, or runs specific notebooks.
Colab Compatibility: Patches google.colab.userdata.get calls to use environment variables for seamless local runs.
Cell Output Summarization: Captures and summarizes outputs, including errors, for each code cell.
Progress UI: Displays real-time progress and summary of test results in the terminal.
AI Output Comparison: When enabled, uses Gemini AI to classify output changes as ok_cells, slightly_changed, or wrong.
Reporting: Outputs results to reports/*.compare.json for further analysis.
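
One way the Colab patch could work is a plain source rewrite applied to each code cell before execution. The sketch below is an assumption about the approach, not the PR's actual implementation: patch_colab_source is a hypothetical name, and it only handles userdata.get calls with a literal string key.

```python
import re

# Matches userdata.get('NAME') or google.colab.userdata.get("NAME") with a literal key.
_USERDATA_GET = re.compile(
    r"""(?:google\.colab\.)?userdata\.get\(\s*(['"])(\w+)\1\s*\)"""
)
# Matches the Colab import line, preserving its indentation.
_USERDATA_IMPORT = re.compile(
    r"^([ \t]*)from google\.colab import userdata[ \t]*$", re.MULTILINE
)


def patch_colab_source(source: str) -> str:
    """Rewrite Colab secret lookups to read from environment variables."""
    patched = _USERDATA_GET.sub(r"os.environ.get('\2')", source)
    patched = _USERDATA_IMPORT.sub(r"\1import os", patched)
    return patched
```

Note that os.environ.get returns None when the variable is unset, so a missing key surfaces later in the notebook rather than at the lookup itself.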

Usage

Entrypoint

Use the cookbook script to run notebook tests:

# Run all notebooks
./cookbook test

# Run a specific notebook
./cookbook test examples/Book_illustration.ipynb

# Run multiple files
./cookbook test "quickstarts/Models.ipynb","quickstarts/Audio.ipynb"

# Run with AI output comparison
./cookbook test examples/Book_illustration.ipynb --ai-compare

# Set a custom timeout (seconds)
./cookbook test examples/Book_illustration.ipynb --timeout=1200

# Specify a kernel (default: python3)
./cookbook test examples/Book_illustration.ipynb --kernel=python3

Options

--ai-compare: Enables AI-based output comparison.
--timeout=<seconds>: Sets the per-cell execution timeout in seconds (default: 900).
--kernel=<name>: Specifies the Jupyter kernel to run with (default: python3).
[notebook.ipynb]: Path to a specific notebook, or a comma-separated list of paths. If omitted, all notebooks are tested.
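
For reference, the options above map naturally onto argparse. This is a minimal sketch of how the runner might parse them, assuming the cookbook script forwards its arguments unchanged; parse_test_args is a hypothetical name. The shell joins "a.ipynb","b.ipynb" into the single argument a.ipynb,b.ipynb, so the positional value is split on commas.

```python
import argparse


def parse_test_args(argv):
    """Parse the arguments that follow `./cookbook test`."""
    parser = argparse.ArgumentParser(prog="cookbook test")
    parser.add_argument(
        "notebooks", nargs="?", default="",
        help="notebook path or comma-separated paths; empty means all notebooks",
    )
    parser.add_argument("--ai-compare", action="store_true",
                        help="enable AI-based output comparison")
    parser.add_argument("--timeout", type=int, default=900,
                        help="cell execution timeout in seconds")
    parser.add_argument("--kernel", default="python3",
                        help="Jupyter kernel name")
    args = parser.parse_args(argv)
    # Split the comma-separated positional into a clean list of paths.
    args.notebooks = [p.strip() for p in args.notebooks.split(",") if p.strip()]
    return args
```

An empty notebooks list then signals the "run everything" case to the caller.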

Output

Progress and summary are displayed in the terminal.
Detailed results are saved as JSON in the reports/ directory, e.g., reports/examples__Book_illustration.ipynb.compare.json.
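
The report name in the example suggests that path separators are flattened to double underscores. A small sketch of that mapping, inferred from the single example above (report_path is a hypothetical helper, and it assumes a relative, POSIX-style notebook path):

```python
from pathlib import PurePosixPath


def report_path(notebook: str, reports_dir: str = "reports") -> str:
    """Map a relative notebook path to its flattened report file name."""
    # Join the path components with "__" so every report sits directly in reports_dir.
    flat = "__".join(PurePosixPath(notebook).parts)
    return f"{reports_dir}/{flat}.compare.json"


print(report_path("examples/Book_illustration.ipynb"))
# reports/examples__Book_illustration.ipynb.compare.json
```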

Example Report

Each report includes:

File path
Duration
Status (passed/failed)
Buckets for cell output comparison (ok_cells, slightly_changed, wrong)
AI notes (if enabled)
Saved outputs and test-run outputs

Notes

Ensure GOOGLE_API_KEY and GEMINI_API_KEY are set in your environment.
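
A quick pre-flight check can catch a missing key before any notebook starts. This is a sketch only; missing_keys and REQUIRED_KEYS are hypothetical names and are not part of the PR.

```python
import os

REQUIRED_KEYS = ("GOOGLE_API_KEY", "GEMINI_API_KEY")


def missing_keys(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
```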

Aug 31 '25 08:08 andycandy

Work Still Needed

Verify that every required package is installed, using the pip install <package> form.
Remove leftover debug variables (e.g., raw_texts in _gemini_compare_batches and img_count in _summarize_outputs).
Ensure non-.ipynb files are rejected.
Convert the process into a weekly workflow, with reports automatically linked in the corresponding weekly issue.
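
For the non-.ipynb item, one possible shape for the check is to validate the requested paths up front and fail before anything is executed. This is a sketch of the idea, not the planned implementation; validate_notebook_paths is a hypothetical name.

```python
from pathlib import Path


def validate_notebook_paths(paths):
    """Return the paths as Path objects, rejecting anything that is not .ipynb."""
    # Compare suffixes case-insensitively so "Demo.IPYNB" is still accepted.
    bad = [p for p in paths if Path(p).suffix.lower() != ".ipynb"]
    if bad:
        raise ValueError(f"Not a notebook (.ipynb) file: {', '.join(bad)}")
    return [Path(p) for p in paths]
```

Raising on the first invalid batch, rather than silently skipping files, makes a mistyped path visible in the terminal summary.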

Aug 31 '25 09:08 andycandy