feat(eval): parallelize inference and evaluator execution

Open matjanos opened this issue 1 month ago • 1 comment

Currently, Genkit's evaluation system runs inference sequentially for all test cases in bulkRunAction(). For large datasets (e.g., 150+ test cases), evaluation is very slow because each flow/model execution must complete before the next one starts.

Inference now runs samples concurrently, using the existing batchSize as the concurrency limit (capped at 100), while preserving result ordering, per-sample error capture, and progress logging.
Evaluator actions now execute in parallel to match inference concurrency.
eval:flow continues to use --batchSize to control concurrency; eval:run behavior is unchanged. Example: genkit eval:flow myFlow data.json --batchSize 5 now runs both inference and evaluation in parallel batches.
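
For reviewers, the batching behavior described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: runInBatches, SampleResult, and MAX_CONCURRENCY are hypothetical names, and the sketch assumes each batch is awaited in full before the next one starts.

```typescript
// Hypothetical sketch; names here are illustrative and not Genkit internals.
type SampleResult<O> =
  | { index: number; ok: true; output: O }
  | { index: number; ok: false; error: string };

// Upper bound on concurrency, matching the cap mentioned in the description.
const MAX_CONCURRENCY = 100;

async function runInBatches<I, O>(
  inputs: I[],
  run: (input: I) => Promise<O>,
  batchSize: number
): Promise<SampleResult<O>[]> {
  // Clamp the requested batch size to the range [1, MAX_CONCURRENCY].
  const size = Math.min(
    Math.max(1, Math.floor(batchSize) || 1),
    MAX_CONCURRENCY
  );
  const results: SampleResult<O>[] = [];

  for (let start = 0; start < inputs.length; start += size) {
    const batch = inputs.slice(start, start + size);

    // Promise.all resolves in input order, so ordering is preserved.
    // Each sample catches its own error, so one failure does not
    // abort the other samples in the batch.
    const settled = await Promise.all(
      batch.map(async (input, i): Promise<SampleResult<O>> => {
        const index = start + i;
        try {
          return { index, ok: true, output: await run(input) };
        } catch (e) {
          const error = e instanceof Error ? e.message : String(e);
          return { index, ok: false, error };
        }
      })
    );
    results.push(...settled);

    // Progress logging after each completed batch.
    const done = Math.min(start + size, inputs.length);
    console.log(`Processed ${done}/${inputs.length} samples`);
  }
  return results;
}
```

With a batch size of 5 and 150 samples, this sketch would issue 30 batches of 5 concurrent calls each. A failing sample shows up as an error entry at its original index instead of rejecting the whole run.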

Checklist (if applicable):

[x] PR title is following https://www.conventionalcommits.org/en/v1.0.0/
[x] Tested (manually, unit tested, etc.)
[x] Docs updated (updated docs or a docs bug required)

Nov 27 '25 16:11 matjanos

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

Nov 27 '25 16:11 google-cla[bot]