genkit [CLI] Option to batch run on evaluation samples

Is your feature request related to a problem? Please describe. The current evals implementation is serial and takes a long time when using a complex evaluator or a large dataset.

Describe the solution you'd like Add a batch option to the CLI that batches multiple samples together when running evaluation (not necessarily inference, since this may interfere with the trace collection process). This has already been implemented as a POC in https://github.com/firebase/genkit/commit/5bded06a4885fc136ce43738b5779059299ac423

Additional Requirements:

have better span names,
link a metric to the corresponding span in the evaluator trace.
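
For illustration, the batching idea described above could look roughly like the sketch below. This is not the code from the POC commit and not Genkit's API: `EvalSample`, `EvalScore`, `Evaluator`, and `evaluateInBatches` are hypothetical names made up for this example. It assumes each sample can be scored independently, runs up to `batchSize` evaluator calls concurrently, and waits for a batch to finish before starting the next one.

```typescript
// Hypothetical types for illustration only; these are NOT Genkit's actual types.
interface EvalSample {
  id: string;
  input: string;
  output: string;
}

interface EvalScore {
  sampleId: string;
  score: number;
}

// Hypothetical evaluator signature: scores a single sample asynchronously.
type Evaluator = (sample: EvalSample) => Promise<EvalScore>;

// Runs the evaluator over all samples, at most `batchSize` at a time.
// Results are returned in the same order as the input samples.
async function evaluateInBatches(
  samples: EvalSample[],
  evaluator: Evaluator,
  batchSize: number
): Promise<EvalScore[]> {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error("batchSize must be a positive integer");
  }
  const results: EvalScore[] = [];
  for (let start = 0; start < samples.length; start += batchSize) {
    const batch = samples.slice(start, start + batchSize);
    // Every sample in the batch is evaluated concurrently; the next batch
    // starts only after all of them have settled.
    const scores = await Promise.all(batch.map((sample) => evaluator(sample)));
    results.push(...scores);
  }
  return results;
}
```

With `batchSize` set to 1 this behaves like the current serial run, so the option could default to 1 without changing existing behavior. Note that `Promise.all` rejects as soon as one evaluator call fails; a real implementation would probably want per-sample error handling instead.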

Additional context https://github.com/firebase/genkit/commit/5bded06a4885fc136ce43738b5779059299ac423

Jan 22 '25 16:01 ssbushi

Any update on this? We also have a dataset where each item runs for 8 minutes, and batch size support for running evals concurrently would help significantly.

May 22 '25 07:05 anilgulecha

Hi @anilgulecha

I've just merged support for batching on the Genkit tooling side (JS only for now). Once the next release is out, you can use it by passing the --batchSize=<number> flag to the eval:flow or eval:run commands.

I'm also adding support in the Dev UI and that should be out very soon too!

Thanks!

May 29 '25 15:05 ssbushi