stagehand
stagehand copied to clipboard
Control concurrency of evals
running 10+ evals on different browsers in parallel locally can become computationally challenging.
Braintrust offers a concurrency semaphor: https://www.braintrust.dev/docs/guides/evals/run
import { Factuality, Levenshtein } from "autoevals";
import { Eval } from "braintrust";
Eval("Say Hi Bot", {
data: () =>
Array.from({ length: 100 }, (_, i) => ({
input: `${i}`,
expected: `${i + 1}`,
})),
task: (input) => {
return input + 1;
},
scores: [Factuality, Levenshtein],
maxConcurrency: 5, // Run 5 tests concurrently
});