kelonio
How can I make a performance test tolerant of the device it runs on?
I did not find a way to make a test tolerant of the device it runs on, so I came up with my own solution: measure performance relative to the performance of the built-in `JSON.parse`.

Any other ideas/recommendations?
```ts
import { benchmark } from "kelonio";

// Inside an async test: establish a device-local baseline with JSON.parse,
// then require the function under test to stay within a multiple of it.
const measurementBase = await benchmark.record(
  () => JSON.parse('{"config":[{"key":"email","value":"email"},{"key":"mqiPassword","value":"mqiPassword"}]}'),
  { iterations: 1000 }
);
const measurement = await benchmark.record(
  () => cloneAndSanitize(test),
  { iterations: 1000, minUnder: measurementBase.min * 50 }
);
expect(measurement.totalDuration).toBeLessThan(measurementBase.totalDuration * 20);
```
I think this would be a good feature to add :) I can think of a few different ways to incorporate this into `MeasureOptions`, but I'm not sure which is the best trade-off. I'm open to others' thoughts on this.
Option 1: Compare directly to baseline function
This is similar to your example (assert that the test function completes within 50x the baseline):
```ts
const options = {
  iterations: 1000,
  baseline: await measure(
    () => JSON.parse('{"config":[{"key":"email","value":"email"},{"key":"mqiPassword","value":"mqiPassword"}]}'),
    { iterations: 1000 },
  ),
};
const measurement = await benchmark.record(
  () => cloneAndSanitize(test),
  { ...options, baselineMultiplier: 50 },
);
```
Let's say that `baseline` has a mean of 10 ms and `measurement` has a mean of 200 ms. We'll verify that 200 < 10 * 50.
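For illustration, here's a rough sketch of the check that `baselineMultiplier` could perform internally. The option names are the proposed ones from above, and the helper function is purely hypothetical, not part of kelonio's current API:

```ts
import { Measurement } from "kelonio";

// Hypothetical internal check for Option 1: fail if the measured mean exceeds
// the baseline mean scaled by the multiplier (names are illustrative only).
function verifyAgainstBaseline(
  measurement: Measurement,
  baseline: Measurement,
  baselineMultiplier: number,
): void {
  const limit = baseline.mean * baselineMultiplier; // e.g. 10 ms * 50 = 500 ms
  if (measurement.mean > limit) {
    throw new Error(
      `Mean time of ${measurement.mean} ms exceeded limit of ${limit} ms`
    );
  }
}
```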
- Pro: This is simple to implement.
- Con: Thinking in terms of "this function must run within 50x of some other function" is not as straightforward as "this function must run in under X ms".
- Con: Since the comparisons are implicit, you can't explicitly use `meanUnder`/etc., or at least I haven't thought of a nice way to make them work.
Option 2: Compare baseline function with itself on different systems
```ts
const options = {
  iterations: 1000,
  baseline: {
    reference: new Measurement([10, 11, 20, ...]), // snapshot from `measure` on some test system
    current: await measure(
      () => JSON.parse('{"config":[{"key":"email","value":"email"},{"key":"mqiPassword","value":"mqiPassword"}]}'),
      { iterations: 1000 },
    ),
  },
};
const measurement = await benchmark.record(
  () => cloneAndSanitize(test),
  { ...options, meanUnder: 100 },
);
```
Let's say `baseline.reference` has a mean of 20 ms, `baseline.current` has a mean of 30 ms, and `measurement` has a mean of 140 ms. 20 / 30 = 0.66..., so we'll verify that 140 * 0.66... < 100.
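As a rough sketch of how that scaling could work (again using only the proposed option names plus a hypothetical helper, not anything in kelonio's current API):

```ts
import { Measurement } from "kelonio";

// Hypothetical check for Option 2: scale the measured mean back to the
// reference system before applying the usual threshold (meanUnder here).
function verifyMeanUnderWithBaseline(
  measurement: Measurement,
  reference: Measurement, // snapshot recorded on the baseline system
  current: Measurement,   // same baseline function, measured on this system
  meanUnder: number,
): void {
  const ratio = reference.mean / current.mean;  // 20 / 30 = 0.66... in the example above
  const scaledMean = measurement.mean * ratio;  // 140 * 0.66... ≈ 93.3 ms
  if (scaledMean > meanUnder) {
    throw new Error(
      `Scaled mean of ${scaledMean} ms exceeded limit of ${meanUnder} ms`
    );
  }
}
```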
- Pro: You can write the expected times in terms of your main development system, which is more straightforward than measuring relative to some other function.
- Con: Having to record/update the reference snapshot might be annoying.
- Con: This works best when a single person sets the thresholds based on their system. If multiple developers set thresholds based on their own systems, the thresholds won't be consistent with the baseline. You'd probably want to run it without thresholds first on a standard system (e.g., a GitHub workflow) and then set the thresholds based on those measurements (sketched below).
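For example, a one-off script along these lines could run in the standard environment to capture the reference snapshot used above. The file name, the `durations` property access, and the overall shape are my assumptions, not an established kelonio workflow:

```ts
import { writeFileSync } from "fs";
import { measure } from "kelonio";

// Hypothetical snapshot script, run once on the standard system (e.g. in a
// GitHub workflow). It assumes the returned Measurement exposes the raw
// durations it was built from, so they can later be fed to `new Measurement(...)`.
async function recordReferenceSnapshot(): Promise<void> {
  const reference = await measure(
    () => JSON.parse('{"config":[{"key":"email","value":"email"},{"key":"mqiPassword","value":"mqiPassword"}]}'),
    { iterations: 1000 },
  );
  writeFileSync("baseline-reference.json", JSON.stringify(reference.durations));
}

recordReferenceSnapshot().catch(console.error);
```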