
How to make a performance test tolerant to the device where it is run?

optimistex opened this issue 11 months ago · 1 comment

I did not find a way to make it tolerant, so I came up with my own solution: measure the performance relative to the performance of the built-in JSON.parse.

Any other idea/recommendation?

import { benchmark } from "kelonio";

// Baseline: measure a known built-in operation (JSON.parse) to gauge how fast this machine is.
const measurementBase = await benchmark.record(
  () => JSON.parse('{"config":[{"key":"email","value":"email"},{"key":"mqiPassword","value":"mqiPassword"}]}'),
  { iterations: 1000 }
);

// Measure the function under test; require its fastest iteration to stay under 50x the baseline's fastest iteration.
const measurement = await benchmark.record(
  () => cloneAndSanitize(test),
  { iterations: 1000, minUnder: measurementBase.min * 50 }
);

// Also assert via the test framework's `expect` that the total duration stays under 20x the baseline's total.
expect(measurement.totalDuration).toBeLessThan(measurementBase.totalDuration * 20);

optimistex · Mar 21 '24 08:03

I think this would be a good feature to add :) I can think of some different ways to incorporate this into MeasureOptions, but I'm not sure which is the best trade-off. I'm open to others' thoughts on this.

Option 1: Compare directly to baseline function

This is similar to your example (assert that the test function takes less than 50x the baseline time):

const options = {
  iterations: 1000,
  baseline: await measure(
    () => JSON.parse('{"config":[{"key":"email","value":"email"},{"key":"mqiPassword","value":"mqiPassword"}]}'),
    { iterations: 1000 },
  ),
};

const measurement = await benchmark.record(
  () => cloneAndSanitize(test),
  { ...options, baselineMultiplier: 50 },
);

Let's say that baseline has a mean of 10 ms and measurement has a mean of 200 ms. We'll verify that 200 < 10 * 50.

  • Pro: This is simple to implement.
  • Con: Thinking in terms of "this function is faster than 50x some other function" is not as straightforward as "this function is faster than X ms".
  • Con: Since the comparisons are implicit, you can't explicitly use meanUnder/etc, or at least I haven't thought of a nice way to make them work.
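
To make the trade-off concrete, here is a minimal sketch (not actual kelonio code) of how the check behind a baselineMultiplier option could work, assuming it applies to the mean:

import { Measurement } from "kelonio";

// Hypothetical shape of the proposed Option 1 settings.
interface BaselineOptions {
  baseline: Measurement;
  baselineMultiplier: number;
}

// Fail if the measured mean exceeds the baseline mean scaled by the multiplier,
// e.g. 200 ms vs. 10 ms * 50 = 500 ms.
function verifyAgainstBaseline(measurement: Measurement, options: BaselineOptions): void {
  const limit = options.baseline.mean * options.baselineMultiplier;
  if (measurement.mean >= limit) {
    throw new Error(`Mean of ${measurement.mean} ms exceeded baseline-derived limit of ${limit} ms`);
  }
}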

Option 2: Compare baseline function with itself on different systems

const options = {
  iterations: 1000,
  baseline: {
    reference: new Measurement([10, 11, 20, ...]), // snapshot from `measure` on some test system
    current: await measure(
      () => JSON.parse('{"config":[{"key":"email","value":"email"},{"key":"mqiPassword","value":"mqiPassword"}]}'),
      { iterations: 1000 },
    ),
  },
};

const measurement = await benchmark.record(
  () => cloneAndSanitize(test),
  { ...options, meanUnder: 100 },
);

Let's say baseline.reference has a mean of 20 ms, baseline.current has a mean of 30 ms, and measurement has a mean of 140 ms. 20 / 30 = 0.66..., so we'll verify that 140 * 0.66... < 100.

  • Pro: You can write the expected times in terms of your main development system, which is more straightforward than measuring relative to some other function.
  • Con: Having to record/update the reference snapshot might be annoying.
  • Con: This works best when a single person is setting the thresholds based on their system. If multiple developers set thresholds based on their own systems, then the thresholds won't be consistent with the baseline. You'd probably want to run it without thresholds first on a standard system (e.g., a GitHub workflow), then set the thresholds based on those measurements.
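
For illustration only, the scaling described above could look roughly like this (the function name and the choice to scale the mean are assumptions, not the actual implementation):

import { Measurement } from "kelonio";

// Sketch of the Option 2 idea: normalize the measured mean by how much slower
// or faster the current machine is than the reference snapshot, then compare
// against the user's threshold (meanUnder).
function verifyScaledMean(
  measurement: Measurement,
  reference: Measurement, // baseline snapshot recorded on the reference system
  current: Measurement,   // same baseline workload re-run on the current system
  meanUnder: number,
): void {
  const scale = reference.mean / current.mean;  // e.g. 20 / 30 = 0.66...
  const scaledMean = measurement.mean * scale;  // e.g. 140 * 0.66... ≈ 93.3
  if (scaledMean >= meanUnder) {
    throw new Error(`Scaled mean of ${scaledMean} ms exceeded threshold of ${meanUnder} ms`);
  }
}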

mtkennerly · Apr 12 '24 17:04