
easysync tests: remove randomness & analyze performance metrics

Open webzwo0i opened this issue 4 years ago • 1 comment

The easysync tests contain some randomness, which introduces non-determinism: total code coverage occasionally differs between runs.

Should I compile a set of inputs and hard code them so that every time the suite runs the exact same input is used?

Another improvement: should I try to find a way to get performance metrics out of our workflows? I already described the approach in https://github.com/ether/etherpad-lite/issues/4341#issuecomment-698098454 (I have the feeling we discussed this in another issue already, but I didn't find it). https://github.com/ether/etherpad-lite/pull/5233 could have some performance impact; in browsers it probably doesn't matter that much. My plan is to run some millions of inputs generated from the existing test suite, maybe with slightly adjusted boundaries.

  • run the easysync suite as a backend test (the whole suite, not individual tests)
  • record metrics for every single test
  • export the results to a GitHub page
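
Roughly what I have in mind for "record metrics for every single test": a minimal sketch using Mocha's global hooks. The file name, output path, and JSON shape are just placeholders I made up for illustration, not something that exists in the repo.

```javascript
// Sketch only: record a wall-clock duration for every test via global
// beforeEach/afterEach hooks and dump the numbers as JSON so a later
// workflow step can publish them (e.g. to a GitHub page).
const fs = require('fs');
const {performance} = require('perf_hooks');

const durations = {}; // test title -> duration in milliseconds
let startTime;

beforeEach(function () {
  startTime = performance.now();
});

afterEach(function () {
  durations[this.currentTest.fullTitle()] = performance.now() - startTime;
});

after(function () {
  // Hypothetical output path; the real workflow would decide where this goes.
  fs.writeFileSync('easysync-durations.json', JSON.stringify(durations, null, 2));
});
```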

To get started, we can run the suite 1000 times.

  • configure a cronjob to repeat this maybe every 4 hours for 1 or 2 days

Now we have a baseline for every test.

  • add a workflow that checks out the baseline commit, runs the easysync suite maybe 10 times, and calculates the overall (relative) deviation (i.e. the sum of all test durations compared to the baseline sum)
  • check out the commit that should be tested, run the suite multiple times, calculate metrics "normalized" by the deviation from above, and upload the result
  • I wouldn't make any fail/success decision until we know it works reliably; until then, consider all of this informational
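
A rough sketch of the normalization step, assuming per-test durations are collected as plain objects mapping test title to milliseconds; all names here are illustrative, not actual workflow code.

```javascript
// Compare the recorded baseline against the baseline commit re-run on
// today's runner to get a machine-speed factor, then scale the candidate
// commit's durations by that factor so runs on different runners are comparable.
const sum = (durations) => Object.values(durations).reduce((a, b) => a + b, 0);

// baseline:    durations recorded when the baseline was established
// baselineNow: the baseline commit re-run ~10 times on today's runner
const deviation = (baseline, baselineNow) => sum(baselineNow) / sum(baseline);

// candidate: durations of the commit under test, run on the same runner
const normalize = (candidate, factor) => {
  const out = {};
  for (const [test, ms] of Object.entries(candidate)) out[test] = ms / factor;
  return out;
};

// Example usage:
// const factor = deviation(baselineDurations, baselineRerunDurations);
// const normalized = normalize(candidateDurations, factor);
```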

webzwo0i · Nov 09 '21 01:11

Should I compile a set of inputs and hard code them so that every time the suite runs the exact same input is used?

Yes. At the very least we should use an RNG with a fixed seed so that the results are reproducible.
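
For example, a tiny seeded generator like mulberry32 would produce the same sequence on every run for a given seed. This is just an illustration of the fixed-seed idea, not code from the repo or a specific recommendation.

```javascript
// Minimal seeded PRNG (mulberry32) that could stand in for Math.random()
// in the tests; the same seed always yields the same sequence of values.
const mulberry32 = (seed) => () => {
  seed = (seed + 0x6D2B79F5) | 0;
  let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
  t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
  return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
};

const rand = mulberry32(42); // fixed seed => reproducible test inputs
console.log(rand(), rand()); // always prints the same two numbers
```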

Another improvement: Should I try to find a way to get performance metrics out of our workflows? I already described the approach in #4341 (comment)

Automated performance regression testing is really difficult to do properly, and requires lots of maintenance. Until we are bitten by performance regressions, I think our time is best spent elsewhere.

(I have the feeling we discussed this in another issue already, but I didn't find it)

Maybe #4988?

#5233 could have some performance impact; in browsers it probably doesn't matter that much. My plan is to run some millions of inputs generated from the existing test suite, maybe with slightly adjusted boundaries.

  • run the easysync suite as a backend test (the whole suite, not individual tests)
  • record metrics for every single test
  • export the results to a GitHub page

To get started, we can run the suite 1000 times.

  • configure a cronjob to repeat this maybe every 4 hours for 1 or 2 days

Now we have a baseline for every test.

  • add a workflow that checks out the baseline commit, runs the easysync suite maybe 10 times, and calculates the overall (relative) deviation (i.e. the sum of all test durations compared to the baseline sum)
  • check out the commit that should be tested, run the suite multiple times, calculate metrics "normalized" by the deviation from above, and upload the result
  • I wouldn't make any fail/success decision until we know it works reliably; until then, consider all of this informational

I would love to see that done, but it is quite a bit of work. My plan was to just release the changes and see if anyone complains about a drop in performance. :slightly_smiling_face:

rhansen · Nov 10 '21 06:11