
[SQL] Run sqllogictests by rotation, some every day

Open mihaibudiu opened this issue 1 year ago • 12 comments

This spreads the SLT tests over a number of days, using the current date to decide which tests to run. Currently the tests are spread over one year, which requires running about 15K tests per day. This is a trade-off between CI time and test coverage, and we can tweak it.

mihaibudiu · Sep 26 '24 23:09

Fixes #2578

mihaibudiu · Sep 26 '24 23:09

These are not non-deterministic; they are just a different set of tests every day.

mihaibudiu · Sep 26 '24 23:09

I think the choice between never running them or running some of them every day is clear - we should run some of them. In fact, this is how they were run the first time: manually, restarting on each failure. This is much better.

mihaibudiu · Sep 26 '24 23:09

So CI will pass on some days and not on others (if there is a failure in one of these tests). If you happen not to run any CI on, e.g., a Sunday, you may not detect a failure at all. If you are not around for a few days, we won't be able to merge anything until the failure is fixed, the test is disabled, or we wait 24h.

How is this a good solution?

gz · Sep 26 '24 23:09

There is another choice: we run them for every release, using a few VMs.

gz · Sep 26 '24 23:09

If the CI doesn't pass, it's because we have a real bug, and we had better know about it and fix it. And I haven't seen many days when the CI doesn't run. The alternative is called the "ostrich algorithm": pretend that the bug is not there. This issue was marked as "high priority".

mihaibudiu · Sep 26 '24 23:09

The tests need 6 weeks of CPU time on a machine like the CI machines. We don't have any tools to run tests like that.

mihaibudiu · Sep 26 '24 23:09

> we don't have any tools to run tests like that

I'm positive there are tools to run something in parallel on n VMs. This way of testing makes no sense: what assurance do you get from doing it like this? You will never know whether the version of Feldera you are running is any better or not.

gz · Sep 26 '24 23:09

I filed a bug for a non-deterministic storage panic. Would you prefer me to delete the issue because the bug is non-deterministic? No one was looking at the panic message in my CI job when this happened. You don't really know whether it will happen in production or not.

mihaibudiu · Sep 26 '24 23:09

So we'll never run fuzzing tests?

mihaibudiu · Sep 26 '24 23:09

This is much better than fuzzing tests - these tests have validated outputs.

mihaibudiu · Sep 26 '24 23:09

This is actually a quintessential use for GNU Parallel. We can bring up an auto-scaled VM group in a k8s cluster, then fire off hundreds of jobs to run the tests and report back failures. Doing it before a release, or with some other cadence, makes a lot of sense to me.
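A minimal sketch of the sharding idea, under stated assumptions: each worker runs one round-robin shard of the sorted test list and reports the tests that failed. Threads stand in for VMs here, and GNU Parallel and k8s are not modeled. The helpers `shard`, `run_sharded`, `ideal_wall_clock_hours`, and the `run_test` callback are hypothetical. The last helper turns the "6 weeks of CPU time" figure from this thread into an idealized wall-clock estimate that ignores scheduling and startup overhead.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def shard(tests: Sequence[str], index: int, count: int) -> List[str]:
    """Return round-robin shard `index` of `count` over the sorted tests."""
    if not 0 <= index < count:
        raise ValueError("index must be in [0, count)")
    return sorted(tests)[index::count]


def run_sharded(tests: Sequence[str], count: int,
                run_test: Callable[[str], bool]) -> List[str]:
    """Run all shards concurrently; return the sorted names of failing tests.

    `run_test` is a hypothetical callback returning True when a test passes.
    In a real deployment each VM would run only shard(tests, k, count).
    """
    def run_shard(i: int) -> List[str]:
        return [t for t in shard(tests, i, count) if not run_test(t)]

    with ThreadPoolExecutor(max_workers=count) as pool:
        results = list(pool.map(run_shard, range(count)))
    return sorted(t for failures in results for t in failures)


def ideal_wall_clock_hours(cpu_weeks: float, workers: int) -> float:
    """Wall-clock hours if the work splits perfectly across workers."""
    return cpu_weeks * 7 * 24 / workers


if __name__ == "__main__":
    # 6 weeks = 1008 CPU-hours; spread over 100 workers, roughly 10 hours.
    print(ideal_wall_clock_hours(6, 100))
```

Round-robin over the sorted list keeps shard sizes within one test of each other. If test run times vary widely, balancing by measured duration would be a better choice.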

lalithsuresh · Sep 26 '24 23:09

I think we agreed to merge this for now.

mihaibudiu · Oct 22 '24 20:10

@gz, you need to approve this, or it can't be merged.

mihaibudiu · Oct 22 '24 23:10