[SQL] Run sqllogictests in rotation, a subset every day
This PR spreads the SLT (sqllogictest) tests over a number of days, using the current date to decide which tests to run. The tests are currently spread over one year, which means running about 15K tests per day. The length of the period is a trade-off between CI time and test coverage, and we can tweak it.
Fixes #2578
These tests are not non-deterministic; each day simply runs a different set of tests.
I think the choice between never running them and running some of them every day is clear: we should run some of them. In fact, that is how they were run the first time: manually, restarting after each failure. This is much better.
So CI will pass on some days and fail on others (whenever one of that day's tests fails). If no CI runs on a given day, e.g. a Sunday, a failure in that day's slice goes undetected. And if you are away for a few days, we won't be able to merge anything until the bug is fixed, this test is disabled, or we wait 24h.
How is this a good solution?
There is another choice: we run them for every release, using a few VMs.
If CI doesn't pass, it's because we have a real bug; we had better know about it, and we had better fix it. Also, I haven't seen many days when CI doesn't run. The alternative is called the "ostrich algorithm": pretend that the bug is not there. This issue was marked "high priority".
The tests need 6 weeks of CPU time on a machine like the CI machines, and we don't have any tools to run tests like that.
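A back-of-the-envelope comparison of the two proposals, treating "6 weeks of CPU time" as 42 days of run time on one CI-class machine and assuming the work parallelizes perfectly; the figure of 100 parallel jobs is purely illustrative:

$$
\begin{aligned}
T_{\text{full}} &= 42\ \text{days} \times 24\ \text{h/day} = 1008\ \text{h} \\
T_{\text{daily}} &= T_{\text{full}} / 365 \approx 2.8\ \text{h per day of rotation} \\
T_{\text{100 jobs}} &= T_{\text{full}} / 100 \approx 10\ \text{h wall-clock per full run}
\end{aligned}
$$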
I'm positive there are tools to run something in parallel on n VMs. This way of testing makes no sense: what assurance do you get from it? Testing like this, you will never know whether the version of feldera you are running is any better or not.
I filed a bug for a non-deterministic storage panic. Would you prefer me to delete the issue because the bug is non-deterministic? No one was looking at the panic message in my CI job when this happened. You don't really know whether it will happen in production or not.
So we'll never run fuzzing tests?
This is much better than fuzzing tests: these tests have validated outputs.
This is actually a quintessential use case for GNU parallel. We can bring up an auto-scaled VM group in a k8s cluster, fire off hundreds of jobs to run the tests, and report back failures. Doing this before each release, or at some other cadence, makes a lot of sense to me.
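GNU parallel itself is a shell tool; as an illustration of the same fan-out idea, here is a minimal Python sketch that splits the tests into n shards (one per VM) and runs one shard with a local worker pool, collecting the failures. The helpers `shard` and `run_shard` and the `run_one` callback are hypothetical stand-ins, not a use of GNU parallel or k8s.

```python
from concurrent.futures import ThreadPoolExecutor


def shard(tests, k, n):
    """Return shard k of n (0 <= k < n): every n-th test starting at index k.

    Each of n VMs would take one shard, so together they cover all tests once.
    """
    if not 0 <= k < n:
        raise ValueError("shard index k must satisfy 0 <= k < n")
    return tests[k::n]


def run_shard(tests, run_one, workers=8):
    """Run each test with run_one(test) -> bool in a thread pool.

    Threads suffice when run_one launches an external test process.
    Returns the failing tests in their original order, ready to report back.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run_one, tests))
    return [t for t, ok in zip(tests, results) if not ok]


if __name__ == "__main__":
    all_tests = [f"t{i}" for i in range(20)]

    # Fake runner: pretend every test whose index is a multiple of 7 fails.
    def fake_run(t):
        return int(t[1:]) % 7 != 0

    failures = []
    for k in range(4):
        failures += run_shard(shard(all_tests, k, 4), fake_run)
    print(sorted(failures, key=lambda t: int(t[1:])))
```

Strided shards give every VM a similar mix of tests; contiguous ranges would work too but can leave one VM with all the slow files.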
I think we agreed to merge this for now.
@gz you need to approve this, or it can't be merged.