procrastinate
Fix (probably) incorrect test matrix.
The way the test matrix is currently set up doesn't test against all supported combinations of Python versions and Postgres versions, but rather just in sets of {py3.9, pg13}, {py3.10, pg14}, {py3.11, pg15}, etc., which is most likely not intentional. This change ensures CI/CD tests against the full range.
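For readers less familiar with GitHub Actions matrices, here is a rough sketch of the difference between the two setups. These are fragments of a job definition only, not the project's actual workflow file, and the exact version lists beyond the pairs named above are assumptions:

```yaml
# Sketch 1: one way the current paired ("diagonal") setup could be expressed,
# giving one job per Python version, each pinned to a single Postgres version.
strategy:
  matrix:
    include:
      - python-version: "3.9"
        postgres-version: "13"
      - python-version: "3.10"
        postgres-version: "14"
      - python-version: "3.11"
        postgres-version: "15"
---
# Sketch 2: what this PR proposes, the full cross-product of every Python
# version against every Postgres version (with 5 of each, 5 x 5 = 25 jobs).
strategy:
  matrix:
    python-version: ["3.9", "3.10", "3.11", "3.12", "3.13"]
    postgres-version: ["13", "14", "15", "16", "17"]
```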
Successful PR Checklist:
- [ ] Tests
  - [ ] (not applicable?)
- [ ] Documentation
  - [ ] (not applicable?)
PR label(s):
- [ ] https://github.com/procrastinate-org/procrastinate/labels/PR%20type%3A%20breaking%20%F0%9F%92%A5
- [ ] https://github.com/procrastinate-org/procrastinate/labels/PR%20type%3A%20feature%20%E2%AD%90%EF%B8%8F
- [ ] https://github.com/procrastinate-org/procrastinate/labels/PR%20type%3A%20bugfix%20%F0%9F%95%B5%EF%B8%8F
- [ ] https://github.com/procrastinate-org/procrastinate/labels/PR%20type%3A%20miscellaneous%20%F0%9F%91%BE
- [ ] https://github.com/procrastinate-org/procrastinate/labels/PR%20type%3A%20dependencies%20%F0%9F%A4%96
- [ ] https://github.com/procrastinate-org/procrastinate/labels/PR%20type%3A%20documentation%20%F0%9F%93%9A
This is completely intentional. The idea is that all supported Python versions and PG versions should be tested at least once, but unless we have a specific reason to believe a given combination will fail, we avoid the quadratic explosion of the test matrix, for several reasons:
- It's extremely unlikely that a single combination will fail on its own. As far as I can remember, failures have always affected one or more entire rows or one or more entire columns of the matrix.
- This would be 25 tests instead of 5, so 5 times more costly. Even if this seems free, the cost exists:
  - Microsoft pays for this, and the more we use it, the more incentive they have to eventually rescind the free offer.
  - These are real machines using real energy and producing real CO2 somewhere on the planet.
  - GitHub limits free projects to 20 concurrent jobs, so with 25 jobs the perceived CI time would double (it would need 2 rounds).
- Some cells of the matrix may be unsupported, and maintaining the special cases would be equally painful.
> [!NOTE]
> It's the same with Django: we don't test all Django versions with all Python versions with all Postgres versions, otherwise we would have even more tests.

> [!NOTE]
> The kinds of issues we might get from mixing very old and very new versions of Python and PG are very likely to be issues in Django or psycopg, and even if people encountered bugs, in a lot of cases we could only direct them to the appropriate tracker.
Given that, there are multiple possible strategies:
- Test old versions of Python and PG together and new versions of Python and PG together (a diagonal of the matrix). Pro: most likely to give the best compatibility. Con: less likely to find issues caused by extreme differences in versions, where support tends to be suboptimal. This is the one we went for.
- Test the newest versions of Python with the oldest versions of PG and vice versa (an anti-diagonal of the matrix). Pro: more likely to find bugs. Con: Django 5.2 only supports PG 14+, so it would be harder to set up (and in general, we'd keep running into this kind of problem).
- Make random pairings. Whether the pairings are randomly generated anew at every run or statically random, I'm not a fan.
With all that said, I'm curious if you have opinions going the other way :)
One thing that's true is that this could deserve a couple of lines (and maybe a link to this issue) in the CONTRIBUTING doc :), or a comment next to the matrix in the CI.
I understand your point, but at the same time the very first line of the README states py3.9 and pg13+ as the supported matrix. Is there actually any combination in that range that isn't supported and needs special casing? It seems best to test these comprehensively, at least before a release.
If you take a look at the Chancy workflow (https://github.com/TkTech/chancy/actions/runs/14032526511), I test only the most recent version on PRs, then the full, comprehensive grid when a GitHub release is created (as a draft, not published). Would you accept something similar?
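For illustration, a rough sketch of what gating the full grid behind a release event could look like in GitHub Actions. This is not the actual Chancy workflow; the job name, version lists, and exact release activity type are assumptions to be adapted:

```yaml
# Rough sketch only, not an existing workflow in either repo.
name: full-grid
on:
  release:
    types: [published]   # exact activity type (e.g. draft creation vs publication) to be adjusted

jobs:
  test-full-grid:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false    # keep going so any incompatible cells can be documented
      matrix:
        python-version: ["3.9", "3.10", "3.11", "3.12", "3.13"]
        postgres-version: ["13", "14", "15", "16", "17"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      # ... start Postgres ${{ matrix.postgres-version }} and run the test suite here
```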
I'm OK with discussing how to improve our practice.
I'm not sure testing at release time is best, as it can make releasing a bit more complex (in the event a test fails). On the other hand, as long as we test the oldest Python + oldest PG and the newest Python + newest PG on PRs, I'm pretty confident that any other combination will fail extremely rarely, so there won't be many cases where the release actually becomes harder.
That said, what should be tested? I feel we should test any piece of software big enough to have an official support policy, which in most cases should overlap pretty well with what's listed on https://endoflife.date. In other words, we should probably test Django too, which multiplies the number of tests by 3.
Testing a full grid at release time can be useful even if no extra effort is spent on increasing support. It provides a smidge more visibility and a chance to document known incompatibilities: "FYI: if you're running version X of Python, make sure to use at least version Y of Postgres". While it may be rare that this comes up, it would mitigate user pain when hitting a versioning landmine. So my proposal would be to do the full test matrix before release, but not build up an expectation that discovered issues will be fixed.
aside: I know this discussion is a few months old, and I'm aware this comment is a weird first interaction to have with a project, so feel free to ignore me unless I contribute an actual PR first; I won't be offended. I just happen to be looking at transactional job queues, and thought a new POV might be useful for the tradeoffs discussed above 😄.
> I know this discussion is a few months old, and I'm aware this comment is a weird first interaction to have with a project, so feel free to ignore me unless I contribute an actual PR first; I won't be offended.
Welcome to the project; we have pet peeves and get nerd-sniped. You'll blend in just fine :)
Also, all contributions are equally worthwhile: code contributions are not inherently better than good points made, documentation, alerting us to issues, or anything else. You being here already makes you a valued contributor :)
> So my proposal would be to do the full test matrix before release
In the current procedure, there's no such thing as "before release". Releasing starts by making a tag and then making a PyPI release, so anything release-specific we might do happens after the release, unless we do it on every commit to main. If we do that, either we'll often have a red cross on the project's main page saying the CI failed, which is bad, or we'll mark it as not failing the CI and we'll literally never know when something breaks.
If we want to test the whole matrix, I think the best way is to do it on every PR, with the aforementioned drawbacks, or to completely change how the release workflow works, but we'd need to find something convenient.
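To make the timing point concrete, here is a purely hypothetical sketch (not an existing workflow in the repo) of hanging the full grid off the tag push; by construction it can only run once the tag, and therefore the release process, already exists:

```yaml
# Hypothetical sketch: a tag-triggered run necessarily happens after the tag
# is pushed, i.e. once the release process is already underway.
name: full-grid-on-tag
on:
  push:
    tags:
      - "*"

jobs:
  test-full-grid:
    runs-on: ubuntu-latest
    # same full python-version x postgres-version matrix as sketched earlier
    steps:
      - uses: actions/checkout@v4
      # ... run the full test suite for every combination here
```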
Unless I'm missing something?