pytest-split
pytest-split copied to clipboard
Get rid of test front-loading in the duration_based_chunks algo
Should hopefully resolve #26
Nope, still fails. And if I change the test durations in that test case to all 1 except the last one which gets 100, all of the tests end up in the first group and I get 6 empty groups.
How would you suggest changing the implementation @alexforencich? I'm not seeing the issue 🙂
So I did not run the test correctly (have to actually install the package before you can run pytest). After fixing that problem, the test case in the PR passes. However, it fails if I change it to:
durations = {}
durations["a"] = 1
durations["b"] = 1
durations["c"] = 1
durations["d"] = 1
durations["e"] = 1
durations["f"] = 1
durations["g"] = 1
durations["h"] = 100
In this case, only the first two groups are utilized and the rest are empty.
It seems to me like the goal of this algorithm is to preserve the test ordering. So perhaps a better method would be to compute the cumulative sum into a new array, and then partition that on total_duration*index/slices. Then adjust the indicies so they each correspond to at least one test.
Incidentally, I think you can get slightly better performance out of least_duration by sorting the tests by duration first so they are processed from longest to shortest.
However, the best option might be to do something like this: assign each test an ID number based on the original order, then sort based on duration, use the algorithm in least_duration to group the tests, then sort each group based on the ID. This will preserve the test order, but produce more optimal grouping, and there will be no issue with empty groups unless there are more groups than tests (and in that case, what you can do is enable some random test in all of the otherwise-empty groups).
So I did not run the test correctly (have to actually install the package before you can run pytest). After fixing that problem, the test case in the PR passes. However, it fails if I change it to:
durations = {} durations["a"] = 1 durations["b"] = 1 durations["c"] = 1 durations["d"] = 1 durations["e"] = 1 durations["f"] = 1 durations["g"] = 1 durations["h"] = 100In this case, only the first two groups are utilized and the rest are empty.
That almost seems like a feature, not a bug; but more importantly, is that something that was changed in #14? We discussed fixing the change in behavior and the possibility of guaranteeing that every group has a test in #26, but to be clear - this PR just tries to fix the first point.
Would the test above behave differently in previous versions?
Well, if it wasn't causing github actions to crash, I wouldn't care as much. But it is, so it's a bug. Looks like yes, the splitting algorithm was rewritten in #14. Looks like the previous version maybe was maybe duplicating tests in this case (not explicitly, but due to a side-effect of how it was written) so there weren't any empty groups, but this wasn't noticed since it didn't cause pytest to exit prematurely.
Incidentally, I think you can get slightly better performance out of least_duration by sorting the tests by duration first so they are processed from longest to shortest.
@alexforencich you're right, that would be better. I just stuck with the current solution because that preserved relative order, which I thought was important for some. I'm happy to add this at a later stage.
Out of curiosity, why do people care about ordering? I really only want my test groups to be as even as possible, because the longest group sets the total time of my pipeline.
Just imagine you work for a company that has a large suite of tests and the corresponding technical debt. The tests already have order dependency (by accident) but it's too big a problem to fix all at once.
Yep lack of test isolation is the only thing I can think of 🙂
Well, you still might have ordering dependency issues as you're not going to run all of the tests.
For relative order, you can restore that by assigning monotonically increasing IDs before doing the partitioning, then sort each group by this ID before running the tests.
Updated the PR now @alexforencich @jerry-git
The duration_based_chunks algo should now have the following properties:
- Guarantees that every group is assigned a test (as long as
test count>splits) - Maintains order
Definitely looks a lot better. Still greedy I think, but not sure if there is a nice way of improving that without making it a lot more complicated. At any rate, if you want the most even split, least_duration is probably a better choice.
Next step is to add some "safety" code to go ensure that there is at least one test enabled in each group, even if there are fewer tests than groups, by enabling a test in empty groups. But that's not for this PR.
@jerry-git this is ready to review if you get a chance 🙂
Ahoy! Sorry this took a while, I was on vacation mode 🍹 I'll have a look.
Closing as stale, feel free to reopen 🙂