pytest-split Get rid of test front-loading in the duration_based

Should hopefully resolve #26

Jun 22 '21 21:06 sondrelg

Nope, still fails. And if I change the test durations in that test case to all 1 except the last one which gets 100, all of the tests end up in the first group and I get 6 empty groups.

Jun 22 '21 21:06 alexforencich

How would you suggest changing the implementation @alexforencich? I'm not seeing the issue 🙂

Jun 22 '21 21:06 sondrelg

So I did not run the test correctly (have to actually install the package before you can run pytest). After fixing that problem, the test case in the PR passes. However, it fails if I change it to:

    durations = {}
    durations["a"] = 1
    durations["b"] = 1
    durations["c"] = 1
    durations["d"] = 1
    durations["e"] = 1
    durations["f"] = 1
    durations["g"] = 1
    durations["h"] = 100

In this case, only the first two groups are utilized and the rest are empty.

Jun 22 '21 22:06 alexforencich

It seems to me like the goal of this algorithm is to preserve the test ordering. So perhaps a better method would be to compute the cumulative sum into a new array, and then partition that on total_duration*index/slices. Then adjust the indicies so they each correspond to at least one test.

Incidentally, I think you can get slightly better performance out of least_duration by sorting the tests by duration first so they are processed from longest to shortest.

However, the best option might be to do something like this: assign each test an ID number based on the original order, then sort based on duration, use the algorithm in least_duration to group the tests, then sort each group based on the ID. This will preserve the test order, but produce more optimal grouping, and there will be no issue with empty groups unless there are more groups than tests (and in that case, what you can do is enable some random test in all of the otherwise-empty groups).

Jun 22 '21 22:06 alexforencich

So I did not run the test correctly (have to actually install the package before you can run pytest). After fixing that problem, the test case in the PR passes. However, it fails if I change it to:
    durations = {}
    durations["a"] = 1
    durations["b"] = 1
    durations["c"] = 1
    durations["d"] = 1
    durations["e"] = 1
    durations["f"] = 1
    durations["g"] = 1
    durations["h"] = 100
In this case, only the first two groups are utilized and the rest are empty.

That almost seems like a feature, not a bug; but more importantly, is that something that was changed in #14? We discussed fixing the change in behavior and the possibility of guaranteeing that every group has a test in #26, but to be clear - this PR just tries to fix the first point.

Would the test above behave differently in previous versions?

Jun 23 '21 08:06 sondrelg

Well, if it wasn't causing github actions to crash, I wouldn't care as much. But it is, so it's a bug. Looks like yes, the splitting algorithm was rewritten in #14. Looks like the previous version maybe was maybe duplicating tests in this case (not explicitly, but due to a side-effect of how it was written) so there weren't any empty groups, but this wasn't noticed since it didn't cause pytest to exit prematurely.

Jun 23 '21 08:06 alexforencich

Incidentally, I think you can get slightly better performance out of least_duration by sorting the tests by duration first so they are processed from longest to shortest.

@alexforencich you're right, that would be better. I just stuck with the current solution because that preserved relative order, which I thought was important for some. I'm happy to add this at a later stage.

Jun 23 '21 16:06 mbkroese

Out of curiosity, why do people care about ordering? I really only want my test groups to be as even as possible, because the longest group sets the total time of my pipeline.

Jun 23 '21 17:06 sondrelg

Just imagine you work for a company that has a large suite of tests and the corresponding technical debt. The tests already have order dependency (by accident) but it's too big a problem to fix all at once.

Jun 23 '21 17:06 mbkroese

Yep lack of test isolation is the only thing I can think of 🙂

Jun 23 '21 17:06 sondrelg

Well, you still might have ordering dependency issues as you're not going to run all of the tests.

For relative order, you can restore that by assigning monotonically increasing IDs before doing the partitioning, then sort each group by this ID before running the tests.

Jun 23 '21 19:06 alexforencich

Updated the PR now @alexforencich @jerry-git

The duration_based_chunks algo should now have the following properties:

Guarantees that every group is assigned a test (as long as test count > splits)
Maintains order

Jun 26 '21 10:06 sondrelg

Definitely looks a lot better. Still greedy I think, but not sure if there is a nice way of improving that without making it a lot more complicated. At any rate, if you want the most even split, least_duration is probably a better choice.

Next step is to add some "safety" code to go ensure that there is at least one test enabled in each group, even if there are fewer tests than groups, by enabling a test in empty groups. But that's not for this PR.

Jun 29 '21 05:06 alexforencich

@jerry-git this is ready to review if you get a chance 🙂

Jul 09 '21 18:07 sondrelg

Ahoy! Sorry this took a while, I was on vacation mode 🍹 I'll have a look.

Jul 21 '21 10:07 jerry-git

Closing as stale, feel free to reopen 🙂

Apr 12 '23 06:04 jerry-git

pytest-split
pytest-split copied to clipboard

Get rid of test front-loading in the duration_based_chunks algo

pytest-split pytest-split copied to clipboard

Get rid of test front-loading in the duration_based_chunks algo

pytest-split
pytest-split copied to clipboard