rattler
feat: total ordering for the dependency provider
This should fix the total ordering issues by rewriting how ordering works for the case where the name, version, and build number are the same; let's call this set of candidates X.
The old method violated total ordering: even though a > b and b > c, it could happen that a < c, which breaks transitivity.
The new method compares the set of dependencies that are shared across X. From this shared set it devises a matrix that yields a total order for comparisons; more details are in the code. Ties are still broken by comparing the timestamp.
fixes: #888
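For illustration, here is a minimal sketch (hypothetical names, not the actual rattler code) of the key idea: when the comparison is derived from a fixed per-candidate key, such as ranks over the shared dependency set plus a timestamp tiebreak, transitivity holds by construction and the sort can never observe a non-total order.

```rust
use std::cmp::Ordering;

#[derive(Debug)]
struct Candidate {
    // Ranks of this candidate's dependencies, restricted to the dependency
    // names shared by every candidate in X (hypothetical representation).
    shared_dep_ranks: Vec<u32>,
    timestamp: u64,
}

// Derive the ordering from a fixed key: lexicographic over the shared
// dependency ranks, then newest timestamp first as the tiebreak.
fn total_cmp(a: &Candidate, b: &Candidate) -> Ordering {
    a.shared_dep_ranks
        .cmp(&b.shared_dep_ranks)
        .then(b.timestamp.cmp(&a.timestamp))
}

fn main() {
    let mut candidates = vec![
        Candidate { shared_dep_ranks: vec![1, 2], timestamp: 10 },
        Candidate { shared_dep_ranks: vec![1, 1], timestamp: 20 },
        Candidate { shared_dep_ranks: vec![1, 2], timestamp: 30 },
    ];
    // `sort_by` may panic (since Rust 1.81) if the comparator violates
    // total ordering; a key-derived comparison like this cannot.
    candidates.sort_by(total_cmp);
    println!("{candidates:?}");
}
```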
Are there any two packages that would have resulted in a non-total order before? If so, that might make a good test case.
I could try making traits to get the required fields and make a minimal test like the one wolf suggested. But that would delay the merge a bit. Let me know what you think!
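For context, the traits idea could look something like this sketch (entirely hypothetical, just to show the shape): abstracting the fields the sorter reads, so a test can feed it lightweight fake records instead of full repodata records.

```rust
// Hypothetical sketch of the traits idea: the sorter functions would be
// written against this trait so tests can supply minimal fake records.
trait SortableRecord {
    fn name(&self) -> &str;
    fn version(&self) -> &str;
    fn build_number(&self) -> u64;
    fn timestamp(&self) -> Option<u64>;
    fn depends(&self) -> &[String];
}
```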
It's not possible to make a vector and call sort on it in a test?
I think you may be assuming the problem was in one of the trait implementations, but it wasn't; a test needs to touch the code in the sorter functions.
Note that the code that was panicking before is still being run (and is not panicking now). This is essentially what you are proposing, albeit not in minimal form.
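For reference, the minimal test being discussed would have roughly this shape (all names hypothetical); the catch, as noted above, is that the comparator must be the real sorter function rather than a stand-in like the one sketched here.

```rust
#[derive(Debug)]
struct Candidate {
    build_string: String,
    timestamp: u64,
}

// Stand-in comparator; the real test would call into the sorter functions.
fn compare_candidates(a: &Candidate, b: &Candidate) -> std::cmp::Ordering {
    a.build_string
        .cmp(&b.build_string)
        .then(b.timestamp.cmp(&a.timestamp))
}

#[test]
fn sorting_candidates_does_not_panic() {
    // Candidates with the same name/version/build number, the case that
    // previously triggered the panic. Since Rust 1.81, `sort_by` may panic
    // when the comparator violates total ordering, so simply sorting
    // exercises the invariant.
    let mut candidates = vec![
        Candidate { build_string: "py39_0".into(), timestamp: 1 },
        Candidate { build_string: "py310_0".into(), timestamp: 2 },
        Candidate { build_string: "py39_0".into(), timestamp: 3 },
    ];
    candidates.sort_by(compare_candidates);
}
```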
I've made most of the changes. I'm not sure the added complexity of the traits would be worth it; I could work on that in a separate PR if needed, so it doesn't block this change and the homebrew error fix. If we can think of an easy integration test to add, I could add that for sure.
cc @wolfv @baszalmstra
Perhaps we can make a simpler integration test by loading our sample repodata.json, sorting specific packages like libuuid and pytorch, and creating some snapshots out of that?
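Such a snapshot test might look roughly like this (helper names are hypothetical; assumes a snapshot crate such as insta):

```rust
#[test]
fn snapshot_pytorch_candidate_order() {
    // Load the sample repodata shipped with the test suite
    // (`load_sample_repodata` and `candidates_for` are hypothetical helpers).
    let repodata = load_sample_repodata("tests/data/repodata.json");
    let mut candidates = repodata.candidates_for("pytorch");

    // Sort with the comparator under test.
    candidates.sort_by(|a, b| compare_candidates(a, b));

    // Snapshot the resulting order so any future change to the sorting
    // logic shows up as a reviewable diff.
    let order: Vec<String> = candidates.iter().map(|c| c.to_string()).collect();
    insta::assert_debug_snapshot!(order);
}
```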
How is that better than the comparison with libsolv that we are currently doing?
That tests whether we solve similarly, which indeed requires sorting but doesn't necessarily hit these edge cases. Besides that, I think it would be nice to have a test that highlights how we sort similar packages. Having a snapshot that shows how all pytorch variants are sorted, for instance, would be quite useful and would help guide any future changes.
One more final thing: Could you also run the rattler_solve benchmarks to see if we didn't accidentally introduce a performance regression?
Thanks, makes sense, will do both. Let me add those tests on main first in a separate PR, so we can make sure the snapshots stay the same.
Ok, unfortunately, it seems to decrease performance by quite a margin. One example (but it's across the board):
old:
solve quetz/resolvo time: [187.60 ms 188.11 ms 188.71 ms]
new:
solve quetz/resolvo time: [217.02 ms 219.35 ms 221.94 ms]
I improved the performance a tiny bit, but I think we need to try to drop most of the HashMaps and work on index offsets and pre-allocated Vecs with integer slots directly. @baszalmstra wdyt? Can we look at this on Monday?
Caching across solves might improve things as well, at least that's what I'd reckon.
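A minimal sketch of the dense-index pattern being proposed (hypothetical types, not the actual code): candidate ids are assumed to be dense integers handed out by the dependency provider, so a pre-allocated Vec can replace a HashMap in the hot comparison path, trading hashing for direct slot access.

```rust
// One slot per candidate, pre-allocated once per solve so the hot
// comparison path never hashes or reallocates.
struct DepRanks {
    ranks: Vec<Option<u32>>,
}

impl DepRanks {
    fn with_capacity(num_candidates: usize) -> Self {
        Self { ranks: vec![None; num_candidates] }
    }

    fn set(&mut self, candidate_id: usize, rank: u32) {
        self.ranks[candidate_id] = Some(rank);
    }

    fn get(&self, candidate_id: usize) -> Option<u32> {
        self.ranks[candidate_id]
    }
}
```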