pubgrub icon indicating copy to clipboard operation
pubgrub copied to clipboard

Query facts from root during resolution

Open konstin opened this issue 1 year ago • 5 comments

Background When uv queries multiple versions of a package in a row, it will start to prefetch many versions. In this, I tried resolving https://github.com/getsentry/sentry/blob/51281a6abd8ff4a93d2cebc04e1d5fc7aa9c4c11/requirements-base.txt, which has sentry-kafka-schemas>=0.1.50. We try a lot of versions (until eventually realizing we went down the wrong path), but we're also trying versions <0.1.50 since for the motivating example (boto/botocore/urllib3), the partial solution would forbid prefetching.

It would be great to have an API to query unchangeable facts about package version, i.e. incompatibilities with root, such as sentry-kafka-schemas>=0.1.50 being always true, to avoid doing the wrong batch prefetching. We can subsequently apply this to our other prefetching, too.

konstin avatar Oct 13 '24 12:10 konstin

Oh I see I think. Basically preloading the resolution with all the root dependencies as incompatibilities. WAIT, isn’t that supposed to be already?

mpizenberg avatar Oct 13 '24 12:10 mpizenberg

For this specific case, I could query the root requirements from our preprocessing. But PubGrub learns more facts during resolution, so it would be great if I could ask the state of what we currently know for certain; it would certainly be the cleaner solution than doing this without pubgrub.

Say we start with foo==1, bar>=2,<5 and some other requirements. From foo=1, we learn bar>=3. We know this an incompatibility with root due to root -> foo==1 -> bar>=3 => root -> bar>=3. We selected some other packages, then we selected a few versions of bar, and we find them conflicting with the partial solution. At this point, I'd like to ask PubGrub "what do we know about bar definitely" and get bar>=3,<5. With this information, I can then prefetch only in this range and speedup our search for a matching bar version in the remaining space.

konstin avatar Oct 13 '24 12:10 konstin

Right, so what we know for sure, at any given time, is the set of incompatibilities inside internal::core::State https://github.com/pubgrub-rs/pubgrub/blob/fe65959d72ecff9ad3a555256053d894fa697e57/src/internal/core.rs#L22

As far as I remember, we haven’t yet worked on optimizing that set of incompatibilities (except removing duplicates maybe?) so these are basically unstructured info.

The accumulated structured info, solved per package, is actually inside the partial solution, and more precisely, inside the PackageAssignments where we keep the intersection of all assignements. https://github.com/pubgrub-rs/pubgrub/blob/fe65959d72ecff9ad3a555256053d894fa697e57/src/internal/partial_solution.rs#L75

So I think what you are interested in an assignments_intersection state that we could know for certain, with only the facts from the accumulated incompatibilities until that time.

mpizenberg avatar Oct 13 '24 13:10 mpizenberg

The only assignments that you can know for sure are those gathered from either the root, or another package where there is no other choice anymore, for example packages picked with exact versions, or that are definitely needed from the root, and for which there is only 1 version choice left. I’m wondering if there could be a way to snapshot somehow these states, and/or if they could just be derived from the reduction work still needed on the set of incompatibilities.

mpizenberg avatar Oct 13 '24 13:10 mpizenberg

Actually, I had forgotten about it because it was framed as an error message improvements, but Jacob did improve the merging of incompats in https://github.com/pubgrub-rs/pubgrub/pull/163, at least in one case. So I guess you can look at merged_incompatibilities instead of raw incompatibilities. I’m still not sure I fully understand your problem, so I’ll wait for what Jacob has to say.

mpizenberg avatar Oct 13 '24 22:10 mpizenberg

Most of what I'm about to say you already know, but I'm talking it out to clarify my own thoughts.

I would love to provide better APIs for querying the incompatibility/partial_solution data during resolution. Along with other heuristics about how resolution is going. (In my case I have packages that represent "features" that I know a priori are going to have singleton requirements on the package they augment, so I would like choose_version straight to the only version that will work.)

All Incompatibilities are unchangeable facts in some literal but unhelpful way. Given that your writing your own resolution loop you can look directly at incompatibilities. Unfortunately, there only indexed by package and most of them are only conditionally relevant. The unchangeable fact that d depends on f or that three versions of q are unavailable are probably not helpful.

PackageAssignments may be more helpful. It is a list of all facts that are relevant, given the decisions that are made so far. You want the facts that are relevant independent of the decisions that have been made so far. Answering the question of which ones are relevant universally exactly correctly is (I think) isomorphic to actually solving the resolution problem. So we can only efficiently answer the question "which ones have we proven to be relevant ..." Which should be the ones that are in decision level 0 (do not depend on any decisions, like the root being required and versions being unavailable) and decision level 1 (the roots direct dependencies and transitive dependencies proven to be reachable from the root). The individual assignments should be sorted by decision level. There may be some corner cases when they're not, but that can be fixed with #257

Eh2406 avatar Oct 14 '24 15:10 Eh2406

I'm looking for querying only the facts that don't depend on the decisions made so far. The background are the following two cases:

We have a pathological but very common case with urllib3, boto3 and botocore. We select a version of urllib3, then have to go through a lot of version of boto3 to find one that is compatible with that version of urllib3. We speed this up from 45s to 2s (cold resolve) by batch prefetching boto3 (https://github.com/astral-sh/uv/pull/2452). Each version of boto3 depends on one version of botocore, so we also need to prefetch botocore, or we have all those versions of boto3, but spend 25s waiting on all the botocore versions (https://github.com/astral-sh/uv/pull/2452, fig. 3). But for any given boto3, only one botocore version remains allowed, we only have compatibility with one botocore version so we need to prefetch incompatible botocore versions.

The other case is sentry: We have sentry-kafka-schemas>=0.1.50 given. We need to prefetch sentry-kafka-schemas (in this case due to an assignment of python-rapidjson we will eventually reject, but still), and due to botocore, we prefetch even below 0.1.50, which is bad.

Instead, I want to ask what we know for sure: For botocore, it's Range::full() (we aren't seeing enough of the graph to know if we need it at all), for sentry-kafka-schemas, it's sentry-kafka-schemas>=0.1.50.

I'm trying to not touch incompatibilities directly, i see this as an implementation detail, it's easy to misuse it and would like to make it private again in our fork; Hence i'm asking if we can make an API specific for these kinds of prefetching asks.

konstin avatar Oct 14 '24 16:10 konstin

I think adding something to PartialSolution like:

    /// Retrieve intersection of terms related to package that will not chage.
    // It is pub for UV
    pub fn unchanging_term_intersection_for_package(
        &self,
        package: &DP::P,
    ) -> Option<&Term<DP::VS>> {
        let pa = self.package_assignments.get(package)?;

        let idx_newer = pa
            .dated_derivations
            .as_slice()
            .partition_point(|dd| dd.decision_level <= DecisionLevel(1));
        let idx = idx_newer.checked_sub(1)?;
        Some(&pa.dated_derivations[idx].accumulated_intersection)
    }

which should be the intersection of all of the incompatibilities that have been proven to be unchangeable. (As I mentioned above, the "has been proven" is doing a lot of work here.) I would also experiment with putting that on top of https://github.com/pubgrub-rs/pubgrub/pull/257 which is more aggressive about declaring things unchangeable.

Eh2406 avatar Oct 15 '24 19:10 Eh2406

Thank you that's perfect!

konstin avatar Oct 16 '24 09:10 konstin

Started using this in https://github.com/astral-sh/pubgrub/pull/32 and https://github.com/astral-sh/uv/pull/8246 :tada:

konstin avatar Oct 17 '24 09:10 konstin