
Apply fork markers at fork time

Open charliermarsh opened this issue 1 year ago • 3 comments

It would be nice if we could find a way to apply the fork markers at fork time. As-is, we end up solving parts of the graph within a fork even though they're actually disjoint from the fork's markers, if those dependencies were added prior to forking.

For example:

let pyproject_toml = context.temp_dir.child("pyproject.toml");
pyproject_toml.write_str(
    r#"
    [project]
    name = "project"
    version = "0.1.0"
    requires-python = ">=3.8"
    dependencies = ["anyio==3.7.0 ; python_version > '3.10'", "dep"]

    [tool.uv.sources]
    dep = { path = "./dep" }
    "#,
)?;

let dep = context.temp_dir.child("dep");
dep.child("pyproject.toml").write_str(r#"
    [project]
    name = "dep"
    version = "0.1.0"
    requires-python = ">=3.8"
    dependencies = ["iniconfig < 2 ; python_version >= '3.10'", "iniconfig >= 2 ; python_version < '3.10'"]
"#)?;

In this graph, dep's conflicting iniconfig requirements split the resolution into python_version >= '3.10' and python_version < '3.10' forks. We end up solving for anyio even in the python_version < '3.10' fork, despite the fact that it gets discarded at the very end, when we create the ResolutionGraph (since, at that point, we `and` the markers together, and they evaluate to false).
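
Concretely, the conjunction of the fork marker python_version < '3.10' with anyio's marker python_version > '3.10' can never evaluate to true, which is exactly what we'd want to detect at fork time. Here's a minimal, self-contained sketch of that disjointness check, using a toy interval type over minor versions rather than uv's real MarkerTree (the representation is illustrative only):

/// Toy stand-in for a marker over python_version, modeled as a half-open
/// interval [lo, hi) of minor versions (so 3.10 is just 10). uv's real
/// MarkerTree is far richer; this only shows the disjointness check.
#[derive(Clone, Copy, Debug, PartialEq)]
struct PyMarker {
    lo: u32, // inclusive lower bound on the minor version
    hi: u32, // exclusive upper bound on the minor version
}

impl PyMarker {
    /// `and` two markers by intersecting their intervals.
    fn and(self, other: PyMarker) -> PyMarker {
        PyMarker {
            lo: self.lo.max(other.lo),
            hi: self.hi.min(other.hi),
        }
    }

    /// An empty interval means the marker can never evaluate to true.
    fn is_false(self) -> bool {
        self.lo >= self.hi
    }
}

fn main() {
    // Fork marker: python_version < '3.10'  =>  [0, 10)
    let fork = PyMarker { lo: 0, hi: 10 };
    // anyio's marker: python_version > '3.10'  =>  [11, MAX)
    let anyio = PyMarker { lo: 11, hi: u32::MAX };

    // `and`ing at fork time reveals the disjointness immediately, instead
    // of only at the end, when the ResolutionGraph is built.
    assert!(fork.and(anyio).is_false());
}

If with_markers performed this kind of check per package, the anyio node could be dropped before the fork ever solves for it.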

We roughly want this:

diff --git a/crates/uv-resolver/src/resolver/mod.rs b/crates/uv-resolver/src/resolver/mod.rs
index 3bb311a00..b1d5cf3c0 100644
--- a/crates/uv-resolver/src/resolver/mod.rs
+++ b/crates/uv-resolver/src/resolver/mod.rs
@@ -2255,6 +2255,8 @@ impl ForkState {
     /// Subset the current markers with the new markers and update the python requirements fields
     /// accordingly.
     fn with_markers(mut self, markers: MarkerTree) -> Self {
+        // Here, we basically need to `and` the markers of each package in the existing state with
+        // the fork markers... and throw out anything that's disjoint?
         let combined_markers = self.markers.and(markers);

         // If the fork contains a narrowed Python requirement, apply it.

But it seems hard to "transform" a PubGrub State in the middle of the resolution. (In other words: we'd want to clone the State, then change some information about the PubGrubPackage nodes, and discard any nodes that are not relevant for the new branch of the solve.) I don't know how we'd reflect the changes everywhere. We'd probably need to consult with @Eh2406.

charliermarsh avatar Aug 17 '24 21:08 charliermarsh

I think this would help with some of the instability problems as well, right? Because I think it would help us sync up the behavior between "find a resolution without knowing the fork points up-front" and "find a resolution using these existing forks."

BurntSushi avatar Aug 19 '24 13:08 BurntSushi

Yeah, it should help with some of them.

charliermarsh avatar Aug 19 '24 13:08 charliermarsh

There's a lot of context I'm missing here. But fundamentally you looped me in about '"transform[ing]" a PubGrub State in the middle of the resolution'. This is not exposed in PubGrub's public API because there are lots of ways to shoot yourself in the foot by mutating internal data structures. But you already roll your own resolution loop, so you have a lot more knobs available.

Adding Incompatibilities should generally be safe. Practically, there are two mutations that should be safe and easy (modulo the fact that I've never experimented with this in practice, so I don't know where the corner cases are):

  • Marking some set of versions as unavailable. Perhaps this could be used to say "within this fork, versions of Python older than x.y.z are no longer available" or "versions that require a Python incompatible with this fork are individually marked as unavailable".
  • Adding a new dependency edge from some package to some range of versions of another package.

In both cases, unit_propagation should be able to handle the new incompatibilities even if they exclude a version that has already been selected. (A rough sketch of both mutations follows.)
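
To make the two mutations concrete, here is a rough, self-contained sketch. The types and method names below are hypothetical, modeled only loosely on pubgrub's internal State (whose real signatures differ); it just illustrates the shape of the two additive operations:

/// Hypothetical clause-style view of an incompatibility; pubgrub's real
/// Incompatibility type is different, this just shows the two shapes.
#[derive(Debug)]
enum Incompat {
    /// "No version of `package` in [lo, hi) is available."
    Unavailable { package: &'static str, lo: u32, hi: u32 },
    /// "`package` in [lo, hi) depends on `dep` in [dep_lo, dep_hi)."
    Dependency {
        package: &'static str,
        lo: u32,
        hi: u32,
        dep: &'static str,
        dep_lo: u32,
        dep_hi: u32,
    },
}

/// Hypothetical stand-in for the solver state.
struct State {
    incompatibilities: Vec<Incompat>,
}

impl State {
    /// Adding an incompatibility mid-solve is purely additive: it can
    /// only shrink the solution space. In the real solver, unit
    /// propagation would run afterwards and backjump past any decision
    /// the new clause invalidates.
    fn add_incompatibility(&mut self, incompat: Incompat) {
        self.incompatibilities.push(incompat);
    }
}

fn main() {
    let mut state = State { incompatibilities: Vec::new() };

    // Mutation 1: within this fork, Pythons older than 3.10 are gone.
    state.add_incompatibility(Incompat::Unavailable {
        package: "python", lo: 0, hi: 10,
    });

    // Mutation 2: a new dependency edge (dep -> iniconfig >= 2).
    state.add_incompatibility(Incompat::Dependency {
        package: "dep", lo: 0, hi: u32::MAX,
        dep: "iniconfig", dep_lo: 2, dep_hi: u32::MAX,
    });

    println!("{:?}", state.incompatibilities);
}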

Removing an Incompatibility is much more complicated. (Practically, these are the inverse operations of the two above: a package becoming available, or a dependency edge no longer being required.) I have thought about this problem for a long time, but never had time to dig in and code something. The problem is that more general Incompatibilities may have been derived from the ones you've just removed. I think it is likely that there is a linear algorithm to identify all such tainted derivations and decisions. Overall this could still be more efficient than `Clone`ing before the problematic add, because derived implications of unrelated facts would not need to be re-derived.
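
A linear pass for the "tainted derivations" part is plausible precisely because the derivation log is append-only: a derived incompatibility can only reference earlier entries. A small sketch, with a hypothetical Origin record standing in for pubgrub's real derivation bookkeeping:

/// Hypothetical record of where each incompatibility came from; pubgrub's
/// real derivation bookkeeping differs, but the append-only shape holds.
enum Origin {
    /// Came directly from dependency metadata (or was added by hand).
    External,
    /// Derived by conflict resolution from two earlier log entries.
    Derived(usize, usize),
}

/// Given the derivation log and the index of a removed incompatibility,
/// mark every entry that transitively depended on it. Because the log is
/// append-only, a derived entry can only reference earlier indices, so a
/// single forward pass is linear in the log's length.
fn tainted(log: &[Origin], removed: usize) -> Vec<bool> {
    let mut tainted = vec![false; log.len()];
    tainted[removed] = true;
    for (i, origin) in log.iter().enumerate() {
        if let Origin::Derived(a, b) = *origin {
            if tainted[a] || tainted[b] {
                tainted[i] = true;
            }
        }
    }
    tainted
}

fn main() {
    // 0: external, 1: external, 2: derived from (0, 1), 3: derived from (2, 1).
    let log = vec![
        Origin::External,
        Origin::External,
        Origin::Derived(0, 1),
        Origin::Derived(2, 1),
    ];
    // Removing entry 0 taints 2 and 3, but entry 1 survives untouched.
    assert_eq!(tainted(&log, 0), vec![true, false, true, true]);
}

Everything not marked tainted (entry 1 above) survives the removal, which is what would make this cheaper than `Clone`ing and replaying the whole state.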

It is also possible that these dependencies, whose existence is conditional on the Python version in use, would be more accurately modeled with more exotic forms of Incompatibilities. But I would definitely need more context to suggest anything tangible. Was any of that helpful?

Eh2406 avatar Aug 26 '24 21:08 Eh2406