uv icon indicating copy to clipboard operation
uv copied to clipboard

Handle overlapping and incomplete markers when forking

Open konstin opened this issue 1 month ago • 0 comments

In python, a package can declare conflicting requirements as long as the markers are disjoint, e.g.:

numpy==2.0.0; python_version >= "3.11"
numpy==1.26.4; python_version < "3.11"

When we see requirements like that, we split the resolution into one for python_version >= "3.11" and one for python_version < "3.11" and solve each fork individually. This fails when the markers are overlapping or don't cover the entirety of the marker space. For a real-world example, opencv-python 4.10.0.84 used by transformers:

numpy >=1.13.3 ; python_version < "3.7"
numpy >=1.21.0 ; python_version <= "3.9" and platform_system == "Darwin" and platform_machine == "arm64"
numpy >=1.21.2 ; python_version >= "3.10"
numpy >=1.21.4 ; python_version >= "3.10" and platform_system == "Darwin"
numpy >=1.23.5 ; python_version >= "3.11"
numpy >=1.26.0 ; python_version >= "3.12"
numpy >=1.19.3 ; python_version >= "3.6" and platform_system == "Linux" and platform_machine == "aarch64"
numpy >=1.17.0 ; python_version >= "3.7"
numpy >=1.17.3 ; python_version >= "3.8"
numpy >=1.19.3 ; python_version >= "3.9"

Currently, we solve for the following forks (https://github.com/astral-sh/uv/issues/4687):

python_version >= '3.12' and platform_machine == 'aarch64' and platform_system == 'Darwin' and platform_system == 'Linux'
python_version < '3.13' and python_version >= '3.12' and platform_machine == 'aarch64' and platform_system == 'Darwin' and platform_system == 'Linux'
python_version >= '3.13' and platform_machine == 'aarch64' and platform_system == 'Darwin' and platform_system == 'Linux'
python_version == '3.9' and platform_machine == 'arm64' and platform_system == 'Darwin'

They are disjoint, but they are also missing a good bit of the marker space for which we never solve.


Let's give working with markers some math-y foundations, a visualization as mental model and let's look at two examples.

Lemma: Inflating the requirements by artificially introducing a split. A requirement

foo

is equivalent to

foo; X
foo; not(X)

Similarly,

foo; Y

is equivalent to

foo; Y and X
foo; Y and not(X)

We can extend this to an arbitrary number of arguments, so if Z¹, Z², ..., Z^n are pairwise disjoint and the union over i in 0..n for Z^i is the base set omega, then

foo; Y

is equivalent to

foo; Y and Z¹
foo; Y and Z²
...
foo; Y and Z^n

Now assume we have the following set of partially pairwise overlapping (https://github.com/astral-sh/uv/issues/4640), partially pairwise incomplete (https://github.com/astral-sh/uv/issues/4687) markers:

foo¹>2
foo²>3; x>=5
foo³<4; x<7

We can now inflate the requirements by artificially introducing a split:

foo¹>2; (x>=7)
foo²>3; x>=5 and (x>=7)
foo³<4; x<7 and (x>=7)

foo¹>2; (x<7 and x>=5)
foo²>3; x>=5 and (x<7 and x>=5)
foo³<4; x<7 and (x<7 and x>=5)

foo¹>2; (x<5)
foo²>3; x>=5 and (x<5)
foo³<4; x<7 and (x<5)

We see that each group is disjoint, so we can create three forks. Simplifying and Removing empty markers we get:

# Fork 1
foo¹>2; x>=7
foo²>3; x>=7

# Fork 2
foo¹>2; x<7 and x>=5
foo²>3; x<7 and x>=5
foo³<4; x<7 and x>=5

# Fork 3
foo¹>2; x<5
foo³<4; x<5

We defined a correct way of splitting these markers.


Let us switch to another example to develop a model of markers and marker space. Ignoring extras, we can consider the current environment a point in a 10-dimensional marker space defined by PEP 508 (os_name, sys_platform, platform_machine, platform_python_implementation, platform_release, platform_system, platform_version, python_version/python_full_version, implementation_name, implementation_version). The dimensions are of different datatypes and not all points exist (os_name=="posix" and platform_system=="Windows" doesn't make sense). A marker is a slice in that space.

We can visualize a simpler version of the opencv-python 4.10.0.84 requirements:

numpy >=1.21.4 ; python_version >= "3.10" and platform_system == "Darwin"
numpy >=1.21.2 ; python_version >= "3.11"
numpy >=1.21.0 ; python_version <= "3.9" and platform_system == "Darwin"

image

There is two ways we can transform these requirements with the diagram:

  • Merge overlapping: 1 and 2 are overlapping, so there's a fork 1 or 2. There's a fork 3 and a fork for the remaining space not(1 or 2 or 3)
  • Split overlapping: 1 and 2 are overlapping so there's a fork 1 \ 2, 2 \ 1 and 1 and 2 . There's a fork 3 and a fork for the remaining space not(1 or 2 or 3)

image

For splitting overlapping, the options are:

  • crosses, python_version >= "3.10" and python_version < "3.11" and platform_system == "Darwin" with numpy >=1.21.4
  • stripes, python_version >= "3.11" and platform_system == "Darwin" with numpy >=1.21.2 and numpy >=1.21.4
  • dots, python_version >= "3.11" and platform_system != "Darwin" with numpy >=1.21.2

I favor to splitting the overlapping, it's the more general solution, and the one able to handle complex cases like the opencv-python markers (even though we pick a more recent numpy version for them anyway). If we merge overlapping, we need to special case the universal marker (no marker provided) for https://github.com/astral-sh/uv/issues/4640. In either case, we need to implement marker tree negation to support incomplete markers, and proper simplification for negated marker tree to determine if the markers were actually incomplete or did cover the entire markers space, leaving only an empty set.

konstin avatar Jul 02 '24 16:07 konstin