poetry icon indicating copy to clipboard operation
poetry copied to clipboard

[optimize] early-ignore dependencies not matching chosen markers

Open maciejskorski opened this issue 3 years ago • 13 comments

Pull Request Check List

Resolves:

#4952 #4670 #6118 #5121

  • [x] Added tests for changed code.
  • [x] Updated documentation for changed code.

maciejskorski avatar Dec 30 '21 11:12 maciejskorski

poetry as of now is slow due to exponentially many configurations caused by conditional logic in downstream packages. In many cases, this can be optimised by pre-filtering them early to only those matching the current or targeted system.

This is helpful for many use-cases, where we can narrow the scope environment markers in advance - for example, the code will be developed, tested and deployed only on linux with python 3.8.

Consider this example on the environment with linux and python 3.8

[tool.poetry]
name = "test"
version = "0.1.0"
description = "demonstrates that poetry triggers overrides to easily (can be early pruned?)"
authors = ["Maciej Skorski <[email protected]>"]

[tool.poetry.dependencies]
python = "~3.8"
pandas = [
    {version="1.3.5",markers="python_version=='3.6'"},
    {version="1.3.2",markers="python_version=='3.8' and sys_platform=='win32'"},
    {version="1.3.1",markers="python_version=='3.8' and sys_platform=='linux'"},
]

As of now poetry prepares several variants which takes 6 overrides.

NOTE: the original proposal used CLI to push constraints. The approach below suggested by @radoering seems better.

This PR extends the pyproject.toml file by the section tool.poetry.target_env

[tool.poetry.target_env]
sys_platform = "linux"
python_version = "3.8"

Then poetry install --dry-run -vv gives the required install without overrides. See another example with Azure packages

maciejskorski avatar Dec 30 '21 22:12 maciejskorski

I played around a bit. On its own it works fine, but I think there are still some things to think about:

  1. Although the option affects locking it is only available for the install command. (If there is already a lock file using the option makes no difference because the lock file is used. The option has only an effect if there is no lock file.) Maybe, the option should be available for both commands (install and lock)? But that's just a detail.
  2. My main concern is that I cannot tell if a lock file has been created with or without a marker filter. Imagine the following situation: There is no lock file. Developer A (on Linux) uses install with a marker filter for sys_platform, thus, creates a lock file only suitable for Linux and commits/pushes the lock file. Developer B (on Windows) pulls and installs implicitly using the lock file not suitable for his system. Probably, there will be some missing dependencies in his virtual environment! In my opinion there has to be an entry in the lock file telling which marker filters have been used. Further, poetry had to check if a lock file is not suitable for the current platform and inform the user.
  3. I am not sure, if the marker filter should be an option of a command at all or if it should be some new setting in the pyproject.toml. Your use case says the code will be developed, tested and deployed only on linux with python 3.8.. Thus, shouldn't these restrictions to Linux and Python 3.8 be defined in the pyproject.toml, actually? The python constraint is already specified in the pyproject.toml so there is no need to define a marker filter for the python version (at least for this use case). (Maybe, there are some issues like #4958 or #4959 but I think these can be resolved independently from this PR). Maybe, a possibility to specify other environment constraints for the project (e.g. for sys_platform) in the pyproject.toml would be useful. Sounds like a common use case. I am wondering if there is already a PEP or something covering this?

radoering avatar Dec 31 '21 13:12 radoering

Thanks for getting back with valuable comments. Definitely got your point regarding 2) and 3).

Note that poetry is already not secured against such cases if downstream packages don't cover environment markers exhaustively (for example, if a package puts only linux markers poetry will match this lock against my windows and will not install anything...). This PR was inspired by use-cases of developers who work on the similar code in similar environments and so can narrow the system combinations. Also, the logic of python constraints is sometimes broken #3444 so this is one more use case.

I updated the PR accordingly. Constraints are moved to a dedicated section in the toml file.

This should mitigate your concerns 2) and 3) I suppose, as the rules would be transparent. In addition:

  • we can extend (in theory) the logic allowing for 'linux or windows' and whatever will be parsable.
  • computing the lock this way would be more flexible and the current-system-agnostic (in poetry's spirit, which builds first all variants into the lock, and compares against the current system once the lock is ready)

maciejskorski avatar Dec 31 '21 15:12 maciejskorski

I think this is getting a lot closer to a useful/mergable state. Putting the environment constraints into the pyproject is absolutely the right way to go here. There is a bit more work to be done, and I'm thinking this should go into 1.3 instead of 1.2, but I think this will be a useful addition.

neersighted avatar May 18 '22 03:05 neersighted

+1 for this. We have a big project in our team, each time poetry solves the dependencies it already takes more than 10 minutes (each try). However, we know beforehand the version we want as well as the platform, so retries to have a solution that valid everywhere is redundant and a big waste of resources. Currently in our project we have to retries around 20 times, making the whole process super slow, sometimes can be upto 4 hours.

hungbie avatar Jun 02 '22 02:06 hungbie

@neersighted apologizes for the absence—I overlooked the notifications from this topic. If there is still interest in this, I am ready to work on this in upcoming days.

maciejskorski avatar Aug 12 '22 08:08 maciejskorski

@maciejskorski This is definitely something worth revisiting now that we have 1.2 out -- though I must say I have limited bandwidth to help get this over the finish line right now. @radoering or @finswimmer might be good maintainers to work with if they are available...

(sorry for throwing y'all under the bus!)

neersighted avatar Sep 06 '22 15:09 neersighted

First things first. Let's talk about the representation in pyproject.toml.

Current proposal is something like:

[tool.poetry.target_env]
sys_platform = "linux"
platform_system = Linux"

It might be useful to support multiple platforms. There are at least two alternatives for that:

Lists:

[tool.poetry.target_env]
sys_platform = ["linux", "win32"]
platform_system = ["Linux", "Windows"]

Our (afaik at least partially internal) constraint syntax:

[tool.poetry.target_env]
sys_platform = "linux || win32"
platform_system = "Linux || Windows"

Since our marker handling has improved a lot since this PR was created, an alternative design specifying a PEP 508 compliant environment marker that describes the supported environment, might be possible, e.g.:

[tool.poetry]
target_env = 'sys_platform = "linux" and platform_system = "Linux"'

This approach is more powerful (but also more verbose). You can define conditions that are not possible with the original (and extended) approach, e.g.

target_env = 'sys_platform = "linux" or (sys_platform = "win32 and python_version > "3.7")'

However, I wonder if such use cases even exist. Any thoughts which representation should be chosen?

radoering avatar Sep 10 '22 15:09 radoering

I could believe that there are use cases for wanting to express complex top-level markers eg see #6447 in which a user was battling this

curl -s https://pypi.org/pypi/opencv-python/4.5.5.62/json | jq '.info.requires_dist'
[
  "numpy (>=1.13.3) ; python_version < \"3.7\"",
  "numpy (>=1.21.2) ; python_version >= \"3.10\"",
  "numpy (>=1.21.2) ; python_version >= \"3.6\" and platform_system == \"Darwin\" and platform_machine == \"arm64\"",
  "numpy (>=1.19.3) ; python_version >= \"3.6\" and platform_system == \"Linux\" and platform_machine == \"aarch64\"",
  "numpy (>=1.14.5) ; python_version >= \"3.7\"",
  "numpy (>=1.17.3) ; python_version >= \"3.8\"",
  "numpy (>=1.19.3) ; python_version >= \"3.9\""
]

As for how this gets written in pyproject.toml: markers are a bit ugly but the users who are using this feature are likely to be sophisticated enough that they can take it.

dimbleby avatar Sep 10 '22 16:09 dimbleby

Since our marker handling has improved a lot since this PR was created, an alternative design specifying a PEP 508 compliant environment marker that describes the supported environment, might be possible, e.g.:

This is something I would be keen to have and implement. However, @radoering, are we absolutely sure that the current marker implementation is mature and correct? Could you advise me on some examples where complex logic (like nested and/or) are tested?

maciejskorski avatar Nov 10 '22 08:11 maciejskorski

I'm quite confident in the current implementation. It's possible that very complex expressions are not recognized to be empty, but I think it's good enough for real world use cases. The tests are in poetry-core in test_marker.py, e.g. https://github.com/python-poetry/poetry-core/blob/b0b18230a5eeabe16d0000c67dd40d15a3d8f0e5/tests/version/test_markers.py#L412-L432

radoering avatar Nov 10 '22 16:11 radoering

Just wondering what the status of this was?

themanifold avatar Mar 23 '23 10:03 themanifold

I'd say it is stalled. Maybe, I'll pick this up myself sometime in the future, but at the moment there are many other topics with a higher priority on my list. Thus, nothing to be expected from me here in the next few months.

radoering avatar Mar 25 '23 17:03 radoering