pdm

Resolver Infinite loop on simple version collision

Open carlkibler opened this issue 1 year ago • 6 comments

I've read the other "infinite loop" issues (#2633, #2545, #1119, #908), but wanted to point out the general problem: the resolver runs through up to 10k rounds even when there's a fairly obvious unresolvable conflict.

  • [x] I have searched the issue tracker and believe that this is not a duplicate.

Make sure you run commands with the -v flag before pasting the output.

Steps to reproduce

  1. Fresh project.
  2. Add the dependencies langchain and gretel-client (from the gretel-python-client project); file below.

pyproject.toml:

[project]
name = "pdm_resolve"
version = "0.1.0"
description = "Default template for PDM package"
authors = [
    {name = "Person", email = "[email protected]"},
]
dependencies = [
    "langchain",
    "gretel-client",
]
requires-python = "==3.12.*"
readme = "README.md"
license = {text = "MIT"}


[tool.pdm]
distribution = false

Actual behavior

PDM spins in resolution for up to strategy.resolve_max_rounds rounds (default 10k). OK, here's the conflict:

  • gretel specifies pydantic==1.10.17
  • langchain specifies pydantic<3.0.0,>=2.7.4
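As a sketch of the short-circuit idea, the check below tests whether an exact pin can satisfy a range specifier. The helper names are hypothetical, and it only handles plain dotted release versions (no pre/post/dev tags, no wildcards), so it is not full PEP 440; a real implementation would use something like packaging.specifiers.SpecifierSet. Note that it only proves one pair of constraints incompatible: a real resolver also has to consider other releases of the packages that impose those constraints (here langsmith and langchain-core, per the pip message), which is presumably why backtracking can still find a solution.

```python
import operator
from typing import List, Tuple

Version = Tuple[int, ...]

# Operators supported by this sketch; two-character ones come first so that
# ">=" is matched before ">" (dicts keep insertion order).
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_version(text: str) -> Version:
    """Parse a plain dotted release like '1.10.17' (no pre/post/dev tags)."""
    parts = [int(p) for p in text.strip().split(".")]
    # Drop trailing zeros so that 3.0.0 and 3 compare equal.
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def parse_specifier(spec: str) -> List[Tuple[str, Version]]:
    """Split '>=2.7.4,<3.0.0' into [('>=', (2, 7, 4)), ('<', (3,))]."""
    clauses = []
    for clause in spec.split(","):
        clause = clause.strip()
        for op in OPS:
            if clause.startswith(op):
                clauses.append((op, parse_version(clause[len(op):])))
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return clauses


def satisfies(version: Version, clauses: List[Tuple[str, Version]]) -> bool:
    """True if the version meets every clause of the specifier."""
    return all(OPS[op](version, bound) for op, bound in clauses)


def pin_conflicts(pin: str, other: str) -> bool:
    """True if an exact pin like '==1.10.17' cannot satisfy `other`."""
    pin_clauses = parse_specifier(pin)
    if len(pin_clauses) != 1 or pin_clauses[0][0] != "==":
        raise ValueError("pin must be a single '==' clause")
    pinned = pin_clauses[0][1]
    return not satisfies(pinned, parse_specifier(other))
```

With the two pydantic constraints above, pin_conflicts("==1.10.17", ">=2.7.4,<3.0.0") returns True, while pin_conflicts("==2.8.2", ">=2.7.4,<3.0.0") returns False.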

pip reports the conflict immediately:

langsmith 0.1.86 requires pydantic<3.0.0,>=2.7.4; python_full_version >= "3.12.4", but you have pydantic 1.10.13 which is incompatible.
langchain-core 0.2.19 requires pydantic<3.0.0,>=2.7.4; python_full_version >= "3.12.4", but you have pydantic 1.10.13 which is incompatible.

This is great because it's clear and quick. Having pdm lock run for 10+ minutes without noticing this seems like a bug to me.

Expected behavior

  1. I would hope a conflict this clear doesn't cause repeated rounds. The hard-coded version specifiers leave zero possible solutions, and the resolver (or the logic above it?) should see that and short-circuit further evaluation.
  2. Make the user aware of the default 10k-round limit. What is a "normal" number of rounds for a project: 10, 50, 100? I would suggest that after 1 minute or 100 resolution rounds, PDM print a message telling the user that the resolver will keep trying up to 10k times, and how to configure that limit.

I wonder if the default limit should be much lower (50? 100?), with a message telling users instead "in large projects the value may need to be set higher, and here's how...".

If a reasonable number of resolver rounds is under 50, make that the default to save time for the vast majority of users, and tell huge-project users to raise the value because they are a special case. It would be more user-friendly.
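The one-time notice from point 2 could look roughly like the hypothetical helper below, which fires once when either 100 rounds or 60 seconds have passed. The class name, thresholds, and message wording are assumptions, not PDM code; only the setting name strategy.resolve_max_rounds comes from this report. The clock is injectable so the behavior can be tested without actually waiting.

```python
import time
from typing import Callable, Optional


class LongResolutionNotice:
    """Hypothetical helper: emit one hint once resolution looks slow."""

    def __init__(
        self,
        max_rounds: int,
        round_threshold: int = 100,
        seconds_threshold: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_rounds = max_rounds
        self.round_threshold = round_threshold
        self.seconds_threshold = seconds_threshold
        self.clock = clock  # injectable so tests can fake the passage of time
        self.started = clock()
        self.shown = False

    def check(self, round_index: int) -> Optional[str]:
        """Return the hint the first time either threshold is reached, else None."""
        if self.shown:
            return None
        elapsed = self.clock() - self.started
        if round_index >= self.round_threshold or elapsed >= self.seconds_threshold:
            self.shown = True
            return (
                f"Still resolving after {round_index} rounds ({elapsed:.0f}s). "
                f"The resolver will keep trying for up to {self.max_rounds} rounds; "
                "this limit is set by strategy.resolve_max_rounds."
            )
        return None
```

A caller would invoke check(round_index) at the start of each round and print the message whenever one is returned; after the first hint it stays quiet.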

Environment Information

PDM version:
  4.16.1
Python Interpreter:
  /Users/carl/tmp/pdm_resolve/.venv/bin/python (3.12)
Project Root:
  /Users/carl/tmp/pdm_resolve
Local Packages:
  
{
  "implementation_name": "cpython",
  "implementation_version": "3.12.4",
  "os_name": "posix",
  "platform_machine": "arm64",
  "platform_release": "23.6.0",
  "platform_system": "Darwin",
  "platform_version": "Darwin Kernel Version 23.6.0: Sun Jun 30 19:39:43 PDT 2024; root:xnu-10063.140.33~20/RELEASE_ARM64_T6030",
  "python_full_version": "3.12.4",
  "platform_python_implementation": "CPython",
  "python_version": "3.12",
  "sys_platform": "darwin"
}

carlkibler, Jul 16 '24 20:07

On my machine the resolution succeeds in 30s, with pydantic==1.10.17 pinned.

frostming, Jul 17 '24 02:07

About 1 min for me on Linux (pdm lock -v 60.83s user 0.55s system 75% cpu 1:21.24 total), also with pydantic 1.10.17. It seems pip stops backtracking earlier.

pawamoy, Jul 17 '24 09:07

@carlkibler you make good points though!

Make user aware of the 10k loop default limit.

Yep, it could be printed on each round, like pdm.termui: ======== Starting round 61/10000 ========.

What is a "normal" amount of loops for a project? 10, 50? 100? I would suggest after 1 minute or 100 resolution attempts, print a message telling the user the resolver will continue trying up to 10k times and how to configure that limit.

In addition to "round 61/10000", PDM could indeed issue a message in non-verbose mode every couple hundred rounds.
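A rough sketch of that output policy: a line on every round in verbose mode, and only every `every` rounds otherwise. The class is hypothetical; its method name starting_round(index) mirrors the per-round hook on resolvelib's BaseReporter, but this version is standalone and collects lines in a list instead of printing, so it runs without resolvelib. The 200-round default interval and the non-verbose wording are assumptions.

```python
from typing import List


class RoundProgressReporter:
    """Hypothetical reporter: per-round lines when verbose, sparse otherwise."""

    def __init__(self, max_rounds: int, verbose: bool, every: int = 200) -> None:
        self.max_rounds = max_rounds
        self.verbose = verbose
        self.every = every
        # Lines are collected rather than printed so the sketch is easy to test.
        self.lines: List[str] = []

    def starting_round(self, index: int) -> None:
        """Called once per resolution round with the current round number."""
        if self.verbose:
            self.lines.append(
                f"======== Starting round {index}/{self.max_rounds} ========"
            )
        elif index > 0 and index % self.every == 0:
            self.lines.append(f"Still resolving: round {index}/{self.max_rounds}")
```

Keeping the interval as a parameter leaves room for the "every few hundred rounds" choice to be tuned without touching the reporting logic.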

I wonder if the default limit should be much lower (50? 100?) and tell users instead "in large projects the value may need to be set higher, and here's how...".

Note that it was initially set to 500, and this was generally way too low, so @frostming increased it by a lot. You'll probably find more info by grepping the git logs or PRs on GitHub.

pawamoy, Jul 17 '24 09:07

You all are right, which is frustrating! Thanks for trying. Some fun updates for completeness: the gretel-python-client project released a new version yesterday, 30 minutes after this bug report, changing pydantic's pinned version from 1.10.13 to 1.10.17. I thought maybe that was why you all got different results. Alas, no.

Today:

  • Even pinning gretel-python-client back to 0.19.2, the version current at the time, I can't replicate the behavior, though it happened easily a dozen times in a row on an EC2 Linux server and on my M3 MacBook Pro.
  • I get a resolution in 118 rounds on my MacBook, taking 6m45s real and 2m6s user time. On that same cloud Linux server it also takes 118 rounds, but about 1 minute for both real and user time.
    • The MacBook is far more powerful than the little c7g.large EC2 server, so it's interesting how much longer it takes to iterate. Nothing else heavy is running, just Chrome and a terminal. I'll run this later from home to see whether it's an office Wi-Fi problem, though that seems unlikely if all the metadata is cached.
  • For my work projects, I have an extra pip location in ~/.config/pip/pip.conf, but the behavior is the same with or without it. Just including it for completeness.

So I can't replicate yesterday's behavior, which went up into the thousands of resolver rounds. Baffling stuff. I withdraw my specific bug report until I can actually replicate it.

-- I appreciate @pawamoy thinking over the UX suggestions and giving some feedback from history. Printing a message in non-verbose mode every few hundred rounds would be useful I think. So that's my final suggestion.

I am happy to re-craft this into a feature request toward that end, or close this and make a separate feature request. My only goal is to help you all not get pestered by issues like this one.

carlkibler, Jul 17 '24 19:07

I thought maybe that is why you all got different results. Alas, no

No. I even used --exclude-newer=2024-07-15 to go back to the old days, but it also succeeds. There is no essential difference between 1.10.13 and 1.10.17, either.

Printing a message in non-verbose mode every few hundred rounds would be useful I think. So that's my final suggestion.

This sounds good to me.

frostming, Jul 18 '24 04:07

It seems a lot of work would be needed for

Printing a message in non-verbose mode every few hundred rounds

since the iteration is located in resolvelib.

Gnomeek, Jul 24 '24 07:07