Different resolution failure messages on different verbosity

Open potiuk opened this issue 4 years ago • 14 comments

  • pip version: 21.0.1
  • Python version: 3.6 (but 3.7 and 3.8 have the same problem)
  • Operating system: Linux (Debian buster docker image)

pip 21.0.1 sometimes produces the wrong error about conflicts, and it produces a different (correct!) error when the -vvvv option is added.

This problem originated in https://github.com/apache/airflow/issues/15463 (you can see its history there). We have quite complex dependencies in Airflow, and we still recommend that people install Airflow with pip 20.2.4, but we hope to get rid of that limitation. One problem, however, was very strange and we did not have time to look at it. When I looked today, I realized that the error printed by pip was misleading (I could not see the reason for the original error).

I believe that instead of pyarrow, pip reports google-cloud-bigquery-storage as having a problem. It looks like instead of printing the dependency that actually has the problem, it prints a "sibling" of that dependency (or something like that).

It is very easily reproducible:

  1. Run: pip install apache-airflow[google]==2.0.2 --constraint https://raw.githubusercontent.com/apache/airflow/constraints-2.0.2/constraints-3.6.txt

You should get an error:

ERROR: Could not find a version that satisfies the requirement google-cloud-bigquery-storage<2.0.0dev,>=1.0.0; extra == "bqstorage" (from google-cloud-bigquery[bqstorage,pandas])

  2. Run: pip install -vvvv apache-airflow[google]==2.0.2 --constraint https://raw.githubusercontent.com/apache/airflow/constraints-2.0.2/constraints-3.6.txt

You should get an error:

ERROR: Could not find a version that satisfies the requirement pyarrow<2.0dev,>=1.0.0; extra == "bqstorage" (from google-cloud-bigquery[bqstorage,pandas])
ERROR: No matching distribution found for pyarrow<2.0dev,>=1.0.0; extra == "bqstorage"

I believe that in both cases the ONLY problem is pyarrow, and it is misreported without the -vvvv flag: instead of the dependency that is actually wrong (pyarrow), pip prints its sibling (google-cloud-bigquery-storage). Note that apart from the package name, the problematic limits are exactly the same (<2.0.0dev,>=1.0.0; extra == "bqstorage").

I also could not find any other package among those being installed that limits google-cloud-bigquery-storage to <2.0.0dev,>=1.0.0, which is why I think this is a bug in pip.

Gists with the outputs to compare

pip install apache-airflow[google]==2.0.2 --constraint https://raw.githubusercontent.com/apache/airflow/constraints-2.0.2/constraints-3.6.txt:

Here: https://gist.github.com/potiuk/04f6127469a709e3e47be7585c9a863c

pip install -vvvv apache-airflow[google]==2.0.2 --constraint https://raw.githubusercontent.com/apache/airflow/constraints-2.0.2/constraints-3.6.txt:

https://gist.github.com/potiuk/17a3d591fb091bdd8a0e213f49b6b0af

I might be wrong, of course, but it looks like this.

UPDATE:

I ran it with -vv and it fails with the google-cloud-bigquery-storage error: https://gist.github.com/potiuk/2f9af6a8eaac7ea393fd1f9fe64361c7

Both -v and -vvv fail with the pyarrow error.

In none of those outputs can I find where google-cloud-bigquery-storage<2.0.0dev,>=1.0.0; extra == "bqstorage" (from google-cloud-bigquery[bqstorage,pandas]) comes from :(.

potiuk, Apr 23 '21

This prompted me to try the same installation on 3.8, which uncovers another conflict (!):

ERROR: Cannot install apache-airflow[google]==2.0.2 because these package versions have conflicting dependencies.

The conflict is caused by:
    apache-airflow[google] 2.0.2 depends on cattrs~=1.1; python_version > "3.6"
    The user requested (constraint) cattrs==1.0.0

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

And 3.6 (against unreleased pip) also presents a different conflict, on connexion. So the issue may be that there are actually multiple conflicts in the current Airflow constraints, and each error only shows part of them. google-cloud-bigquery-storage could still be conflicting somewhere. Still, it's surprising that -vvvv would affect the dependency resolution logic; I'll need to look into this more deeply.

uranusjr, Apr 23 '21

Edit: I successfully reproduced these different behaviours on Linux. This is really weird.

uranusjr, Apr 23 '21

EDIT (apologies for not spotting it first): I dug a bit deeper, and I think I know where the -storage limitation comes from (it is in fact in version 1.28.0 of the bigquery library). The same goes for pyarrow. This is my bad: I looked at the latest < 3.0.0 version of -bigquery, not the one from the constraints.

The -v behaviour, alternating between those two errors, is a strange one. But both errors are actually right... I still do not know where the old 1.28.0 limitation comes from (but that is a different story).

I will close this one and look further into where it is coming from.

Apologies for the trouble (but it would be nice to find out the reason for the -v behaviour :).

potiuk, Apr 23 '21

Ah, I think I found the real root of conflict. I get this against the main branch:

$ python src/pip install 'apache-airflow[google]==2.0.2' --constraint https://raw.githubusercontent.com/apache/airflow/constraints-2.0.2/constraints-3.6.txt
...
ERROR: Cannot install google-cloud-bigquery[bqstorage,pandas]==1.28.0 because these package versions have conflicting dependencies.

The conflict is caused by:
    google-cloud-bigquery[bqstorage,pandas] 1.28.0 depends on pyarrow<2.0dev and >=1.0.0; extra == "bqstorage"
    The user requested (constraint) pyarrow==2.0.0
...

So the issue was pyarrow all along, but pip 21.0.1 misidentified the cause as google-cloud-bigquery because it failed to consider version ranges introduced via constraints. I'm going to write this up as another case fixed by #9300. Thanks for the report, it's a really interesting rabbit hole to dig into!
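For reference, the pyarrow conflict above can be checked directly. This is a minimal sketch, assuming the third-party `packaging` library is installed (it falls back to the copy vendored inside pip, which is a private module): the range `<2.0dev,>=1.0.0` required by google-cloud-bigquery[bqstorage] 1.28.0 does not contain the constrained version 2.0.0, because `2.0dev` normalizes to the pre-release `2.0.dev0`, which sorts before `2.0.0`.

```python
# Sketch: can the version pinned by the constraints file satisfy the range?
# Assumes `packaging` is installed; otherwise uses pip's vendored (private) copy.
try:
    from packaging.specifiers import SpecifierSet
except ImportError:
    from pip._vendor.packaging.specifiers import SpecifierSet

# Range declared by google-cloud-bigquery[bqstorage] 1.28.0 (from the error above)
required = SpecifierSet("<2.0dev,>=1.0.0")
# Version pinned by the Airflow constraints file
constrained = "2.0.0"

print(required.contains(constrained))  # False: 2.0.0 is not below 2.0.dev0
print(required.contains("1.0.0"))      # True: inside the declared range
```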

(p.s. This still does not explain the -vvvv thing.)

uranusjr, Apr 23 '21

And the issue got closed before I could submit 🤣 Issue tracker race condition.

uranusjr, Apr 23 '21

But yeah... the message from "main" is much CLEARER, so I'm reopening it.

potiuk, Apr 23 '21

OK, let me then do all my checks with the master version of pip.

potiuk, Apr 23 '21

I hope we will soon be able to close all of those and successfully move to the 21.x line in Airflow :)

potiuk, Apr 23 '21

I'll keep this open regardless of the outcome, because the -vvvv behaviour is still unexplained. It might not be a bug, but someone needs to look into it.

uranusjr, Apr 23 '21

constraints-3.6.txt

Above is a snapshot of the constraints-3.6.txt file that caused the issue, for future reproduction. I'm assuming the file hosted in Airflow's repo will be overwritten once you sort out the conflicts.

uranusjr, Apr 23 '21

FYI, it seems I found the root cause of the conflict: https://github.com/apache/airflow/pull/15513 :crossed_fingers:

potiuk, Apr 24 '21

So it turns out the different error messages are due to pkg_resources returning dependencies in a nondeterministic order (because internally it uses a set to store them). When the order differs, the resolver can be sent down subtrees in a different order, and it reports different errors if there are multiple conflicts in the dependency graph.
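To see why a set makes the order unpredictable, here is a small, self-contained sketch in plain Python (it is not pkg_resources code, and the names in the set are only examples). It prints the iteration order of the same set of strings in fresh interpreters started with different `PYTHONHASHSEED` values. Since string hashes are randomized per process by default, the iteration order of a set of strings can change from one pip run to the next, while a fixed seed always reproduces the same order.

```python
import os
import subprocess
import sys

# Example requirement names; what matters is that they are str objects, whose
# hashes (and therefore set iteration order) depend on PYTHONHASHSEED.
SNIPPET = (
    "names = {'pyarrow', 'google-cloud-bigquery-storage', 'pandas', "
    "'cattrs', 'connexion'}\n"
    "print(','.join(names))"
)


def set_order(seed: int) -> str:
    """Run SNIPPET in a fresh interpreter with a fixed hash seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


orders = {seed: set_order(seed) for seed in range(10)}
for seed, order in orders.items():
    print(seed, order)

# Same seed -> same order; different seeds (or default random seeding) may differ.
print("distinct orderings:", len(set(orders.values())))
```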

I think we should sort the dependencies somehow (maybe just alphabetically); this would help debuggability, if nothing else.
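The effect on error messages, and the alphabetical-sorting idea, can be illustrated with a toy checker. This is not pip's resolver: `first_conflict`, `stable`, the package names `pkg-a`/`Pkg_B`, and their pins are all hypothetical. The checker reports only the first requirement whose allowed versions exclude its pin, so when two requirements both conflict, which one gets reported depends on iteration order. Sorting by a PEP 503-normalized name makes the report reproducible.

```python
import re

# Hypothetical pins (as from a constraints file) and allowed versions per requirement.
# Both requirements conflict, mirroring a graph with more than one conflict.
PINS = {"Pkg_B": "2.0.0", "pkg-a": "2.0.0"}
ALLOWED = {"Pkg_B": {"1.0.0"}, "pkg-a": {"1.0.0", "1.1.0"}}


def canonical_name(name: str) -> str:
    """Normalize a project name as PEP 503 does: runs of -_. become -, lowercased."""
    return re.sub(r"[-_.]+", "-", name).lower()


def first_conflict(requirements):
    """Toy check: return the first requirement whose allowed versions exclude its pin."""
    for name in requirements:
        if PINS[name] not in ALLOWED[name]:
            return name
    return None


def stable(requirements):
    """Deterministic order: sort by canonical project name."""
    return sorted(requirements, key=canonical_name)


deps = ["Pkg_B", "pkg-a"]
print(first_conflict(deps))            # Pkg_B
print(first_conflict(reversed(deps)))  # pkg-a  (same input, other order)

print(first_conflict(stable(deps)))            # pkg-a
print(first_conflict(stable(reversed(deps))))  # pkg-a  (order no longer matters)
```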

uranusjr, Jul 08 '21

I still think we should be outputting more resolution information based on the verbosity. At the moment the choice is basically between nothing and setting PIP_RESOLVER_DEBUG=1, which gives too much information for almost anyone.
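One possible shape for verbosity-dependent resolver output, as a hypothetical sketch only: each event gets a minimum verbosity and is emitted only when the user's `-v` count reaches it. The class name, the chosen levels, and the message texts are assumptions. The hook names loosely follow resolvelib's `BaseReporter` (`starting_round`, `pinning`, `rejecting_candidate`), but the signatures here are simplified and this is not how pip is implemented.

```python
import logging

logger = logging.getLogger("resolver-sketch")


class VerbosityReporter:
    """Hypothetical reporter that emits more resolver detail as verbosity rises.

    Hook names are modeled on resolvelib's BaseReporter; signatures are simplified.
    """

    def __init__(self, verbosity: int) -> None:
        self.verbosity = verbosity
        self.lines = []  # kept in memory as well, so the output is easy to inspect

    def _emit(self, level: int, message: str) -> None:
        # Emit only messages whose required verbosity is <= the user's -v count.
        if self.verbosity >= level:
            self.lines.append(message)
            logger.info(message)

    def starting_round(self, index: int) -> None:
        self._emit(3, f"Starting resolution round {index}")

    def pinning(self, candidate: str) -> None:
        self._emit(2, f"Pinning {candidate}")

    def rejecting_candidate(self, candidate: str, reason: str) -> None:
        # Rejections explain conflicts, so they show up at the lowest verbosity.
        self._emit(1, f"Rejecting {candidate}: {reason}")


reporter = VerbosityReporter(verbosity=1)
reporter.starting_round(0)
reporter.rejecting_candidate("pkg-a 2.0.0", "excluded by constraint pkg-a==1.0.0")
reporter.pinning("pkg-a 1.0.0")
print(reporter.lines)
# ['Rejecting pkg-a 2.0.0: excluded by constraint pkg-a==1.0.0']
```

With `verbosity=3` the same calls would record all three events; with `verbosity=0` nothing is recorded.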

I'll try and get a PR in the next couple of months.

notatallshaw, Apr 23 '25

For completeness, the original issue (different resolution due to set ordering) should no longer be relevant on newer Python versions, since the new importlib.metadata backend retains the ordering declared in the package metadata.
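A minimal sketch of that behaviour using only the standard library (Python 3.8+): it writes a hypothetical package's METADATA into a temporary `.dist-info` directory and reads it back with `importlib.metadata.Distribution.at()`. The `requires` property returns the `Requires-Dist` entries as a list, in the order they appear in the file. The package name and entries are made up for illustration, reusing the specifiers quoted in this thread.

```python
import pathlib
import tempfile
from importlib.metadata import Distribution

# A hypothetical package's METADATA; Requires-Dist lines are in declared order.
METADATA = """\
Metadata-Version: 2.1
Name: demo-pkg
Version: 1.0
Requires-Dist: pyarrow<2.0dev,>=1.0.0; extra == "bqstorage"
Requires-Dist: google-cloud-bigquery-storage<2.0.0dev,>=1.0.0; extra == "bqstorage"
"""

with tempfile.TemporaryDirectory() as tmp:
    dist_info = pathlib.Path(tmp, "demo_pkg-1.0.dist-info")
    dist_info.mkdir()
    (dist_info / "METADATA").write_text(METADATA, encoding="utf-8")
    # Distribution.at() builds a distribution object from a .dist-info path;
    # `requires` is read here, while the temporary directory still exists.
    requires = Distribution.at(dist_info).requires

for line in requires:
    print(line)  # same order as the Requires-Dist headers above
```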

uranusjr, Apr 24 '25