Different resolution failure messages at different verbosity levels
- pip version: 21.0.1
- Python version: 3.6 (but 3.7 and 3.8 have the same problem)
- Operating system: Linux (Debian buster docker image)
pip 21.0.1 sometimes reports the wrong error about conflicts, and it reports a different (correct!) error when the -vvvv option is added.
This problem originated with https://github.com/apache/airflow/issues/15463 (you can see the history of it there). We have quite complex dependencies in Airflow and we still recommend that people install Airflow with pip 20.2.4, but we are hoping to get rid of that limitation. One remaining problem was a very strange one that we did not have time to look at. When I looked at it today, I realized that the error printed by pip was misleading (I could not see the reason for the original error).
I believe pip reports google-cloud-bigquery-storage as having a problem instead of pyarrow. It looks like instead of printing the actual dependency that has a problem, it prints the "sibling" of that dependency (or something like that).
It is very easily reproducible:
- Run:
pip install apache-airflow[google]==2.0.2 --constraint https://raw.githubusercontent.com/apache/airflow/constraints-2.0.2/constraints-3.6.txt
You should get an error:
ERROR: Could not find a version that satisfies the requirement google-cloud-bigquery-storage<2.0.0dev,>=1.0.0; extra == "bqstorage" (from google-cloud-bigquery[bqstorage,pandas])
- Run:
pip install -vvvv apache-airflow[google]==2.0.2 --constraint https://raw.githubusercontent.com/apache/airflow/constraints-2.0.2/constraints-3.6.txt
You should get an error:
ERROR: Could not find a version that satisfies the requirement pyarrow<2.0dev,>=1.0.0; extra == "bqstorage" (from google-cloud-bigquery[bqstorage,pandas])
ERROR: No matching distribution found for pyarrow<2.0dev,>=1.0.0; extra == "bqstorage"
I believe that in both cases we ONLY have a problem with pyarrow, and it is misreported without the -vvvv flag. It looks like pip prints the sibling of the problematic dependency (google-cloud-bigquery-storage) instead of the dependency that is actually wrong (pyarrow). Note that, other than the dependency name, the problematic limits are the very same (<2.0.0dev,>=1.0.0; extra == "bqstorage").
I also could not find any other packages from those being installed where google-cloud-bigquery-storage would be limited to <2.0.0dev,>=1.0.0 - that's why I think this is a bug in PIP.
Gists with the outputs to compare
pip install apache-airflow[google]==2.0.2 --constraint https://raw.githubusercontent.com/apache/airflow/constraints-2.0.2/constraints-3.6.txt :
Here: https://gist.github.com/potiuk/04f6127469a709e3e47be7585c9a863c
pip install -vvvv apache-airflow[google]==2.0.2 --constraint https://raw.githubusercontent.com/apache/airflow/constraints-2.0.2/constraints-3.6.txt:
https://gist.github.com/potiuk/17a3d591fb091bdd8a0e213f49b6b0af
I might be wrong, of course, but it looks like this.
UPDATE:
I ran it with -vv and it fails with the google-cloud-bigquery-storage error:
https://gist.github.com/potiuk/2f9af6a8eaac7ea393fd1f9fe64361c7
Both -v and -vvv fail with the pyarrow error.
In neither of those can I find where the google-cloud-bigquery-storage<2.0.0dev,>=1.0.0; extra == "bqstorage" (from google-cloud-bigquery[bqstorage,pandas]) comes from :(.
This prompted me to try the same installation on 3.8, which uncovered another conflict(!):
ERROR: Cannot install apache-airflow[google]==2.0.2 because these package versions have conflicting dependencies.
The conflict is caused by:
apache-airflow[google] 2.0.2 depends on cattrs~=1.1; python_version > "3.6"
The user requested (constraint) cattrs==1.0.0
To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict
And 3.6 (against unreleased pip) also presents a different conflict, on connexion. So the issue may be that there are actually multiple conflicts in the current Airflow constraints, and each error only shows part of the picture. google-cloud-bigquery-storage could still be conflicting somewhere. Still, it’s surprising that -vvvv would affect the dependency resolution logic. I’ll need to look deeper into this.
Edit: I successfully reproduced the different behaviours on Linux. This is really weird.
EDIT (apologies for not seeing it at first): I dug a bit deeper, and I think I know where the -storage limitations are coming from (they are in fact in the 1.28.0 version of the bigquery library). Same for pyarrow. This is my bad: I looked at the latest < 3.0.0 version of -bigquery, not the one from the constraints.
The -v behaviour is the strange part, in that it alternates between those two errors. But both errors are actually right... I still do not know where the old 1.28.0 limitation comes from (but this is a different story).
I will close that one and look further to where it is coming from.
Apologies for the trouble (but it would be nice to find out the reason for the -v behaviour :).
Ah, I think I found the real root of conflict. I get this against the main branch:
$ python src/pip install 'apache-airflow[google]==2.0.2' --constraint https://raw.githubusercontent.com/apache/airflow/constraints-2.0.2/constraints-3.6.txt
...
ERROR: Cannot install google-cloud-bigquery[bqstorage,pandas]==1.28.0 because these package versions have conflicting dependencies.
The conflict is caused by:
google-cloud-bigquery[bqstorage,pandas] 1.28.0 depends on pyarrow<2.0dev and >=1.0.0; extra == "bqstorage"
The user requested (constraint) pyarrow==2.0.0
...
So the issue was pyarrow all along, but pip 21.0.1 misidentified the cause as google-cloud-bigquery since it failed to consider version ranges introduced via constraints. I'm going to write this up as another case fixed by #9300. Thanks for the report, it's a really interesting rabbit hole to dig into!
(p.s. This still does not explain the -vvvv thing.)
And the issue got closed before I could submit 🤣 Issue tracker race condition.
But yeah .. the "main" message is much CLEARER. So I re-open it.
OK, let me try to do all my checks with the master version of pip then.
I hope we will soon be able to close all those and successfully move to 21. line in Airflow :)
I'll keep this open regardless of the outcome because the -vvvv thing is still unexplained and probably needs to be looked into. It might not be a bug, but someone needs to look into it.
Above is a snapshot of the constraints-3.6.txt file that caused the issue, for future reproduction. I'm assuming the file hosted in Airflow's repo will be overwritten once you sort out the conflicts.
FYI. Seems that I found the root cause for conflict https://github.com/apache/airflow/pull/15513 :crossed_fingers:
So it turns out the different error message is due to pkg_resources returning dependencies in a nondeterministic order (because internally it uses a set to store them). When the ordering is different, the resolver can be sent down the subtrees in a different order, and it reports different errors if you have multiple conflicts in the dependency graph.
I think we should sort the dependencies somehow (maybe just alphabetically); this would be good for debuggability, if nothing else.
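To illustrate the ordering point, here is a minimal sketch. It is not pip's resolver: it assumes a toy checker that reports only the first conflict it meets, and the names first_conflict, dep-a and dep-b, as well as the versions, are all hypothetical. With two genuine conflicts, the reported one depends purely on visiting order, and sorting by name makes the report stable.

```python
from itertools import permutations


def first_conflict(requirements, pins):
    """Return the name of the first requirement whose pinned version is not allowed.

    `requirements` is an iterable of (name, allowed_versions) pairs and is
    visited in iteration order, like a checker that stops at the first failure.
    """
    for name, allowed in requirements:
        if pins[name] not in allowed:
            return name
    return None


# Hypothetical data: both dependencies conflict with their pinned versions.
pins = {"dep-a": "2.0", "dep-b": "2.0"}
reqs = [("dep-a", {"1.0", "1.1"}), ("dep-b", {"1.0", "1.1"})]

# Unsorted: the reported conflict depends on the order the requirements arrive in.
reported = {first_conflict(order, pins) for order in permutations(reqs)}

# Sorted by name: every arrival order yields the same report.
stable = {
    first_conflict(sorted(order, key=lambda req: req[0]), pins)
    for order in permutations(reqs)
}

print(sorted(reported))  # both names appear
print(sorted(stable))    # only one name appears
```

Sorting does not make either message more correct, since both conflicts are real. It only makes the output reproducible between runs.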
I still think we should be outputting more resolution information based on the verbosity. At the moment it's basically either nothing, or setting PIP_RESOLVER_DEBUG=1, which gives far too much information for almost anyone.
I'll try and get a PR in the next couple of months.
For completeness, the original issue (different resolution due to set ordering) should no longer be relevant in new Python versions, since the new importlib.metadata backend retains the ordering declared in package metadata.
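As a rough way to inspect this yourself, the sketch below uses the standard library's importlib.metadata.requires, which returns a distribution's declared requirements as a list rather than a set. The helper name declared_requirements is hypothetical, and this only shows what the metadata exposes, not how pip consumes it internally.

```python
from importlib import metadata


def declared_requirements(dist_name):
    """Return the requirement strings declared by an installed distribution.

    The result is a list, so it has a defined order. An empty list is returned
    if the distribution is not installed or declares no requirements.
    """
    try:
        return metadata.requires(dist_name) or []
    except metadata.PackageNotFoundError:
        return []


# "google-cloud-bigquery" is only an example name; it may not be installed here.
for requirement in declared_requirements("google-cloud-bigquery"):
    print(requirement)
```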