
`uv pip` resolver should be able to backtrack on 403

Open paveldikov opened this issue 1 year ago • 11 comments

Many organisations employ in-band scanner tools on their internal PyPI mirrors, with the aim of preventing the ingress of compromised/non-compliant dependencies. Usually such tools will make their blocking decisions known by way of an HTTP 403 response code at download time.

One of the key pain-points with this approach is that, as new versions of packages come out, they will almost inevitably 403-upon-first-download, as they are yet to be scanned and cleared.

Currently the resolver completely gives up when encountering a 403 code, perhaps on the assumption that this is the usual 'permission denied' meaning, as opposed to a more granular 'not this one, please' meaning. This behaviour is not unique to uv; pip also does this, but uv probably hits this worse than pip thanks to supporting universal resolution etc. Regardless, it is a pretty substantial negative impact to developer experience.

I think there should be a resolver option to allow backtracking when 403 codes are encountered, so that older (but scanned + cleared) versions will be considered.

paveldikov avatar Jul 21 '24 16:07 paveldikov

I believe we do continue and backtrack on 403: https://github.com/astral-sh/uv/blob/12518a01a4a436ba4c8b8cfc51646a253bcc8c6c/crates/uv-client/src/registry_client.rs#L221. Or are you referring to a 403 on the archive download, and not the simple HTTP responses?

charliermarsh avatar Jul 21 '24 16:07 charliermarsh

Yes, 403 on the wheel/sdist download.

paveldikov avatar Jul 21 '24 16:07 paveldikov

Btw, has this request ever been reported to pip? I had a search through the issue tracker and the only issues related to 403 were all SSL issues, which it makes sense not to retry on.

And do you know the mirror software your organization is using? Back when I used to use Artifactory, the error I got for this sort of situation was a timeout, usually for large wheels, as the mirror was still downloading the wheel and wouldn't start providing data to the client until it had finished.

It may be worth making a request to the mirror software to provide a 502, 503, or 504 error rather than a 403 error, as they are generally considered retryable, whereas 403 usually is not.

In fact, for a 403 response, the spec explicitly says that, if credentials were provided, the client SHOULD NOT automatically repeat the request with the same credentials.

notatallshaw avatar Jul 22 '24 19:07 notatallshaw

I think the request is not to retry the request but to try another package version. I think if we don't get a 403 when listing packages, it's reasonable for us to try another version after a 403 for a specific archive.

zanieb avatar Jul 22 '24 20:07 zanieb

ah, that makes sense, I guess I hadn't fully grokked the scenario.

notatallshaw avatar Jul 22 '24 20:07 notatallshaw

That's exactly it. A 403 on archive download should definitely not be retried, but it may make sense to move along to the next archive on the list.

paveldikov avatar Jul 22 '24 21:07 paveldikov

Are you expecting that it would try all distributions within a version? Or that it would move on to the next version if the archive returned a 403?

charliermarsh avatar Jul 22 '24 23:07 charliermarsh

Interesting one. I was thinking 'next version', but this is primarily because of my specific circumstances. I'm not sure whether these assumptions are safe to make for the general case:

  • since the policy engine in this case is a scanner, it would likely make the same decision for other distributions of the same version
  • rather not waste time downloading incompatible distributions (not sure if the compatibility decision can be made without incurring the cost of a download?)
  • avoid consuming sdists to the extent possible (this is probably specific to my org, though, and probably belongs in a separate config item)

paveldikov avatar Jul 23 '24 07:07 paveldikov

FWIW, "next version" would be fairly easy but "next distribution" would be challenging (purely based on implementation details).

charliermarsh avatar Jul 28 '24 00:07 charliermarsh

Yes, I think 'next version' should do it, even if it is a somewhat presumptuous heuristic. Whilst it can still yield false negatives, it yields fewer of them than the current approach.

On a pragmatic level: assuming that the 'first' distribution to be attempted is the optimal one (closest platform/ABI tag match), then it's really 'best wheel or bust' as far as most users are concerned. Attempting a sub-optimal distribution (even an sdist) probably yields diminishing returns.

And if it doesn't go far enough (which I doubt, but is a possibility) we could track this as a separate issue?
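To make the 'next version' idea concrete, here is a minimal Python sketch. It is not uv's implementation (uv is written in Rust, and its resolver is more involved); `ArchiveForbidden`, `pick_version`, and the fake mirror are hypothetical names used only for illustration. It assumes candidate versions arrive newest first and that a 403 on an archive is never retried, only skipped.

```python
from typing import Callable, Optional


class ArchiveForbidden(Exception):
    """Hypothetical error: the index answered an archive download with HTTP 403."""


def pick_version(
    versions: list[str],
    download: Callable[[str], bytes],
) -> Optional[str]:
    """Return the first version, in the given order, whose archive can be fetched.

    A 403 is not retried; the version is skipped and the next one is tried.
    Any other error propagates unchanged.
    """
    for version in versions:
        try:
            download(version)
        except ArchiveForbidden:
            # Treat 403 as "not this one, please" and backtrack to the next version.
            continue
        return version
    return None


# Illustration with a fake mirror that has quarantined the two newest releases.
quarantined = {"2.1.0", "2.0.0"}


def fake_download(version: str) -> bytes:
    if version in quarantined:
        raise ArchiveForbidden(version)
    return b"wheel bytes"


chosen = pick_version(["2.1.0", "2.0.0", "1.9.3"], fake_download)
print(chosen)
```

In this sketch only the best-matching archive per version is attempted, which mirrors the 'best wheel or bust' reasoning above; trying further distributions of the same version would need an inner loop.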

paveldikov avatar Jul 29 '24 07:07 paveldikov

Just to give an example for the question by notatallshaw:

> And do you know the mirror software your organization is using?

I've run into it with Sonatype Nexus in combination with Repository Firewall. The symptoms are just as paveldikov reported: the packages are visible and exist, but access is blocked when it comes to pulling the wheels.

> build logs may only show a 403 error message for quarantined components

https://help.sonatype.com/en/firewall-quarantine.html#firewall-quarantine

kirici avatar Aug 29 '24 08:08 kirici

@paveldikov -- I want to help you with this, since I get the sense it's causing a lot of trouble. In your index, is this happening when we go to fetch the metadata for the distribution?

charliermarsh avatar Feb 16 '25 01:02 charliermarsh

When we resolve, we have to get the metadata from the wheel. If the registry includes .whl.metadata files, we use those; otherwise, we try to use range requests, and if those too aren't supported, we download the wheel. I assume this is happening when we try to do one of those three things. Do you know which?
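The three-step fallback described here can be sketched roughly as follows. This is an illustrative Python outline under stated assumptions, not uv's actual code: `resolve_metadata` and the three fetcher functions are hypothetical, and each fetcher is assumed to return `None` when the registry does not support that strategy.

```python
from typing import Callable, Optional

# A fetcher takes the wheel URL and returns metadata bytes, or None if unsupported.
Fetcher = Callable[[str], Optional[bytes]]


def resolve_metadata(
    wheel_url: str,
    strategies: list[tuple[str, Fetcher]],
) -> tuple[str, bytes]:
    """Try each strategy in order and return (strategy name, metadata)."""
    for name, fetch in strategies:
        data = fetch(wheel_url)
        if data is not None:
            return name, data
    raise LookupError(f"no strategy produced metadata for {wheel_url}")


# Fake fetchers simulating a registry with no .whl.metadata files
# and no support for range requests.
def fetch_sidecar(wheel_url: str) -> Optional[bytes]:
    # Would request wheel_url + ".metadata"; here the file is absent.
    return None


def fetch_by_range(wheel_url: str) -> Optional[bytes]:
    # Would read only the needed bytes of the wheel; here unsupported.
    return None


def fetch_full_wheel(wheel_url: str) -> Optional[bytes]:
    # Would download the whole wheel and extract its METADATA file.
    return b"Metadata-Version: 2.1\nName: example\n"


strategy, metadata = resolve_metadata(
    "https://mirror.example/example-1.0-py3-none-any.whl",  # hypothetical URL
    [
        ("whl-metadata", fetch_sidecar),
        ("range-request", fetch_by_range),
        ("full-download", fetch_full_wheel),
    ],
)
print(strategy)
```

The point of the ordering is cost: each later strategy transfers more data, so a scanner that blocks archive downloads would only be hit once the cheaper options are unavailable.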

charliermarsh avatar Feb 16 '25 01:02 charliermarsh

I ran with debug just now and it appears that it's fetching the whole wheel.

I see a warning message about 'range requests not supported for ...; streaming wheel'

The registry doesn't appear to include .whl.metadata files. (The log doesn't have anything particularly telling on that part, so I can't tell more about its reasoning)

paveldikov avatar Feb 17 '25 15:02 paveldikov

Okay great, thanks. That should be enough information for me to go on. Are you able to help test if I put up a PR?

charliermarsh avatar Feb 17 '25 15:02 charliermarsh

Would have a hard time retrieving an arbitrary binary. Is a PyPI pre-release feasible?

paveldikov avatar Feb 17 '25 15:02 paveldikov

Hmm, unfortunately that's not a common practice for us so it would take some work. Maybe I can just ship it and we test against the released binary.

charliermarsh avatar Feb 17 '25 15:02 charliermarsh

Okay, I have something up here: https://github.com/astral-sh/uv/pull/12255

charliermarsh avatar Mar 18 '25 01:03 charliermarsh