twine icon indicating copy to clipboard operation
twine copied to clipboard

Artifactory overwrite message changed?

Open marcz63 opened this issue 6 months ago • 13 comments

Is there an existing issue for this?

  • [x] I have searched the existing issues (open and closed), and could not find an existing issue

What keywords did you use to search existing issues?

artifactory

What operating system are you using?

Linux, Windows 11

If you selected 'Other', describe your Operating System here

No response

What version of Python are you running?

Python 3.9 and 3.12

How did you install twine? Did you use your operating system's package manager or pip or something else?

pip install twine

What version of twine do you have installed (include the complete output)

twine version 6.1.0 (keyring: 25.6.0, packaging: 24.1, requests: 2.32.3, requests-toolbelt: 1.0.0, urllib3: 2.2.3, id: 1.5.0, importlib-metadata:
8.5.0)

Which package repository are you using?

pypi.org

Please describe the issue that you are experiencing

Since an Artifactory update, my twine upload --skip-existing command is now returning an error in case the artifact already exists.

With --verbose option I get a Not enough permissions to delete/overwrite all artifacts under.

I see in the code of twine that it is checking against a overwrite artifact message (note the all difference).

So I am guessing there was a change of behaviour has been made in Artifactory and twine should follow maybe?

Please list the steps required to reproduce this behaviour

  1. have an Artifactory instance?
  2. create a venv
  3. pip install latest twine
  4. twine upload --skip-existing some python package already present in artifactory! (or do it twice ;)
  5. you get a return code not 0

Anything else you'd like to mention?

Thanks for your help :)

marcz63 avatar Jun 17 '25 07:06 marcz63

I'm really leaning towards making this PyPI only because the number of times they have broken this while refusing to provide a reasonable status code is way more often than standard library backwards incompatibilities (which last happened with importlib.metadata years ago).

Thoughts @woodruffw ?

sigmavirus24 avatar Jun 17 '25 10:06 sigmavirus24

Having the same issue, opened: https://github.com/pypa/twine/pull/1252

tscheepers avatar Jun 17 '25 12:06 tscheepers

@tscheepers your PR makes this so generic as to end up skipping and swallowing errors that we shouldn't. This is my point. A company unilaterally changes a message and provides us no really consistent way to detect this problem and we fall back to increasingly worse hacks.

We started supporting these indexes but they consistently break things leading to worse and worse "fixes". This feature was only added because PyPI gives us a consistent behavior and the ability to verify the upload happened at all before trying the upload.

sigmavirus24 avatar Jun 17 '25 13:06 sigmavirus24

If the index implementation changes the returned message, why can't they change the returned status code to something specific? Returning 409 seems the most natural thing to do in this case. I guess that till twine keeps working around the issue, there will not be enough motivation for index maintainers to fix this.

dnicolodi avatar Jun 17 '25 13:06 dnicolodi

I'm really leaning towards making this PyPI only because the number of times they have broken this while refusing to provide a reasonable status code is way more often than standard library backwards incompatibilities (which last happened with importlib.metadata years ago).

Thoughts @woodruffw ?

+1 from me -- the fix is straightforward on Artifactory's side, and IMO they should be incentivized to make it.

(I'm also baffled - like with https://github.com/pypi/warehouse/issues/18275 - how this even landed in production in Artifactory. I can only assume they're not doing any amount of downstream or tool integration testing, despite presumably directing users towards twine for uploads.)

woodruffw avatar Jun 17 '25 14:06 woodruffw

If the index implementation changes the returned message, why can't they change the returned status code to something specific?

All of the users that encounter this first are almost always users at a company with a procurement relationship with the index creators. They have the best leverage to push on this.

I vote for a 5.x release stripping these hacks with a big warning in the changelog. At this point the exploitation of the community for continued revenue on the company's part is unacceptable

sigmavirus24 avatar Jun 17 '25 15:06 sigmavirus24

wow, i was reviewing the code and I have to say I am surprised. let's not blame artifactory for badly written code.

I am somewhat familiar with artifactory for managing the python indexes at my workplace and it works fine. artifactory gives a 403 error when twines tries to upload a package that's already there. 403 permission denied because the repository does not allow to overwrite packages. It's possible to allow overwriting permissions in theory but in practice files must never be replaced in a pypi repo. (pip is caching packages with filename and hash. pip will detect a corrupted cache and refuse to run if a package is modified, pip will never work again until you run "pip cache purge" on all the affected machines.)

twine should never have reached the 403 error. twine shouldn't have tried to upload the package because the package already existed. the issue is with twine. the problem is that "twine --skip-existing" does not work at all. it's not an artifactory issue.

looking at the twine source code:

  1. there is a check to see whether the package is present... but the function package_is_uploaded() doesn't work, it's always returning False because there is no API to check that a package already exists. it's horribly broken code!
  2. then twine uploads the file anyway.
  3. then there is another check skip_upload() to inspect the error response after the fact and try to guess whether the error could have been because the package already existed. it's a giant hack. it's not skipping anything, the upload already happened and failed!

source code https://github.com/pypa/twine/blob/cac723964ba9183fe897c15de19a21bf5c08db10/twine/commands/upload.py#L217C12-L217C23

Image

I can think of a few ways to fix it:

option 1: Remove all this code that's horribly broken and remove that flag --skip-existing that never worked.

option 2: Fix twine to actually check that the package exists. (might be difficult, the comments say there are no API to do that).

option 3: Rewrite this broken function to fail when it doesn't work. it can only work on pypi

# bad code
def package_is_uploaded(
    self, package: package_file.PackageFile, bypass_cache: bool = False
) -> bool:
    # NOTE(sigmavirus24): Not all indices are PyPI and pypi.io doesn't
    # have a similar interface for finding the package versions.
    if not self.url.startswith((LEGACY_PYPI, WAREHOUSE, OLD_WAREHOUSE)):
        return False

# correct code
def package_is_uploaded(
    self, package: package_file.PackageFile, bypass_cache: bool = False
) -> bool:
    # NOTE(sigmavirus24): Not all indices are PyPI and pypi.io doesn't
    # have a similar interface for finding the package versions.
    if not self.url.startswith((LEGACY_PYPI, WAREHOUSE, OLD_WAREHOUSE)):
        raise Exception("cannot verify whether a package exists, the API for --skip-existing is only supported on pypi.org")

morotti avatar Jun 17 '25 18:06 morotti

@morotti thank you for confirming the idea to forbid the use of --skip-existing on things that aren't PyPI

sigmavirus24 avatar Jun 17 '25 19:06 sigmavirus24

the problem is that "twine --skip-existing" does not work at all. it's not an artifactory issue.

It works just fine on PyPI, and it works well enough on other indices that don't change their responses often. The issue here is that Artifactory changed their response, meaning that the hack that's in place there no longer applies.

twine should never have reached the 403 error. twine shouldn't have tried to upload the package because the package already existed.

twine cannot know that it should never have reached that 403, since different indices use different 4xx codes for signalling here.

In this case I agree with making --skip-existing only work for PyPI/TestPyPI, like Ian originally said in https://github.com/pypa/twine/issues/1251#issuecomment-2979900582. So option 3 is what we were already going to do here.

(TL;DR: The entire situation here is extremely unfortunate, and stems ultimately from the fact that there's no standard upload API. That will eventually be addressed, but in the mean time twine is stuck in a hard position between accreting brittle hacks and not supporting various "key" features like --skip-existing when the target index isn't PyPI/TestPyPI. Removing the brittle hacks makes the most sense to me, especially if third-party index providers aren't regression-testing their changes.)

woodruffw avatar Jun 17 '25 19:06 woodruffw

twine cannot know that it should never have reached that 403, since different indices use different 4xx codes for signalling here.

Worse yet. Artifactory didn't return 403 before but returned something else entirely. To @dnicolodi's point, they have already changed the status code once. Behaving the same as anything that actually targets interoperability with PyPI can't be that hard can it? It's not as if PyPI is closed source, or undocumented, or as if they couldn't test tools like twine continuously against their own product.

PyPI (old old pre-warehouse PyPI) returned 400. I argued with Donald about returning 409 instead in Warehouse and we landed on making the transition from Cheeseshop to Warehouse as seamless as possible. If basically one person can maintain backwards compatibility with almost little thought, why can't Artifactory manage the same? I don't like PyPI's 400 personally but I respect it. Artifactory and other indexes seems to be at whatever whim they want this week/month/quarter. Maybe this drives up shareholder value somehow? I don't know. Maybe they're waiting to offer some additional license set to upload to their PyPI endpoints with the same features. It would not surprise me.

sigmavirus24 avatar Jun 18 '25 02:06 sigmavirus24

If basically one person can maintain backwards compatibility with almost little thought, why can't Artifactory manage the same

Artifactory doesn't control user deployment. Even if they adjust their software to be more compatible with pypi/twine, it will take many years before the release is deployed by customers. And we can't entirely blame the customers for that. artifacts registries are typically used 24/7 to store and serve all artifacts generated by a company (for all languages, not just python), it's a major pain to schedule and perform an upgrade.

I agree with both of you that it's nice to go the extra mile to provide the feature and give a nicer error message (skipping existing packages is definitely a nice feature to have :) ). However I think it's not applicable to this use case because the feature is implemented incorrectly and it's unsafe. Considering how dangerous the existing code is (outside of pypi), I think it's sufficient reason to remove the feature.

The code currently doesn't check whether the package exists, it simply uploads the package and look at http errors after the fact (the package_is_uploaded() function returns False when it doesn't detect official pypi).

It's very dangerous. If a package already exists and is overwritten, the hash of the package will change, then pip will detect that the hash of a package has changed and will refuse to run anymore . This will quite literally stop pip install company wide and require to run pip cache purge on every machine and user in the organization (or you need to delete the package as a workaround (note that you can't re-upload the previous version because that would similarly break any systems that cached the newer version (yes I speak from experience))).

There are many pypi registries (artifactory, nexus, gitlab, devpi, etc...) and I don't know how they all behave out of the box on overwrite. Anything that allows to overwrite is up for a bad surprise. (In spite of all the complaining about artifactory, we're all lucky that it has good settings out of the box :D) Even if a python index is protected from overwrite, admin users could overwrite packages by accident anyway because admin permissions can take priority over index permissions.

Thankfully, most (accidental) usage is probably people re-uploading the same package multiple times, which is fine as long as it's the exact same file (same hash).

TL;DR I'm in favor of ditching the feature outside of pypi because it's dangerous. I think this aspect was not considered when it was initially implemented.

morotti avatar Jun 18 '25 13:06 morotti

@morotti you are entirely missing the point.

Besides this, --skip-existing is a feature for many reasons and I have been inclined to deal with the pain. If you have a task uploading an artifact and (for example) your system has multiple steps and a remote cache if you upload some artifacts but not all of them, you can retry the last step to upload them to the registry without keeping additional state about what to upload. Hashes haven't changed, etc. It isn't unsafe.

If you take the most naive interpretation of how someone might use this, then there's some potential for lack of safety. However, PyPI behaves in a reasonable and logical way. Most companies using any kind of private artifact repository use CI to publish artifacts and do not trust local developer machines (and if they don't, they'll probably learn soon enough why that's a glaring security hole). Your safety argument doesn't hold water in the real world.

And most companies are upgrading their artifact repositories faster these days to deal with the massive number of security vulnerabilities in the open source leveraged by those products. That's why months(?) after the last fix for Artifactory landed, we're fixing it again with more people complaining about it than last time.

This isn't anything other than nearly a decade of these companies breaking things for no good reason between releases. If anything makes your customers afraid to upgrade, it is not knowing what will break this time. Not everyone has enough money to fund a team that can do a bunch of integration testing on a new release of something like this and after the first few incidents resulting from trusting untrustworthy vendors with poor or seemingly non-existent QA practices, I don't blame them.

sigmavirus24 avatar Jun 18 '25 14:06 sigmavirus24

Oh I was talking about another point. I totally agree that artifactory should not have changed the error message.

It's an absolutely trivial bug that could have been trivially avoided if artifactory developers had put the bare minimal effort to add a comment to their code.

# DO NOT CHANGE THE ERROR MESSAGE
# The error message is checked by twine and python tools to upload package
# to detect when a package failed to upload because it already existed.
raise javaexception("...")

I will try to dig up my support contact and report back their poor development practice!

morotti avatar Jun 18 '25 15:06 morotti