
413: Entity too large

Open tomplex opened this issue 4 years ago • 8 comments

Hello,

What's the best way to increase the max size of uploaded packages? Should it be a change to the uWSGI config (e.g. limit-post), or something else?

Thanks!

tomplex avatar Jul 14 '20 16:07 tomplex

uWSGI is the first place I would try. Out of curiosity, how large is the package you're trying to upload? I did some simple tests and wasn't able to get a 413.
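
For reference, a minimal sketch of the relevant setting, assuming an ini-style uWSGI config (the file location and the rest of the config depend on your deployment):

```ini
; uwsgi.ini excerpt: raise the maximum allowed POST body.
; limit-post takes a size in bytes; 104857600 bytes = 100 MB.
[uwsgi]
limit-post = 104857600
```

Note that any proxy in front of uWSGI (nginx, an ingress, a cloud load balancer) has its own body-size limit that would need raising too.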

stevearc avatar Jul 15 '20 02:07 stevearc

I've also run into this with a ~50 MB package. In my case it looks like the problem is a 32 MB hard limit on Google Cloud Run requests.

leonoverweel avatar Nov 13 '20 16:11 leonoverweel

We got in touch with GCP support about this; for our near-term needs we'll probably end up hosting our PyPI cloud container in our k8s cluster instead of Cloud Run.

However, this part of their reply may be interesting for PyPI Cloud maintainers:

Cloud Run + signed URLs: Now as to the 32 MB limitation with Cloud Run. I believe that you wouldn't hit this when downloading packages, as PyPI Cloud will not serve these packages directly, but will generate signed URLs and the download is directly from GCS (unless the stream_files option is turned on [2]).

It would make sense to use the same pattern for file uploading as well - I assume this is where you're hitting the 32 MB limit. Signed URLs do support this [3] (and AWS has a similar concept with S3 pre-signed URLs). However, this is not currently supported and would require code modification of the PyPI Cloud project.

[2] https://pypicloud.readthedocs.io/en/latest/topics/configuration.html#pypi-stream-files
[3] https://cloud.google.com/storage/docs/access-control/signed-urls

Is this signed URLs approach something that could come to PyPI cloud in the future? It'd probably have to integrate somehow with the twine upload utility, so I'm not sure if it'd be practical to implement.
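
For illustration, here's a minimal sketch of the upload-side signed URL pattern GCP support describes, using the google-cloud-storage client. The bucket and object names are made up, and this is not something pypicloud does today:

```python
# Sketch: generate a V4 signed URL that lets a client PUT an object
# directly to GCS, bypassing the Cloud Run 32 MB request limit.
from datetime import timedelta
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-pypicloud-bucket")  # hypothetical bucket
blob = bucket.blob("mypkg/mypkg-1.0.0-py3-none-any.whl")  # illustrative path

url = blob.generate_signed_url(
    version="v4",
    expiration=timedelta(minutes=15),  # URL is valid for 15 minutes
    method="PUT",
    content_type="application/octet-stream",
)

# The client then uploads straight to GCS, e.g.:
#   curl -X PUT -H "Content-Type: application/octet-stream" \
#        --upload-file mypkg-1.0.0-py3-none-any.whl "$url"
print(url)
```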

leonoverweel avatar Nov 18 '20 09:11 leonoverweel

Is this signed URLs approach something that could come to PyPI cloud in the future?

It looks like pypicloud already uses signed URLs for downloads, which could in theory be used for uploads as well.

However, I'm not sure whether integrating this would require large changes to pypicloud itself. @stevearc what do you think?

lgeiger avatar Nov 18 '20 11:11 lgeiger

I'm going to assume that making changes to twine or setup.py upload is not on the table. I see two main options:

  1. Create a small command line utility that knows how to talk to pypicloud and upload the results into GCS or S3 or whatever your blob storage is. This is relatively straightforward, but there is some complexity around data consistency. Pypicloud manages the cache, so it has to decide when to consider the package to "exist". It could optimistically put it in the cache as soon as the CLI requests a signed URL, but then it'll have an invalid entry for a short time (and possibly forever, if the upload fails). It could require the CLI to report back when the upload is completed; that's probably the best option, though there's also the risk of that piece failing (see the sketch after this list). Or the signed URL request could set up some sort of short-lived polling job that updates the cache once it sees the object exists in the blob store.
  2. Make the file upload endpoint redirect to the blob storage URL. This would be much easier to use, but I'm not entirely sure it'll work (I've had trouble with S3 signed URL redirects in the past). The main problem with this (assuming it works) is the same as above, except with no option to have the client check back in and confirm after the upload succeeds.

Neither of these would require rewriting much of pypicloud, but either could involve adding some potentially complex logic.
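
To make option 1 concrete, here's a rough sketch of what the client-side handshake could look like. The /api/upload-url and /api/confirm-upload endpoints are hypothetical; nothing like them exists in pypicloud today:

```python
# Hypothetical client flow for option 1. None of these pypicloud
# endpoints exist; this only illustrates the proposed handshake.
import requests

PYPICLOUD = "https://pypi.example.com"  # hypothetical server
AUTH = ("user", "password")            # hypothetical credentials

def upload(package_path: str, name: str, version: str) -> None:
    # 1. Ask pypicloud for a signed upload URL (hypothetical endpoint).
    resp = requests.post(f"{PYPICLOUD}/api/upload-url",
                         json={"name": name, "version": version}, auth=AUTH)
    resp.raise_for_status()
    signed_url = resp.json()["url"]

    # 2. Upload the file directly to blob storage, so the large body
    #    never passes through the proxy with the size limit.
    with open(package_path, "rb") as f:
        put = requests.put(signed_url, data=f,
                           headers={"Content-Type": "application/octet-stream"})
    put.raise_for_status()

    # 3. Report back so pypicloud only adds the cache entry after the
    #    upload actually succeeded (the consistency option discussed above).
    done = requests.post(f"{PYPICLOUD}/api/confirm-upload",
                         json={"name": name, "version": version}, auth=AUTH)
    done.raise_for_status()
```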

stevearc avatar Nov 18 '20 17:11 stevearc

We ended up going for something close to option 1. We turned off storage.prepend_hash and now use gsutil to push our wheels there manually and add the required (name, version) metadata; then we refresh the caches using the button in the admin UI (btw, is there a REST call to do this?) and it all works. This'll cover our needs for now - it's not a module we update a lot.
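
For anyone following along, the manual push looks roughly like this. The bucket and wheel names are illustrative, and the exact metadata keys pypicloud expects may differ:

```sh
# Copy the wheel into the bucket pypicloud is configured to use.
gsutil cp mypkg-1.0.0-py3-none-any.whl gs://my-pypicloud-bucket/

# Attach the (name, version) custom metadata; gsutil's x-goog-meta-*
# headers set custom object metadata on GCS.
gsutil setmeta \
  -h "x-goog-meta-name:mypkg" \
  -h "x-goog-meta-version:1.0.0" \
  gs://my-pypicloud-bucket/mypkg-1.0.0-py3-none-any.whl
```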

leonoverweel avatar Nov 19 '20 10:11 leonoverweel

Oof, rebuilding the whole cache is a lot of unnecessary work for one package, but if you're not doing it often I guess that's okay. There is an endpoint for this: https://pypicloud.readthedocs.io/en/latest/topics/api.html#get-admin-rebuild
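
For example (hypothetical host and credentials):

```sh
# Trigger a cache rebuild via the documented admin endpoint.
curl -u admin:password https://pypi.example.com/admin/rebuild
```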

stevearc avatar Nov 23 '20 21:11 stevearc

I ran into the same issue, but it turned out to be an ingress problem. If you use an ingress like nginx, make sure the proxy body size is large enough; see https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/#custom-max-body-size. It's an annotation setting: nginx.ingress.kubernetes.io/proxy-body-size: 100m
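
For example, a minimal Ingress excerpt with that annotation (names and host are illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: pypicloud
  annotations:
    # Allow request bodies up to 100 MB through the nginx ingress.
    nginx.ingress.kubernetes.io/proxy-body-size: 100m
spec:
  rules:
    - host: pypi.example.com    # illustrative hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: pypicloud # illustrative service name
                port:
                  number: 8080
```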

If you use another software load balancer, look for a proxy-body-size setting or something similar. Hope this helps!

jonjesse avatar Mar 29 '22 18:03 jonjesse