addons
addons copied to clipboard
504 error when trying to upload the source code of a new version
Describe the problem and steps to reproduce it:
(Please include as many details as possible.)
- Go to AMO -dev: https://addons-dev.allizom.org/en-US/developers/addon/submit/upload-listed
- Submit a new add-on (doesn't matter which one as long as it passes validation, you can use a dummy one from https://staticfil.es/webext-generator/)
- On the next page, check that you want to attach source code to the submission
- Select a zip file that is rather big (e.g., I tried with a 50MB zip file)
- Click to continue
What happened?
This error is displayed:
504 ERROR
The request could not be satisfied.
CloudFront attempted to establish a connection with the origin, but either the attempt failed or the origin closed the connection. We can't connect to the server for this app or website at this time. There might be too much traffic or a configuration error. Try again later, or contact the app or website owner.
If you provide content to customers through CloudFront, you can find steps to troubleshoot and help prevent this error by reviewing the CloudFront documentation.
Generated by cloudfront (CloudFront)
Request ID: 0hn0qJOZeCrgZLbhjaPj4mtquw6NDlZ4re9fB70kbvnGcLA8jMBUUw==
What did you expect to happen?
No error.
┆Issue is synchronized with this Jira Task
This issue seems to affect all envs.
@bqbn would you be able to look into that please?
What workflow gets triggered after a user uploads a zip file?
504 means gateway timeout, could it be Nginx timed out while waiting for something to process the zip file, or it didn't time out but just that something failed to process a big zip file?
The endpoint is /{locale}/developers/addon/{slug}/versions/{id}
We open the zip (at this point it should be on the local filesystem, since it's over the threshold Django has to deal with uploads in memory) validate that the files in it and their size are ok, then move the zip to EFS. That hasn't been a problem in the past (up until our limit at least, which should be way above 50 MB)
What about the custom scanner and MAD service? Do those two get invoked after a user uploads a zip file?
No, no tasks are triggered for source uploads - and even if that was the case, tasks like these are triggered async, a 200 response is returned and clients come back to check on it periodically. Here it's failing right at upload.
FWIW: I tried with my local addons-server instance, setting client_max_body_size 200m;
in nginx config (we default to 50M in our local config) and having it use addons-server directly (to ensure it didn't hit addons-frontend), and that worked fine.
It turns out Cloudfront has a default 30 seconds "Origin Response Timeout" [1].
I tried it with a 70 MB zip file and it looks like the app took 98.081 seconds to process the POST.
1634159656.645 "*, 130.176.100.98" US addons-dev.allizom.org "POST /en-US/developers/addon/mossonk-locer-malmorming/submit/source-listed HTTP/1.1" 302 0 "https://addons-dev.allizom.org/en-US/developers/addon/mossonk-locer-malmorming/submit/source-listed" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.71 Safari/537.36" 204.776 98.081 "-"
In this case, the Cloudfront edge server already returned 504 to the browser while Nginx was busy transmitting the file to its upstream.
I think we have a couple of options.
- To increase the value of "Origin Response Timeout". It can be bumped up to 60 seconds, which is probably not enough for some big files. If we need a value larger than 60, we'd have to contact AWS and make a case for it.
- To set
uwsgi_request_buffering
tooff
[2]. This would allow Nginx to start transmitting the file to the app without buffering it first. That way when the browser finishes transmitting the file, the app should receive the file shortly.
Option 2 is Cloud agnostic, so it should probably be the way to go. However it also means this option (uwsgi_request_buffering
) will be applied to all requests sent to the app, so we'll need to test it thoroughly.
That said, if there is a dedicated path prefix for uploading files, then we could make a new location (in Nginx config) for it and only apply uwsgi_request_buffering
to that location. This will make the Nginx config file more complicated (because we'll need an additional named location for this purpose), but it can be our third option nonetheless.
Please let me know what you think.
There are only a few endpoints that do file upload, but there is no dedicated path prefix for them, and some are part of the API so it's quite tricky. uwsgi_request_buffering
to off
seems promising, I'm not sure we need to have it to on
, but as you said it would need thorough testing.
I've set uwsgi_request_buffering
to off
in the dev environment. Please test it and let me know when it is ready for production.
It might be a good idea to have a dedicated path prefix for uploading files in the long run though, so that way it can be better managed/controlled.
@diox other that source code upload and addon submissions, are there any other areas that need to be tested?
@AlexandraMoga just browsing the site normally (particularly devhub) but that will happen anyway over the course of the week.
I've verified add-on and source code uploads of various sizes and I haven't run into 504 errors.
More specifically, I've uploaded a .zip
file of ~55Mb and, even though the upload process was slow, the file was uploaded eventually. I could also download the file afterwards from devhub and rev tools. - see example https://reviewers.addons-dev.allizom.org/en-US/reviewers/review/620865
I've also tried uploading an archive of ~60Mb as source code and this time the upload was not completed. There wasn't a 504 triggered but the upload finished with the following error, which, I believe, is expected:
I haven't run into any other issues while navigating the site, but maybe more coverage will be added here over the course of the week.
Uncompressed size is too large
would happen if the total size of the zip contents, once extracted, was over 200 MB.
FWIW I just got a 504 on dev with a 60MB zip file (the total once extracted was well below the 200 MB limit).
FWIW I just got a 504 on dev with a 60MB zip file (the total once extracted was well below the 200 MB limit).
What info can you provide so that I can search for your session in the Nginx access logs? Can you share your zip file with me so that I can try to re-produce the error?
I tried uploading a 70MB zip file and it succeeded.
I just tried and reproduced the error at 17:43 UTC. URL was https://addons-dev.allizom.org/en-US/developers/addon/goran-carey-leopperre/versions/1692599. I've uploaded the zip to https://drive.google.com/file/d/1c-lRSnIw_bNb07iy4r26SXHSkRWs8bIt/view?usp=sharing
I tried with your zip file, and even though it took 90+ seconds to upload the file, I was able to finish the source uploading step successfully.
There were 4 POST
in my whole session.
"POST /en-US/developers/upload HTTP/1.1" 302 0 "https://addons-dev.allizom.org/en-US/developers/addon/submit/upload-listed" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36" 0.265 0.263 "-"
"POST /en-US/developers/addon/submit/upload-listed HTTP/1.1" 302 0 "https://addons-dev.allizom.org/en-US/developers/addon/submit/upload-listed" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36" 0.641 0.641 "-"
"POST /en-US/developers/addon/seasp-scorse-duckabugo2/submit/source-listed HTTP/1.1" 302 0 "https://addons-dev.allizom.org/en-US/developers/addon/seasp-scorse-duckabugo2/submit/source-listed" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36" 95.080 95.077 "-"
"POST /en-US/developers/addon/seasp-scorse-duckabugo2/submit/details HTTP/1.1" 302 0 "https://addons-dev.allizom.org/en-US/developers/addon/seasp-scorse-duckabugo2/submit/details" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36" 1.213 1.213 "-"
However, the source uploading step for me is to POST to /en-US/developers/addon/<slug>/submit/source-listed
whereas yours is to /en-US/developers/addon/<slug>/versions/<id>
.
I don't know what exact steps to take to re-produce the issue you're seeing. For me, I followed the steps in the first comment and everything seems to be fine.
Interesting. There are 2 ways to submit source: at version/add-on submission time, or afterwards, while editing the version. You did the former, I did the latter. To follow my steps:
- Upload an add-on
- Don't submit source at submission time
- Once submitted, go to
Manage Status & Versions
in developer hub for that add-on - Click on the version
- You should see the edit version form at this point, where you can upload the source
The code to handle the 2 different paths is mostly shared, but there are some differences, so it would be interesting to see if only one of them is broken.
I followed your step and still couldn't reproduce it. This time I even used my bigger zip file (74MB). It took 118 seconds to finish but I did get the "Changes successfully saved." message on the web page.
"POST /en-US/developers/addon/moodpermad-cronker-tigarron/versions/1541784 HTTP/1.1" 302 0 "https://addons-dev.allizom.org/en-US/developers/addon/moodpermad-cronker-tigarron/versions/1541784" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36" 118.420 118.421 "-"
I searched the logs and I see both your attempts got 302 and they only took about 70+ seconds.
"POST /en-US/developers/addon/goran-carey-leopperre/versions/1692599 HTTP/1.1" 302 0 "https://addons-dev.allizom.org/en-US/developers/addon/goran-carey-leopperre/versions/1692599" "Mozilla/5.0 (X11; Linux x86_64; rv:95.0) Gecko/20100101 Firefox/95.0" 73.834 73.834 "-"
"POST /en-US/developers/addon/goran-carey-leopperre/versions/1692599 HTTP/1.1" 302 0 "https://addons-dev.allizom.org/en-US/developers/addon/goran-carey-leopperre/versions/1692599" "Mozilla/5.0 (X11; Linux x86_64; rv:95.0) Gecko/20100101 Firefox/95.0" 75.769 75.769 "-"
I can't think of a reason why 504 happened to you but not to me.
I also reproduced the 504 error now with a 60Mb zip when I've uploaded the file from the devhub manage versions page:
What info can you provide so that I can search for your session in the Nginx access logs?
Also, can you reproduce the 504 constantly, or is it intermittent?
On my side, I tried to upload a 70 MB zip file again, and it worked fine for me.
And in which environment did you test it?
@bqbn I've tested on -dev. I've tried again today with a ~60Mb zip file and one instance of the reproduction happened at 14:03 UTC (Oct 28). The addon in cause is https://addons-dev.allizom.org/en-US/developers/addon/grammarly-for-dev/versions/1692796. Another reproduction was at 14:08 UTC (Oct 28) and another at 14:10 UTC- both consecutive source code uploads for the same addon => so the error reproduced three times in a row without intermittence. I've tried with another add-on after that - https://addons-dev.allizom.org/en-US/developers/addon/ee2fc01910e04967ae5f/versions/1692927 at 14:13 UTC and the 504 error was also reproduced.
At 14:36 UTC I've tried again for https://addons-dev.allizom.org/en-US/developers/addon/rlmt/versions/1692905, this time with a ~54Mb zip file => the 504 error reproduced but when I opened the addon version page again, I've noticed that the source code has actually been uploaded.
Another test, at 14:41 UTC for https://addons-dev.allizom.org/en-US/developers/addon/3f22e7b50ac64eb8a017/versions/1692907, with a 45Mb zip also reproduced the error, again the archive proved to be uploaded when I've checked the version page again.
I've added the zip files I've tested with here - you should have access to the folder with your mozilla account.
@AlexandraMoga if you've been using Firefox for testing, could you try a different browser (e.g.: the latest Chrome)? I'm curious if that would make any difference.
@AlexandraMoga, in the cloudfront logs, I see 2 requests that had the same IP address as yours uploaded ~60 MB file successfully.
date time location method uri status request_bytes time_taken time_to_first_byte x_edge_detailed_result_type
2021-10-28 07:01:49 SFO53-C1 POST /en-US/developers/addon/unlistedexample-16-04/versions/1693053 200 62034917 176.422 176.366 Miss
2021-10-28 07:06:01 SFO53-C1 POST /en-US/developers/addon/mv2-firefox-versions-3/versions/1692880 200 62035016 179.529 179.473 Miss
Those requests took over 175 seconds to finish, but nonetheless it succeeded.
And I can see your failed attempts in Cloudfront logs too,
date time location method uri status request_bytes time_taken time_to_first_byte x_edge_detailed_result_type
2021-10-28 14:02:55 OTP50-C1 POST /en-US/developers/addon/grammarly-for-dev/versions/1692796 504 62034930 55.9 55.9 OriginCommError
2021-10-28 14:08:11 OTP50-C1 POST /en-US/developers/addon/grammarly-for-dev/versions/1692796 504 62034938 57.861 57.861 OriginCommError
2021-10-28 14:10:53 OTP50-C1 POST /en-US/developers/addon/grammarly-for-dev/versions/1692796 504 62034874 55.262 55.262 OriginCommError
2021-10-28 14:14:02 OTP50-C1 POST /en-US/developers/addon/ee2fc01910e04967ae5f/versions/1692927 504 62034956 55.071 55.071 OriginCommError
2021-10-28 14:34:10 OTP50-C1 POST /en-US/developers/addon/grammarly-for-dev/versions/1692796 504 55827395 52.313 52.313 OriginCommError
2021-10-28 14:36:09 OTP50-C1 POST /en-US/developers/addon/rlmt/versions/1692905 504 55827312 52.567 52.567 OriginCommError
2021-10-28 14:41:20 OTP50-C1 POST /en-US/developers/addon/3f22e7b50ac64eb8a017/versions/1692907 504 46515967 49.096 49.096 OriginCommError
In the failed cases, Cloudfront returned 504 much sooner than 175 seconds (even with about the same sized files). Do you think it was possible that you didn't wait long enough for the failed cases?
Another difference is the edge location. Even though all the above requests were from the same IP, the successful ones were from SFO53-C1
, whereas the failed ones were from OTP50-C1
.
If the 2 successful requests were from you, do you remember what you did differently that might cause you to have a different edge server?
And lastly, as far as Nginx is concerned, all requests, even the failed ones, were completed successfully (e.g. Nginx received either 200 or 302 from the application).
And just to report, I tried https://addons-dev.allizom.org/en-US/developers/addon/grammarly-for-dev/versions/1692796 with @AlexandraMoga's testing files. For the ~60Mb zip file, I got the Uncompressed size is too large
error. For the ~54Mb zip file, I was able to successfully upload it. In my case the cloudfront logs look like this:
date time location method uri status request_bytes time_taken time_to_first_byte x_edge_detailed_result_type
2021-10-28 23:37:13 SEA19-C1 POST /en-US/developers/addon/grammarly-for-dev/versions/1692796 200 62034555 86.22 86.158 Miss
2021-10-28 23:40:25 SEA19-C1 POST /en-US/developers/addon/grammarly-for-dev/versions/1692796 302 55827042 87.741 87.741 Miss
I'm using Chrome though. Please give that a try and see if it makes any difference.
@bqbn these are the results of my new tests (Oct 29) with Chrome, on Windows (note that the UTC time mentioned below corresponds with the moment the error appeared in the browser):
- 13:06 UTC - https://addons-dev.allizom.org/en-US/developers/addon/django32/versions/1692875, with the ~60Mb file example => 504 error (I haven't done anything to interrupt the upload; I just waited for the result and the 504 error was triggered after a considerable amount of time - ~1 minute); I've checked the version page after the error was triggered and there was no source code uploaded;
- 13:14 UTC - https://addons-dev.allizom.org/en-US/developers/addon/temporary/versions/1692835, with the ~60Mb file example => 504 error, source code file not uploaded;
- 13:19 UTC - https://addons-dev.allizom.org/en-US/developers/addon/terms-of-service-didn-t-read/versions/1692600, with the ~60Mb file example => 504 error, source code file not uploaded;
- 13:22 UTC - https://addons-dev.allizom.org/en-US/developers/addon/mozilla-labs-snowl/versions/1692165 with the ~50Mb file example => 504 error, source code file was uploaded when I checked the version page again.
Also, to answer your questions:
@AlexandraMoga, in the cloudfront logs, I see 2 requests that had the same IP address as yours uploaded ~60 MB file successfully.
date time location method uri status request_bytes time_taken time_to_first_byte x_edge_detailed_result_type 2021-10-28 07:01:49 SFO53-C1 POST /en-US/developers/addon/unlistedexample-16-04/versions/1693053 200 62034917 176.422 176.366 Miss 2021-10-28 07:06:01 SFO53-C1 POST /en-US/developers/addon/mv2-firefox-versions-3/versions/1692880 200 62035016 179.529 179.473 Miss
Those requests took over 175 seconds to finish, but nonetheless it succeeded.
I've made those uploads before noon (around 07:00 UTC according to the logs) with a different account - an admin user. I've tried to upload source code with that user again on Chrome but this time I ran into the 504 error. So, the only difference between the successful and the failed requests I can currently think about is the submission time (before noon/ afternoon).
In the failed cases, Cloudfront returned 504 much sooner than 175 seconds (even with about the same sized files). Do you think it was possible that you didn't wait long enough for the failed cases?
I didn't touch the page from the moment I've uploaded the zip file until the error was received, so I'm not sure how I could have impacted the response time.
Also, if it has any relevance, I wasn't using any active VPN software while running the above tests.
I can see all your failed attempts on 10/29 were from the OTP50-C1
location.
So far, I think we'd agree that the problem is not browser or user related. And since we were able to upload from some edge locations but not another (i.e. OTP50-C1
), I suspect this has to do with the edge servers. I'll file an AWS support ticket to see if they can help us find out why.
Meanwhile, I'd suggest we have more people help test from different locations so that I can collect logs that contain more different edge locations to send to AWS.
AWS suggested that we increase the "Origin Response Time" to 60 seconds. We have done so in the dev environment. Please run some tests and see if that makes any difference.
Retried testing on -dev (Nov 02):
- 10:44 UTC - https://addons-dev.allizom.org/en-US/developers/addon/something-simple/versions/1692617 with 60Mb example => no 504 timeout; received the "Uncompressed size is too large" error;
- 10:49 UTC - https://addons-dev.allizom.org/en-US/developers/addon/de-something-simple/versions/1692618 with 60Mb example => no 504 timeout; received the "Uncompressed size is too large" error;
- 10:52 UTC - https://addons-dev.allizom.org/en-US/developers/addon/fix-listed-version-sync/versions/1692547 with 50Mb example => no 504 timeout; file successfully uploaded
- 14:57 UTC - https://addons-dev.allizom.org/en-US/developers/addon/something-simple/versions/1692617 with 60Mb example => no 504 timeout; received the "Uncompressed size is too large" error;
@bqbn I couldn't reproduce the error today.