addons icon indicating copy to clipboard operation
addons copied to clipboard

504 error when trying to upload the source code of a new version

Open willdurand opened this issue 3 years ago • 57 comments

Describe the problem and steps to reproduce it:

(Please include as many details as possible.)

  1. Go to AMO -dev: https://addons-dev.allizom.org/en-US/developers/addon/submit/upload-listed
  2. Submit a new add-on (doesn't matter which one as long as it passes validation, you can use a dummy one from https://staticfil.es/webext-generator/)
  3. On the next page, check that you want to attach source code to the submission
  4. Select a zip file that is rather big (e.g., I tried with a 50MB zip file)
  5. Click to continue

What happened?

This error is displayed:

504 ERROR
The request could not be satisfied.

CloudFront attempted to establish a connection with the origin, but either the attempt failed or the origin closed the connection. We can't connect to the server for this app or website at this time. There might be too much traffic or a configuration error. Try again later, or contact the app or website owner.
If you provide content to customers through CloudFront, you can find steps to troubleshoot and help prevent this error by reviewing the CloudFront documentation.

Generated by cloudfront (CloudFront)
Request ID: 0hn0qJOZeCrgZLbhjaPj4mtquw6NDlZ4re9fB70kbvnGcLA8jMBUUw==

What did you expect to happen?

No error.

┆Issue is synchronized with this Jira Task

willdurand avatar Oct 13 '21 10:10 willdurand

This issue seems to affect all envs.

willdurand avatar Oct 13 '21 10:10 willdurand

@bqbn would you be able to look into that please?

willdurand avatar Oct 13 '21 10:10 willdurand

What workflow gets triggered after a user uploads a zip file?

504 means gateway timeout, could it be Nginx timed out while waiting for something to process the zip file, or it didn't time out but just that something failed to process a big zip file?

bqbn avatar Oct 13 '21 17:10 bqbn

The endpoint is /{locale}/developers/addon/{slug}/versions/{id}

We open the zip (at this point it should be on the local filesystem, since it's over the threshold Django has to deal with uploads in memory) validate that the files in it and their size are ok, then move the zip to EFS. That hasn't been a problem in the past (up until our limit at least, which should be way above 50 MB)

diox avatar Oct 13 '21 19:10 diox

What about the custom scanner and MAD service? Do those two get invoked after a user uploads a zip file?

bqbn avatar Oct 13 '21 19:10 bqbn

No, no tasks are triggered for source uploads - and even if that was the case, tasks like these are triggered async, a 200 response is returned and clients come back to check on it periodically. Here it's failing right at upload.

diox avatar Oct 13 '21 19:10 diox

FWIW: I tried with my local addons-server instance, setting client_max_body_size 200m; in nginx config (we default to 50M in our local config) and having it use addons-server directly (to ensure it didn't hit addons-frontend), and that worked fine.

diox avatar Oct 13 '21 19:10 diox

It turns out Cloudfront has a default 30 seconds "Origin Response Timeout" [1].

I tried it with a 70 MB zip file and it looks like the app took 98.081 seconds to process the POST.

1634159656.645 "*, 130.176.100.98" US addons-dev.allizom.org "POST /en-US/developers/addon/mossonk-locer-malmorming/submit/source-listed HTTP/1.1" 302 0 "https://addons-dev.allizom.org/en-US/developers/addon/mossonk-locer-malmorming/submit/source-listed" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.71 Safari/537.36" 204.776 98.081 "-"

In this case, the Cloudfront edge server already returned 504 to the browser while Nginx was busy transmitting the file to its upstream.

I think we have a couple of options.

  1. To increase the value of "Origin Response Timeout". It can be bumped up to 60 seconds, which is probably not enough for some big files. If we need a value larger than 60, we'd have to contact AWS and make a case for it.
  2. To set uwsgi_request_buffering to off [2]. This would allow Nginx to start transmitting the file to the app without buffering it first. That way when the browser finishes transmitting the file, the app should receive the file shortly.

Option 2 is Cloud agnostic, so it should probably be the way to go. However it also means this option (uwsgi_request_buffering) will be applied to all requests sent to the app, so we'll need to test it thoroughly.

That said, if there is a dedicated path prefix for uploading files, then we could make a new location (in Nginx config) for it and only apply uwsgi_request_buffering to that location. This will make the Nginx config file more complicated (because we'll need an additional named location for this purpose), but it can be our third option nonetheless.

Please let me know what you think.

bqbn avatar Oct 14 '21 16:10 bqbn

There are only a few endpoints that do file upload, but there is no dedicated path prefix for them, and some are part of the API so it's quite tricky. uwsgi_request_buffering to off seems promising, I'm not sure we need to have it to on, but as you said it would need thorough testing.

diox avatar Oct 18 '21 20:10 diox

I've set uwsgi_request_buffering to off in the dev environment. Please test it and let me know when it is ready for production.

It might be a good idea to have a dedicated path prefix for uploading files in the long run though, so that way it can be better managed/controlled.

bqbn avatar Oct 19 '21 00:10 bqbn

@diox other that source code upload and addon submissions, are there any other areas that need to be tested?

AlexandraMoga avatar Oct 19 '21 07:10 AlexandraMoga

@AlexandraMoga just browsing the site normally (particularly devhub) but that will happen anyway over the course of the week.

diox avatar Oct 19 '21 10:10 diox

I've verified add-on and source code uploads of various sizes and I haven't run into 504 errors.

More specifically, I've uploaded a .zip file of ~55Mb and, even though the upload process was slow, the file was uploaded eventually. I could also download the file afterwards from devhub and rev tools. - see example https://reviewers.addons-dev.allizom.org/en-US/reviewers/review/620865

I've also tried uploading an archive of ~60Mb as source code and this time the upload was not completed. There wasn't a 504 triggered but the upload finished with the following error, which, I believe, is expected: image

I haven't run into any other issues while navigating the site, but maybe more coverage will be added here over the course of the week.

AlexandraMoga avatar Oct 19 '21 14:10 AlexandraMoga

Uncompressed size is too large would happen if the total size of the zip contents, once extracted, was over 200 MB.

FWIW I just got a 504 on dev with a 60MB zip file (the total once extracted was well below the 200 MB limit).

diox avatar Oct 20 '21 09:10 diox

FWIW I just got a 504 on dev with a 60MB zip file (the total once extracted was well below the 200 MB limit).

What info can you provide so that I can search for your session in the Nginx access logs? Can you share your zip file with me so that I can try to re-produce the error?

I tried uploading a 70MB zip file and it succeeded.

bqbn avatar Oct 20 '21 16:10 bqbn

I just tried and reproduced the error at 17:43 UTC. URL was https://addons-dev.allizom.org/en-US/developers/addon/goran-carey-leopperre/versions/1692599. I've uploaded the zip to https://drive.google.com/file/d/1c-lRSnIw_bNb07iy4r26SXHSkRWs8bIt/view?usp=sharing

diox avatar Oct 20 '21 17:10 diox

I tried with your zip file, and even though it took 90+ seconds to upload the file, I was able to finish the source uploading step successfully.

There were 4 POST in my whole session.

"POST /en-US/developers/upload HTTP/1.1" 302 0 "https://addons-dev.allizom.org/en-US/developers/addon/submit/upload-listed" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36" 0.265 0.263 "-"
"POST /en-US/developers/addon/submit/upload-listed HTTP/1.1" 302 0 "https://addons-dev.allizom.org/en-US/developers/addon/submit/upload-listed" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36" 0.641 0.641 "-"
"POST /en-US/developers/addon/seasp-scorse-duckabugo2/submit/source-listed HTTP/1.1" 302 0 "https://addons-dev.allizom.org/en-US/developers/addon/seasp-scorse-duckabugo2/submit/source-listed" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36" 95.080 95.077 "-"
"POST /en-US/developers/addon/seasp-scorse-duckabugo2/submit/details HTTP/1.1" 302 0 "https://addons-dev.allizom.org/en-US/developers/addon/seasp-scorse-duckabugo2/submit/details" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36" 1.213 1.213 "-"

However, the source uploading step for me is to POST to /en-US/developers/addon/<slug>/submit/source-listed whereas yours is to /en-US/developers/addon/<slug>/versions/<id>.

I don't know what exact steps to take to re-produce the issue you're seeing. For me, I followed the steps in the first comment and everything seems to be fine.

bqbn avatar Oct 20 '21 19:10 bqbn

Interesting. There are 2 ways to submit source: at version/add-on submission time, or afterwards, while editing the version. You did the former, I did the latter. To follow my steps:

  • Upload an add-on
  • Don't submit source at submission time
  • Once submitted, go to Manage Status & Versions in developer hub for that add-on
  • Click on the version
  • You should see the edit version form at this point, where you can upload the source

The code to handle the 2 different paths is mostly shared, but there are some differences, so it would be interesting to see if only one of them is broken.

diox avatar Oct 20 '21 22:10 diox

I followed your step and still couldn't reproduce it. This time I even used my bigger zip file (74MB). It took 118 seconds to finish but I did get the "Changes successfully saved." message on the web page.

"POST /en-US/developers/addon/moodpermad-cronker-tigarron/versions/1541784 HTTP/1.1" 302 0 "https://addons-dev.allizom.org/en-US/developers/addon/moodpermad-cronker-tigarron/versions/1541784" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36" 118.420 118.421 "-"

I searched the logs and I see both your attempts got 302 and they only took about 70+ seconds.

"POST /en-US/developers/addon/goran-carey-leopperre/versions/1692599 HTTP/1.1" 302 0 "https://addons-dev.allizom.org/en-US/developers/addon/goran-carey-leopperre/versions/1692599" "Mozilla/5.0 (X11; Linux x86_64; rv:95.0) Gecko/20100101 Firefox/95.0" 73.834 73.834 "-"
"POST /en-US/developers/addon/goran-carey-leopperre/versions/1692599 HTTP/1.1" 302 0 "https://addons-dev.allizom.org/en-US/developers/addon/goran-carey-leopperre/versions/1692599" "Mozilla/5.0 (X11; Linux x86_64; rv:95.0) Gecko/20100101 Firefox/95.0" 75.769 75.769 "-"

I can't think of a reason why 504 happened to you but not to me.

bqbn avatar Oct 20 '21 22:10 bqbn

I also reproduced the 504 error now with a 60Mb zip when I've uploaded the file from the devhub manage versions page:

image

AlexandraMoga avatar Oct 27 '21 09:10 AlexandraMoga

What info can you provide so that I can search for your session in the Nginx access logs?

Also, can you reproduce the 504 constantly, or is it intermittent?

On my side, I tried to upload a 70 MB zip file again, and it worked fine for me.

bqbn avatar Oct 27 '21 16:10 bqbn

And in which environment did you test it?

bqbn avatar Oct 27 '21 17:10 bqbn

@bqbn I've tested on -dev. I've tried again today with a ~60Mb zip file and one instance of the reproduction happened at 14:03 UTC (Oct 28). The addon in cause is https://addons-dev.allizom.org/en-US/developers/addon/grammarly-for-dev/versions/1692796. Another reproduction was at 14:08 UTC (Oct 28) and another at 14:10 UTC- both consecutive source code uploads for the same addon => so the error reproduced three times in a row without intermittence. I've tried with another add-on after that - https://addons-dev.allizom.org/en-US/developers/addon/ee2fc01910e04967ae5f/versions/1692927 at 14:13 UTC and the 504 error was also reproduced.

At 14:36 UTC I've tried again for https://addons-dev.allizom.org/en-US/developers/addon/rlmt/versions/1692905, this time with a ~54Mb zip file => the 504 error reproduced but when I opened the addon version page again, I've noticed that the source code has actually been uploaded.

Another test, at 14:41 UTC for https://addons-dev.allizom.org/en-US/developers/addon/3f22e7b50ac64eb8a017/versions/1692907, with a 45Mb zip also reproduced the error, again the archive proved to be uploaded when I've checked the version page again.

I've added the zip files I've tested with here - you should have access to the folder with your mozilla account.

AlexandraMoga avatar Oct 28 '21 14:10 AlexandraMoga

@AlexandraMoga if you've been using Firefox for testing, could you try a different browser (e.g.: the latest Chrome)? I'm curious if that would make any difference.

bqbn avatar Oct 28 '21 15:10 bqbn

@AlexandraMoga, in the cloudfront logs, I see 2 requests that had the same IP address as yours uploaded ~60 MB file successfully.

date    time    location    method    uri    status    request_bytes    time_taken    time_to_first_byte    x_edge_detailed_result_type
2021-10-28    07:01:49    SFO53-C1    POST    /en-US/developers/addon/unlistedexample-16-04/versions/1693053    200    62034917    176.422    176.366    Miss
2021-10-28    07:06:01    SFO53-C1    POST    /en-US/developers/addon/mv2-firefox-versions-3/versions/1692880    200    62035016    179.529    179.473    Miss

Those requests took over 175 seconds to finish, but nonetheless it succeeded.

And I can see your failed attempts in Cloudfront logs too,

date    time    location    method    uri    status    request_bytes    time_taken    time_to_first_byte    x_edge_detailed_result_type
2021-10-28    14:02:55    OTP50-C1    POST    /en-US/developers/addon/grammarly-for-dev/versions/1692796    504    62034930    55.9    55.9    OriginCommError
2021-10-28    14:08:11    OTP50-C1    POST    /en-US/developers/addon/grammarly-for-dev/versions/1692796    504    62034938    57.861    57.861    OriginCommError
2021-10-28    14:10:53    OTP50-C1    POST    /en-US/developers/addon/grammarly-for-dev/versions/1692796    504    62034874    55.262    55.262    OriginCommError
2021-10-28    14:14:02    OTP50-C1    POST    /en-US/developers/addon/ee2fc01910e04967ae5f/versions/1692927    504    62034956    55.071    55.071    OriginCommError
2021-10-28    14:34:10    OTP50-C1    POST    /en-US/developers/addon/grammarly-for-dev/versions/1692796    504    55827395    52.313    52.313    OriginCommError
2021-10-28    14:36:09    OTP50-C1    POST    /en-US/developers/addon/rlmt/versions/1692905    504    55827312    52.567    52.567    OriginCommError
2021-10-28    14:41:20    OTP50-C1    POST    /en-US/developers/addon/3f22e7b50ac64eb8a017/versions/1692907    504    46515967    49.096    49.096    OriginCommError

In the failed cases, Cloudfront returned 504 much sooner than 175 seconds (even with about the same sized files). Do you think it was possible that you didn't wait long enough for the failed cases?

Another difference is the edge location. Even though all the above requests were from the same IP, the successful ones were from SFO53-C1, whereas the failed ones were from OTP50-C1.

If the 2 successful requests were from you, do you remember what you did differently that might cause you to have a different edge server?

And lastly, as far as Nginx is concerned, all requests, even the failed ones, were completed successfully (e.g. Nginx received either 200 or 302 from the application).

bqbn avatar Oct 29 '21 00:10 bqbn

And just to report, I tried https://addons-dev.allizom.org/en-US/developers/addon/grammarly-for-dev/versions/1692796 with @AlexandraMoga's testing files. For the ~60Mb zip file, I got the Uncompressed size is too large error. For the ~54Mb zip file, I was able to successfully upload it. In my case the cloudfront logs look like this:

date    time    location    method    uri    status    request_bytes    time_taken    time_to_first_byte    x_edge_detailed_result_type
2021-10-28    23:37:13    SEA19-C1    POST    /en-US/developers/addon/grammarly-for-dev/versions/1692796    200    62034555    86.22    86.158    Miss
2021-10-28    23:40:25    SEA19-C1    POST    /en-US/developers/addon/grammarly-for-dev/versions/1692796    302    55827042    87.741    87.741    Miss

I'm using Chrome though. Please give that a try and see if it makes any difference.

bqbn avatar Oct 29 '21 00:10 bqbn

@bqbn these are the results of my new tests (Oct 29) with Chrome, on Windows (note that the UTC time mentioned below corresponds with the moment the error appeared in the browser):

  1. 13:06 UTC - https://addons-dev.allizom.org/en-US/developers/addon/django32/versions/1692875, with the ~60Mb file example => 504 error (I haven't done anything to interrupt the upload; I just waited for the result and the 504 error was triggered after a considerable amount of time - ~1 minute); I've checked the version page after the error was triggered and there was no source code uploaded;
  2. 13:14 UTC - https://addons-dev.allizom.org/en-US/developers/addon/temporary/versions/1692835, with the ~60Mb file example => 504 error, source code file not uploaded;
  3. 13:19 UTC - https://addons-dev.allizom.org/en-US/developers/addon/terms-of-service-didn-t-read/versions/1692600, with the ~60Mb file example => 504 error, source code file not uploaded;
  4. 13:22 UTC - https://addons-dev.allizom.org/en-US/developers/addon/mozilla-labs-snowl/versions/1692165 with the ~50Mb file example => 504 error, source code file was uploaded when I checked the version page again.

Also, to answer your questions:

@AlexandraMoga, in the cloudfront logs, I see 2 requests that had the same IP address as yours uploaded ~60 MB file successfully.

date    time    location    method    uri    status    request_bytes    time_taken    time_to_first_byte    x_edge_detailed_result_type
2021-10-28    07:01:49    SFO53-C1    POST    /en-US/developers/addon/unlistedexample-16-04/versions/1693053    200    62034917    176.422    176.366    Miss
2021-10-28    07:06:01    SFO53-C1    POST    /en-US/developers/addon/mv2-firefox-versions-3/versions/1692880    200    62035016    179.529    179.473    Miss

Those requests took over 175 seconds to finish, but nonetheless it succeeded.

I've made those uploads before noon (around 07:00 UTC according to the logs) with a different account - an admin user. I've tried to upload source code with that user again on Chrome but this time I ran into the 504 error. So, the only difference between the successful and the failed requests I can currently think about is the submission time (before noon/ afternoon).

In the failed cases, Cloudfront returned 504 much sooner than 175 seconds (even with about the same sized files). Do you think it was possible that you didn't wait long enough for the failed cases?

I didn't touch the page from the moment I've uploaded the zip file until the error was received, so I'm not sure how I could have impacted the response time.

Also, if it has any relevance, I wasn't using any active VPN software while running the above tests.

AlexandraMoga avatar Oct 29 '21 13:10 AlexandraMoga

I can see all your failed attempts on 10/29 were from the OTP50-C1 location.

So far, I think we'd agree that the problem is not browser or user related. And since we were able to upload from some edge locations but not another (i.e. OTP50-C1), I suspect this has to do with the edge servers. I'll file an AWS support ticket to see if they can help us find out why.

Meanwhile, I'd suggest we have more people help test from different locations so that I can collect logs that contain more different edge locations to send to AWS.

bqbn avatar Oct 29 '21 16:10 bqbn

AWS suggested that we increase the "Origin Response Time" to 60 seconds. We have done so in the dev environment. Please run some tests and see if that makes any difference.

bqbn avatar Nov 01 '21 17:11 bqbn

Retried testing on -dev (Nov 02):

  • 10:44 UTC - https://addons-dev.allizom.org/en-US/developers/addon/something-simple/versions/1692617 with 60Mb example => no 504 timeout; received the "Uncompressed size is too large" error;
  • 10:49 UTC - https://addons-dev.allizom.org/en-US/developers/addon/de-something-simple/versions/1692618 with 60Mb example => no 504 timeout; received the "Uncompressed size is too large" error;
  • 10:52 UTC - https://addons-dev.allizom.org/en-US/developers/addon/fix-listed-version-sync/versions/1692547 with 50Mb example => no 504 timeout; file successfully uploaded
  • 14:57 UTC - https://addons-dev.allizom.org/en-US/developers/addon/something-simple/versions/1692617 with 60Mb example => no 504 timeout; received the "Uncompressed size is too large" error;

@bqbn I couldn't reproduce the error today.

AlexandraMoga avatar Nov 02 '21 15:11 AlexandraMoga