Network errors impacting mason downloads

Open springmeyer opened this issue 7 years ago • 16 comments

I feel like I've been seeing an increased number of network failures when fetching binaries from s3 over the last month or more. This ticket tracks them so we can start assembling a fuller picture of the failures and see if there is a pattern.

springmeyer avatar Nov 20 '17 18:11 springmeyer

Failed to download https://mason-binaries.s3.amazonaws.com/osx-x86_64/android-ndk/arm-9-r13b.tar.gz (returncode: 56) on OS X travis build: https://travis-ci.org/mapbox/mason/jobs/304807664#L1357

springmeyer avatar Nov 20 '17 18:11 springmeyer

/cc @mapsam who mentioned seeing multiple/repeated clang++ download failures. @mapsam was this on OS X or within docker?

springmeyer avatar Nov 20 '17 18:11 springmeyer

@springmeyer I was on OSX and saw hangs with clang++ when using the following curl command:

curl -sSfL https://s3.amazonaws.com/mason-binaries/osx-x86_64/clang++/5.0.0.tar.gz | tar --gunzip --extract --strip-components=1

The connection to the file is made relatively quickly, but the 30MB download takes much longer than other 30MB files.

mapsam avatar Nov 20 '17 19:11 mapsam
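
To separate a fast connect from a slow transfer, as described above, curl's `--write-out` option can report per-phase timings. This is a minimal sketch, not something mason does today; `timed_download` and `CURL_TIMING_FORMAT` are hypothetical names, while the `%{...}` variables are standard curl write-out variables.

```shell
#!/usr/bin/env bash
# Format string for curl's --write-out option.
# Reports the HTTP status plus connect, TLS, first-byte and total times.
CURL_TIMING_FORMAT='http_code=%{http_code} connect=%{time_connect}s tls=%{time_appconnect}s first_byte=%{time_starttransfer}s total=%{time_total}s bytes=%{size_download} speed=%{speed_download}B/s\n'

# timed_download URL DEST
# Hypothetical helper: saves URL to DEST and prints one line of timings.
timed_download() {
    local url="$1" dest="$2"
    curl -sSfL --output "$dest" --write-out "$CURL_TIMING_FORMAT" "$url"
}
```

Example call (not run here): `timed_download https://s3.amazonaws.com/mason-binaries/osx-x86_64/clang++/5.0.0.tar.gz /tmp/clang.tar.gz`. A small `connect` value next to a large `total` would match the hang described above.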

@mapsam, okay, thanks for the details. After https://github.com/mapbox/mason/commit/1727795f314dbef66fb0f84ee98a82a62e77b5d1 mason now outputs the exact returncode on error. This is what produces the:

(returncode: 56)

in the error above, which I saw in @artemp's commit where the android SDK failed to download. Let's keep an eye on whether we always see 56 (CURLE_RECV_ERROR) or whether we see other errors reported by curl.

springmeyer avatar Nov 20 '17 19:11 springmeyer
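
For reading the numeric returncodes, a small lookup from curl exit code to its libcurl error name can help. This is a sketch only: `curl_exit_name` is a hypothetical helper, not part of mason, and it covers just a handful of the codes curl documents.

```shell
#!/usr/bin/env bash
# curl_exit_name CODE
# Hypothetical helper: prints the libcurl error name for a few common
# curl exit codes. Codes not listed fall through to "unknown".
curl_exit_name() {
    case "$1" in
        0)  echo "CURLE_OK" ;;
        6)  echo "CURLE_COULDNT_RESOLVE_HOST" ;;
        7)  echo "CURLE_COULDNT_CONNECT" ;;
        18) echo "CURLE_PARTIAL_FILE" ;;
        22) echo "CURLE_HTTP_RETURNED_ERROR" ;;
        28) echo "CURLE_OPERATION_TIMEDOUT" ;;
        35) echo "CURLE_SSL_CONNECT_ERROR" ;;
        56) echo "CURLE_RECV_ERROR" ;;
        *)  echo "unknown ($1)" ;;
    esac
}

curl_exit_name 56   # prints CURLE_RECV_ERROR
```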

Not an s3 issue, but noting nonetheless that I also just hit this on an OS X travis job:

$ ./mason build ${MASON_NAME} ${MASON_VERSION}
Cloning into '/Users/travis/build/mapbox/mason/mason_packages/.build/mapnik-vf02a25901'...
error: RPC failed; curl 56 SSLRead() return error -36
fatal: The remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed

This looks like a git clone failing in curl, also with error 56.

springmeyer avatar Nov 20 '17 19:11 springmeyer

Now seeing:

* Downloading binary package https://mason-binaries.s3.amazonaws.com/linux-x86_64/sqlite/3.8.8.1.tar.gz
Failed to download https://mason-binaries.s3.amazonaws.com/linux-x86_64/sqlite/3.8.8.1.tar.gz (returncode: 35)

https://travis-ci.org/mapbox/mason/jobs/305567441#L495

springmeyer avatar Nov 22 '17 01:11 springmeyer

ugh, also just hit:

oci runtime error: exec failed: container_linux.go:265: starting container process caused "could not create session key: disk quota exceeded"

https://travis-ci.org/mapbox/mason/jobs/305567199

springmeyer avatar Nov 22 '17 01:11 springmeyer

hrm:

./scripts/clang-format.sh
Downloading https://s3.amazonaws.com/mason-binaries/linux-x86_64/clang++/5.0.0.tar.gz
curl: (22) The requested URL returned error: 429 Too Many Requests
gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now
make: *** [format] Error 2

https://travis-ci.org/mapbox/node-cpp-skel/jobs/305550318#L552

springmeyer avatar Nov 22 '17 03:11 springmeyer
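
The log above shows a side effect of piping curl straight into tar: when the download fails, tar only sees a truncated stream and adds its own errors. One possible mitigation, sketched here under assumptions and not taken from mason.sh, is to download to a temporary file with a few retries and only then extract. `retry_with_backoff` and `fetch_then_extract` are hypothetical names, and the attempt count and delays are arbitrary.

```shell
#!/usr/bin/env bash
# retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached, doubling the
# delay after each failure. Returns the exit status of the last attempt.
retry_with_backoff() {
    local max_attempts="$1" delay="$2" attempt=1 status=0
    shift 2
    while :; do
        status=0
        "$@" || status=$?
        if [ "$status" -eq 0 ]; then
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts (last status: $status)" >&2
            return "$status"
        fi
        echo "attempt $attempt failed with status $status; retrying in ${delay}s" >&2
        sleep "$delay"
        delay=$((delay * 2))
        attempt=$((attempt + 1))
    done
}

# fetch_then_extract URL DEST_DIR
# Downloads URL to a temp file (with retries), then extracts it into DEST_DIR.
fetch_then_extract() {
    local url="$1" dest_dir="$2" tmp status=0
    tmp="$(mktemp)" || return 1
    retry_with_backoff 3 2 curl -sSfL --output "$tmp" "$url" || status=$?
    if [ "$status" -eq 0 ]; then
        mkdir -p "$dest_dir" &&
            tar --gunzip --extract --strip-components=1 -C "$dest_dir" --file "$tmp" || status=$?
    fi
    rm -f "$tmp"
    return "$status"
}
```

Retrying will not help if the server keeps answering 429, but it keeps a single dropped connection from failing the whole build, and the reported status is always curl's or tar's own.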

/cc @rclark, who I spoke with about this a few weeks ago. @rclark - s3 downloads from the mason bucket appear to be degrading and the problem is worsening. Any ideas of things to test or try to get to the bottom of why this is happening?

springmeyer avatar Nov 22 '17 03:11 springmeyer

Do you have any way to observe the S3 connections or S3 errors more directly? All the error codes you've got here appear to be from downstream applications that are perhaps reacting to S3 networking failures. But even the 429 isn't an S3 response code -- they give you a 503 if they want you to SlowDown.

rclark avatar Nov 22 '17 15:11 rclark

We are using curl on the command line to download the binary .tar.gz files from s3: https://github.com/mapbox/mason/blob/2602c302fd17d70fcef3f2fe90482d0e6232fdb8/mason.sh#L533-L544.

In https://github.com/mapbox/mason/commit/1727795f314dbef66fb0f84ee98a82a62e77b5d1 I modified things to actually try to print the http error code.

"But even the 429 isn't an S3 response code -- they give you a 503 if they want you to SlowDown."

That one (The requested URL returned error: 429 Too Many Requests) struck me as well - that looks to be coming from the curl code itself rather than the bash output logic I added.

springmeyer avatar Nov 22 '17 15:11 springmeyer

I think I'd have to take it to AWS support. You might try to check for x-amz headers in the HTTP response to see if S3 is trying to tell you anything there.

rclark avatar Nov 22 '17 16:11 rclark

Thanks @rclark - signing off for the holiday now. I will add -v to dump the headers next time I see persistent errors.

springmeyer avatar Nov 22 '17 17:11 springmeyer
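
As a quieter alternative to `-v`, the response headers can be dumped on their own and filtered for `x-amz-` entries (S3 responses normally carry `x-amz-request-id` and `x-amz-id-2`, which AWS support tends to ask for). A minimal sketch, assuming nothing about mason itself; `filter_amz_headers` and `dump_amz_headers` are hypothetical names.

```shell
#!/usr/bin/env bash
# filter_amz_headers
# Reads raw HTTP response headers on stdin and prints only the x-amz-* lines
# (case-insensitive), with carriage returns stripped.
# Exits non-zero when no such header is present, because grep found no match.
filter_amz_headers() {
    tr -d '\r' | grep -i '^x-amz-'
}

# dump_amz_headers URL
# Fetches URL, discards the body, and prints any x-amz-* response headers.
dump_amz_headers() {
    curl -sS --dump-header - --output /dev/null "$1" | filter_amz_headers
}
```

Example call (not run here): `dump_amz_headers https://mason-binaries.s3.amazonaws.com/linux-x86_64/sqlite/3.8.8.1.tar.gz`. If a failing response carries no `x-amz-` headers at all, that would suggest it did not come from S3.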

Another one, which looks related only to the travis network since the upstream is not AWS. I probably won't post more of this kind, to avoid being too noisy on this ticket, but I'm posting this one since I've not seen it before:

* Downloading http://nongnu.askapache.com/freetype/freetype-2.5.5.tar.bz2...
curl: (56) Recv failure: Connection reset by peer
Failed to download http://nongnu.askapache.com/freetype/freetype-2.5.5.tar.bz2 (returncode: 56)

https://travis-ci.org/mapbox/mason/jobs/308033127#L1784

springmeyer avatar Nov 30 '17 17:11 springmeyer

CMake Error at cmake/mason.cmake:103 (message):
  [Mason] Failed to download
  https://mason-binaries.s3.amazonaws.com/headers/rapidjson/1.1.0.tar.gz:
  curl: (35) gnutls_handshake() failed: Error in the pull function.

https://circleci.com/gh/mapbox/mapbox-gl-native/88893?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link

springmeyer avatar Apr 04 '18 00:04 springmeyer

Error message in Travis when trying to download recently published LLVM 6.0.0 binaries:

Failed to download https://mason-binaries.s3.amazonaws.com/linux-x86_64/android-ndk/arm-14-r16b.tar.gz (returncode: 141)

Note: (returncode: 141)

sssoleileraaa avatar Apr 24 '18 22:04 sssoleileraaa
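
A note on reading 141: shells report a process killed by a signal as 128 plus the signal number, so 141 is more likely signal 13 (SIGPIPE on Linux and macOS) than a curl error code. That is an interpretation, not something confirmed in this thread. A small sketch of the arithmetic, with `signal_from_status` as a hypothetical helper:

```shell
#!/usr/bin/env bash
# signal_from_status STATUS
# If STATUS follows the shell's 128+N convention for death by signal,
# prints N and returns 0; otherwise prints nothing and returns 1.
signal_from_status() {
    if [ "$1" -gt 128 ]; then
        echo $(( $1 - 128 ))
    else
        return 1
    fi
}

signal_from_status 141   # prints 13
```

If that reading is right, a process on the writing side of a pipe was terminated because its reader exited early, which points at the pipeline around the download and not at S3 itself.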