datamon
GCS 502 auth error handling
the attached error has been observed coming out of cloud.google.com/go/storage via storage/gcs during bundle upload.
like it says, we could be using a try-try-again mechanism on some call stacks that access GCS. i.e. upon accessing the GCS api, whether through NewClient, methods in the client's interface, or methods on the standard interfaces of the objects (e.g. readers) returned by the client's interface, we could parameterize a maximum number of errors to tolerate (1 or more) and a sleep timeout (30 seconds according to the attached error). if an error matching some specification (here, this 502 error while fetching the oauth2 token) occurs, increment a datamon-internal counter, sleep for the timeout, then try again. repeat until the call succeeds, at which point the counter resets, or until the counter exceeds the parameterized number of errors.
also, we could clean up the logs somewhat by finding out where in cloud.google.com/go/storage or its internal deps the html output is printed and redirecting it to the zap logger.
{"timestamp": "2019-08-13T19:57:39", "level": "INFO", "name": "util.os_utils", "message": "Uploading blob:afb2955b3d65aa9abefae189e610a06e57064893d9682efe5eb315c5e095194e2f52bf98b953e675525f6e5dd1d28a8e8af72478cab99aa3c7aa8783c9abd046\n"}
write segment file: Post https://www.googleapis.com/upload/storage/v1/b/datamon-blob-data/o?alt=json&prettyPrint=false&projection=full&uploadType=multipart: oauth2: cannot fetch token: 502 Bad Gateway
Response: <!DOCTYPE html>
<html lang=en>
<meta charset=utf-8>
<meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">
<title>Error 502 (Server Error)!!1</title>
<style>
*{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}
</style>
<a href=//www.google.com/><span id=logo aria-label=Google></span></a>
<p><b>502.</b> <ins>That’s an error.</ins>
<p>The server encountered a temporary error and could not complete your request.<p>Please try again in 30 seconds. <ins>That’s all we know.</ins>
{"timestamp": "2019-08-13T19:57:39", "level": "INFO", "name": "util.os_utils", "message": "Uploading blob:f88143393e2354b9a1c09b920287bfeb0fbea80dd00cc45656f378cc0b160cc75836bf5d8e65a552494bc58e130a2b6be69e06cffc35e4060b9dceb9a5c52745\n"}
{"timestamp": "2019-08-13T19:57:39", "level": "INFO", "name": "util.os_utils", "message": "Bundle upload failed. Failed to upload file gfs.20190807_18_gfs.t18z.pgrb2.0p50.f183 err: write segment file: Post https://www.googleapis.com/upload/storage/v1/b/datamon-blob-data/o?alt=json&prettyPrint=false&projection=full&uploadType=multipart: oauth2: cannot fetch token: 502 Bad Gateway\n"}
{"timestamp": "2019-08-13T19:57:39", "level": "INFO", "name": "util.os_utils", "message": "Response: <!DOCTYPE html>\n"}
{"timestamp": "2019-08-13T19:57:39", "level": "INFO", "name": "util.os_utils", "message": "<html lang=en>\n"}
{"timestamp": "2019-08-13T19:57:39", "level": "INFO", "name": "util.os_utils", "message": " <meta charset=utf-8>\n"}
{"timestamp": "2019-08-13T19:57:39", "level": "INFO", "name": "util.os_utils", "message": " <meta name=viewport content=\"initial-scale=1, minimum-scale=1, width=device-width\">\n"}
{"timestamp": "2019-08-13T19:57:39", "level": "INFO", "name": "util.os_utils", "message": " <title>Error 502 (Server Error)!!1</title>\n"}
{"timestamp": "2019-08-13T19:57:39", "level": "INFO", "name": "util.os_utils", "message": " <style>\n"}
{"timestamp": "2019-08-13T19:57:39", "level": "INFO", "name": "util.os_utils", "message": " *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}\n"}
{"timestamp": "2019-08-13T19:57:39", "level": "INFO", "name": "util.os_utils", "message": " </style>\n"}
{"timestamp": "2019-08-13T19:57:39", "level": "INFO", "name": "util.os_utils", "message": " <a href=//www.google.com/><span id=logo aria-label=Google></span></a>\n"}
{"timestamp": "2019-08-13T19:57:39", "level": "INFO", "name": "util.os_utils", "message": " <p><b>502.</b> <ins>That\u2019s an error.</ins>\n"}
{"timestamp": "2019-08-13T19:57:39", "level": "INFO", "name": "util.os_utils", "message": " <p>The server encountered a temporary error and could not complete your request.<p>Please try again in 30 seconds. <ins>That\u2019s all we know.</ins>\n"}
{"timestamp": "2019-08-13T19:57:39", "level": "INFO", "name": "util.os_utils", "message": "Subprocess returned nonzero exit status", "return_code": 1, "command": "datamon bundle upload --repo gfs-3h-coastal-grib --path /var/lib/oneconcern_tmp/gfs_3h/datamon_tmp --message Uploaded from the flood modeling pipeline --label 2019080718.183"}
503 (service unavailable) errors could also be handled better.