The versioncheck and services heartbeat should include checks for storage paths

Open bqbn opened this issue 2 years ago • 7 comments

Describe the problem and steps to reproduce it:

We get a 500 error when hitting /__heartbeat__ on the services and versioncheck endpoints.

For example, on one of the services_web instances on the stage environment,

$ curl -i -s -H "host: services.addons.allizom.org" http://0.0.0.0:4000/__heartbeat__
HTTP/1.1 500 Internal Server Error
Content-Type: application/json
Expires: Fri, 26 May 2023 20:07:03 GMT
Cache-Control: max-age=0, no-cache, no-store, must-revalidate, private
X-AMO-Request-ID: 9291dc0d82854fd692e89bb5065d2f29
Content-Length: 260
Content-Security-Policy: img-src 'self' blob: data: https://addons.mozilla.org/static-server/ https://addons.mozilla.org/user-media/ https://addons.allizom.org/user-media/ https://addons.allizom.org/static-server/; connect-src 'self' https://*.google-analytics.com; script-src https://www.google-analytics.com/analytics.js https://www.googletagmanager.com/gtag/js https://www.recaptcha.net/recaptcha/ https://www.gstatic.com/recaptcha/ https://www.gstatic.cn/recaptcha/ https://addons.mozilla.org/static-server/ https://addons.allizom.org/static-server/; object-src 'none'; frame-src https://www.recaptcha.net/recaptcha/; child-src https://www.recaptcha.net/recaptcha/; style-src 'unsafe-inline' https://addons.mozilla.org/static-server/ https://addons.allizom.org/static-server/; media-src https://videos.cdn.mozilla.net; form-action 'self'; default-src 'none'; font-src 'self' https://addons.mozilla.org/static-server/ https://addons.allizom.org/static-server/; report-uri /__cspreport__
X-Frame-Options: DENY
X-Content-Type-Options: nosniff
Referrer-Policy: same-origin
Cross-Origin-Opener-Policy: same-origin
Vary: Accept-Encoding

{"memcache": {"state": true, "status": ""}, "libraries": {"state": true, "status": ""}, "elastic": {"state": true, "status": ""}, "path": {"state": false, "status": "check main status page for broken perms / values"}, "database": {"state": true, "status": ""}}
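To make the failing check easier to spot, here is a minimal sketch that parses the response body above and lists every check whose state is false. The function name failing_checks is made up for illustration; the JSON is copied from the response.

```python
import json


def failing_checks(body):
    """Return {check name: status message} for every check whose state is false."""
    results = json.loads(body)
    return {
        name: result["status"]
        for name, result in results.items()
        if not result["state"]
    }


# Response body copied from the curl output above.
body = (
    '{"memcache": {"state": true, "status": ""}, '
    '"libraries": {"state": true, "status": ""}, '
    '"elastic": {"state": true, "status": ""}, '
    '"path": {"state": false, "status": "check main status page for broken perms / values"}, '
    '"database": {"state": true, "status": ""}}'
)

print(failing_checks(body))
# {'path': 'check main status page for broken perms / values'}
```

Only the path check fails; every other check reports a true state.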

What happened?

When we move AMO to the GKE platform, Kubernetes will check __heartbeat__ to determine pod readiness. We therefore need those two endpoints to return 200 when they are ready to serve traffic.
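For context, a Kubernetes HTTP probe counts any status code from 200 up to (but not including) 400 as success, so the 500 above would leave the pod marked as not ready. A rough Python equivalent of that rule, with hypothetical helper names (probe_succeeds, check_heartbeat) and no claim to match the actual probe configuration:

```python
import urllib.error
import urllib.request


def probe_succeeds(status_code):
    """Kubernetes HTTP probes treat any status in [200, 400) as success."""
    return 200 <= status_code < 400


def check_heartbeat(url, host=None, timeout=1.0):
    """GET the heartbeat URL and report whether an HTTP probe would pass."""
    request = urllib.request.Request(url)
    if host:
        # Mirrors the -H "host: ..." option of the curl command above.
        request.add_header("Host", host)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return probe_succeeds(response.status)
    except urllib.error.HTTPError as exc:
        # urllib raises on 4xx/5xx; the status code is on the exception.
        return probe_succeeds(exc.code)
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure: the probe fails.
        return False
```

Against the stage instance this would be called as check_heartbeat("http://0.0.0.0:4000/__heartbeat__", host="services.addons.allizom.org").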

What did you expect to happen?

For the services and versioncheck endpoints to return 200 when they're ready to serve traffic.

Anything else we should know?

n/a

May 26 '23 20:05 bqbn

Is EFS mounted on services and versioncheck? The failing function checks permissions on various paths.

May 26 '23 20:05 diox

Oh, it wasn't. After mounting the NFS share, __heartbeat__ works.

Is it possible for these two components to pass the health check without mounting the NFS share? They don't really need the share, and in production (AWS) we currently don't mount it.

The issue doesn't block the GCP migration, though. We'll mount the share for now.

May 30 '23 15:05 bqbn

Is there an env variable, or something other than the request URL, that I can use to detect that we're on a services or versioncheck instance?

May 31 '23 08:05 diox

We can pass an env variable, such as AMO_COMPONENT or ADDONS_SERVER_COMPONENT, to the app container to help it identify itself.

Jun 27 '23 03:06 bqbn

Yes, that would be helpful to fix this.

Jun 27 '23 08:06 diox

Old Jira Ticket: https://mozilla-hub.atlassian.net/browse/ADDSRV-376

May 03 '24 17:05 KevinMind

We should have ADDONS_SERVER_COMPONENT nowadays, so we could add a check that skips the path monitor in front_heartbeat() if os.environ.get('ADDONS_SERVER_COMPONENT') in ('services-web', 'versioncheck-web').
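A minimal, self-contained sketch of that idea follows. It is not the actual front_heartbeat() code: the helper checks_for_component and both constants are hypothetical. The check names are taken from the heartbeat response in the issue description, and the component values from the condition above.

```python
import os

# Components that run without the shared storage mounted.
COMPONENTS_WITHOUT_STORAGE = ("services-web", "versioncheck-web")

# Check names as they appear in the heartbeat response in this issue.
ALL_CHECKS = ("memcache", "libraries", "elastic", "path", "database")


def checks_for_component(environ=None):
    """Return the heartbeat checks to run for the current component.

    Hypothetical helper: drops the "path" check on components that do not
    mount the storage share, and keeps every check everywhere else.
    """
    environ = os.environ if environ is None else environ
    component = environ.get("ADDONS_SERVER_COMPONENT")
    if component in COMPONENTS_WITHOUT_STORAGE:
        return tuple(name for name in ALL_CHECKS if name != "path")
    return ALL_CHECKS
```

Passing the environment as an argument keeps the helper easy to test. If the variable is unset, every check still runs, so the default behaviour stays the same.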

Sep 10 '24 10:09 diox