openverse-api

/healthcheck endpoint should check for Elasticsearch availability (original #487)

Open obulat opened this issue 4 years ago • 0 comments

This issue has been migrated from the CC Search API repository

Author: aldenstpage
Date: Wed May 06 2020
Labels: ✨ goal: improvement, 🏷 status: label work required, 🙅 status: discontinued

During deployments, our load balancer repeatedly polls the /healthcheck endpoint to check that the server is reachable. If this check succeeds, the newly deployed instance starts receiving production traffic. Right now, if Elasticsearch is not responsive, /healthcheck will still return 200 OK.

The healthcheck endpoint should check the health of the image index in Elasticsearch using the cluster health API. If it is unavailable, return error 500. Log an informative message explaining why the healthcheck failed.

Because the healthcheck endpoint may be called many times, and Elasticsearch calls are not free, we should cache the Elasticsearch response for up to 10 seconds.
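
The behavior requested above could be sketched roughly as follows. This is only an illustration under stated assumptions: `CachedHealthcheck` and the `check_elasticsearch` callable are hypothetical names, and the in-memory cache stands in for whatever cache backend the API actually uses.

```python
import logging
import time

log = logging.getLogger("healthcheck")

CACHE_TTL_SECONDS = 10  # from the issue: cache the result for up to 10 seconds


class CachedHealthcheck:
    """Wraps an Elasticsearch probe and caches its result (hypothetical helper)."""

    def __init__(self, check_elasticsearch, ttl=CACHE_TTL_SECONDS, clock=time.monotonic):
        self._check = check_elasticsearch  # callable returning (ok: bool, reason: str)
        self._ttl = ttl
        self._clock = clock
        self._cached = None  # (expires_at, ok, reason)

    def status_code(self):
        """Return the HTTP status the /healthcheck view should respond with."""
        now = self._clock()
        if self._cached is None or now >= self._cached[0]:
            try:
                ok, reason = self._check()
            except Exception as exc:  # an unreachable cluster counts as unhealthy
                ok, reason = False, f"Elasticsearch check raised {exc!r}"
            self._cached = (now + self._ttl, ok, reason)
        _, ok, reason = self._cached
        if not ok:
            # Log why the healthcheck failed, then signal the failure with a 500.
            log.error("Healthcheck failed: %s", reason)
            return 500
        return 200
```

Passing the clock in as a parameter keeps the cache expiry easy to exercise in tests without sleeping.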

Original Comments:

madewithkode commented on Fri May 08 2020:

Hi Alden, this looks interesting, I'd love to work on it.

madewithkode commented on Fri May 08 2020:

Hi Alden, in order to check the health of the image index in the /healthcheck view, I'm trying to use urllib's urlopen() method to make a request to Elasticsearch's cluster API this way:

cluster_response = urlopen('http://0.0.0.0:8000/_cluster/health/image')

However, I keep getting a 404. Is there something I'm doing wrong?

madewithkode commented on Fri May 08 2020:

Figured this out; I didn't know Elasticsearch was running on a separate host/port :)

aldenstpage commented on Fri May 08 2020:

That's great!

It would be best to use the equivalent elasticsearch-py or elasticsearch-dsl query instead of making direct calls to the REST API (you can get an instance of the connection to Elasticsearch from search_controller.py). Here's an example for getting the cluster health; there ought to also be a way to narrow the query to the image index.
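
As a rough illustration of the suggestion above: elasticsearch-py exposes the cluster health API as `cluster.health` on the client, and it accepts an `index` argument to narrow the query. The helper name `index_status` is hypothetical, and the stand-in client exists only so the sketch runs without a live cluster; how search_controller.py exposes the real connection is not shown in this thread.

```python
from types import SimpleNamespace


def index_status(es, index="image"):
    """Return the health colour reported for one index.

    `es` is assumed to be an elasticsearch-py client (or anything exposing
    the same `cluster.health` method).
    """
    response = es.cluster.health(index=index)
    return response["status"]


# Stand-in client so the sketch runs without a live cluster (assumption).
fake_es = SimpleNamespace(
    cluster=SimpleNamespace(health=lambda index=None, **kw: {"status": "green"})
)
print(index_status(fake_es))
```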

madewithkode commented on Sat May 09 2020:

Alright, I'll look into the suggestion.

On Fri, May 8, 2020, 21:06 Alden S Page wrote:

It would be best to use the equivalent elasticsearch-py query instead of making direct calls to the REST API. Here's an example for getting the cluster health (https://discuss.elastic.co/t/how-to-get-cluster-health-using-python-api/25431); there ought to also be a way to narrow the query to the image index.

madewithkode commented on Sat May 09 2020:

Hi Alden, I'm here again :) Which status specifically signifies that the image index is available: red, yellow, or green? Or should I leave this detail out of the query, since I'm using an already established connection instance from search_controller.py, which waits for a yellow status by default?
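
For reference, Elasticsearch reports green when all primary and replica shards are allocated, yellow when all primaries are allocated but at least one replica is not, and red when at least one primary shard is unallocated. Which level should count as "available" is a policy choice; the sketch below only shows one way to compare a reported status against a chosen minimum, and `meets_minimum` is a hypothetical helper.

```python
# Health colours ordered from worst to best.
_RANK = {"red": 0, "yellow": 1, "green": 2}


def meets_minimum(status, minimum="yellow"):
    """True if `status` is at least as healthy as `minimum` (hypothetical helper)."""
    return _RANK[status] >= _RANK[minimum]
```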

madewithkode commented on Sat May 09 2020:

Update:

I've successfully managed to query the health of the entire cluster using the Elasticsearch connection instance from search_controller.py. However, when I try to limit the health check to just the image index, the request never resolves and runs forever with no response. And when I try to specify a timeout for the request, I get an "Illegal argument exception", even though timeout is a valid kwarg referenced in the API docs.

I should point out that, at the time of writing, I have yet to run ./load_sample_data.sh successfully, so I don't know whether this is linked to the problem above.
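
A possibly relevant detail, assuming the elasticsearch-py 7.x client: two different timeouts can be passed to `cluster.health`. `timeout` is forwarded to Elasticsearch and is a time value with a unit (for example `"3s"`), while `request_timeout` is the client-side HTTP timeout in seconds. Passing a bare number as `timeout` is one possible cause of the "Illegal argument exception" mentioned above, though nothing in this thread confirms it. The helper name and the stand-in client below are hypothetical.

```python
from types import SimpleNamespace


def index_health(es, index="image"):
    """Query one index's health with both timeouts set explicitly."""
    return es.cluster.health(
        index=index,
        timeout="3s",       # server-side wait; a time value with a unit
        request_timeout=5,  # client-side HTTP timeout, in seconds
    )


# Stand-in client that records the keyword arguments it receives (assumption).
seen = {}


def _fake_health(**kwargs):
    seen.update(kwargs)
    return {"status": "red", "timed_out": True}


fake_es = SimpleNamespace(cluster=SimpleNamespace(health=_fake_health))
result = index_health(fake_es)
```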

madewithkode commented on Mon May 11 2020:

Hi Alden, Progress Report :)

I successfully got load_sample_data.sh to run, and so far everything else is working fine. I've also set up the 10s response caching on the /healthcheck view using Redis, as well as the error logging.

However, I figured out that the reason for the unresponsiveness when querying the Elasticsearch image index was that the index did not exist, and that the whole cluster was empty too.

Do I need to do a manual population or something?

aldenstpage commented on Mon May 11 2020:

Hi again Onyenanu – if the index doesn't exist, the healthcheck should fail. This could happen in situations where we are switching Elasticsearch clusters in production and forgot to index data into the new one (or something went wrong while we were loading data into the new cluster).

In my experience, the ES Python libs can behave in unexpected ways that you sometimes have to work around. Since it seems like querying specifically for the image index health hangs when the index doesn't exist, perhaps you could query for healthchecks of every index in the cluster, and fail the healthcheck if image is not among them and green?

It sounds like it's coming along nicely!
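
The workaround suggested above might look something like this sketch. It assumes the health response was requested with `level="indices"`, which makes Elasticsearch include a per-index `indices` mapping; `image_index_is_green` is a hypothetical helper and the sample responses are made up for illustration.

```python
def image_index_is_green(health, index="image"):
    """Check a `cluster.health(level="indices")` response for a green index.

    Returns (ok, reason) so the caller can log why the healthcheck failed.
    """
    info = health.get("indices", {}).get(index)
    if info is None:
        return False, f"index {index!r} does not exist"
    if info.get("status") != "green":
        return False, f"index {index!r} is {info.get('status')}"
    return True, "ok"


# Made-up responses, trimmed to the fields the helper reads.
empty_cluster = {"status": "green", "indices": {}}
healthy_cluster = {"status": "green", "indices": {"image": {"status": "green"}}}
```

Because the whole cluster is queried rather than a single index, a missing image index shows up as an absent key instead of a request that hangs.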

madewithkode commented on Tue May 12 2020:

Hey Alden, many thanks again for the helpful insights. The suggestion sounds good; I'll proceed with it.

And yes, the whole thing is getting more interesting. I've learnt a lot in these few days :)

obulat avatar Apr 21 '21 12:04 obulat