dataverse-kubernetes
Expose Solr readiness/liveness, performance metrics and more to K8s/Prometheus
This is related to #82 and #85.
A very profound article with loads of usable stuff: https://lucidworks.com/post/running-solr-on-kubernetes-part-1/
Documenting special handling of Solr in 729a0033 here.
Without any readinessProbe or livenessProbe, the old Solr container will be killed immediately after we start a new one to replace it (e.g. during an update). The few seconds between terminating the old container and loading the core on the new instance were sufficient to release the IndexWriter lock (/data/index/write.lock).
Now that we have probes, the old container will not be killed before the new one is ready to serve requests (checked by fetching system info; we cannot ping the core, as that would block on the lock). The time between "ready" and termination is in turn too long for the lock to be released in time for core loading.
This is where the livenessProbe kicks in: the check fails, and because the failure threshold is set to 1, the container is restarted. By the time it comes back up, enough time has passed for the lock to be released.
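For reference, a minimal sketch of what such a probe setup could look like on the Solr container, assuming readiness checks the system info endpoint and liveness pings the core; paths, the core name, port and timing values are illustrative assumptions, not copied from 729a0033:

```yaml
# Fragment of the Solr container spec. Core name ("collection1"), image tag
# and timing values are illustrative assumptions.
containers:
  - name: solr
    image: solr:7.7.2
    ports:
      - containerPort: 8983
    readinessProbe:
      httpGet:
        # System info responds even while the core cannot load yet, so
        # readiness does not depend on the still-held IndexWriter lock.
        path: /solr/admin/info/system
        port: 8983
      initialDelaySeconds: 10
      periodSeconds: 5
    livenessProbe:
      httpGet:
        # The core ping fails while write.lock is still held; with
        # failureThreshold: 1 the first failure restarts the container,
        # by which time the old instance has released the lock.
        path: /solr/collection1/admin/ping
        port: 8983
      initialDelaySeconds: 60
      periodSeconds: 10
      failureThreshold: 1
```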
I don't know if a RELOAD would be OK too; I wanted to make sure it reaches a workable state again. This might bite back someday... Maybe switch to SolrCloud anyway.
For the Prometheus exporter: https://lucene.apache.org/solr/guide/7_6/monitoring-solr-with-prometheus-and-grafana.html
Not sure if this should run in a sidecar or be a separate deployment.
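As a sidecar it could look roughly like the sketch below; image, script path and exporter port are assumptions based on the exporter shipping inside the Solr distribution under contrib/prometheus-exporter:

```yaml
# Sketch of the Solr Prometheus exporter as a sidecar next to Solr in the same
# pod. Image tag, paths and the exporter port (9854, as in the Solr ref guide
# example) are assumptions, not settled choices for this repo.
containers:
  - name: solr
    image: solr:7.7.2
    ports:
      - containerPort: 8983
  - name: solr-exporter
    image: solr:7.7.2  # the exporter is part of the Solr distribution
    command:
      - /opt/solr/contrib/prometheus-exporter/bin/solr-exporter
      - "-p"
      - "9854"
      - "-b"
      - "http://localhost:8983/solr"
      - "-f"
      - /opt/solr/contrib/prometheus-exporter/conf/solr-exporter-config.xml
    ports:
      - containerPort: 9854
        name: metrics
```

A separate Deployment would instead point `-b` at the Solr Service URL, so one exporter instance could serve the metrics for the whole Solr deployment instead of one exporter per pod.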