
We need to revisit the liveness/readiness probe logic of our pods

Open · t83714 opened this issue 4 years ago · 7 comments


We did some work on liveness/readiness probes for zero-downtime deployment in #1471.

We need to revisit it to make sure that, during a k8s rolling update (particularly for the registry API), the liveness/readiness probes correctly report when the DB is not accessible.
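As a rough illustration of the behaviour being asked for, here is a minimal TypeScript sketch. The names `liveness`, `readiness`, `withTimeout` and the injected `checkDb` function are hypothetical, not Magda's actual API. Liveness stays DB-free, so a DB outage does not make k8s restart every pod; readiness returns 503 when the DB cannot be reached or does not answer in time, so the pod is taken out of the Service and a rollout waits instead of replacing working pods with broken ones.

```typescript
// Hypothetical dependency check: resolves if the DB answered, rejects otherwise
// (e.g. a `SELECT 1` issued through whatever DB client the service uses).
type DependencyCheck = () => Promise<void>;

interface ProbeResult {
  status: number; // HTTP status code the probe endpoint should return
  body: { ok: boolean; error?: string };
}

// Reject if `p` has not settled within `ms` milliseconds, so a hung DB
// connection fails the probe quickly instead of hanging the probe request.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
    p.then(
      (v) => {
        clearTimeout(timer);
        resolve(v);
      },
      (e) => {
        clearTimeout(timer);
        reject(e);
      }
    );
  });
}

// Liveness: only reports that the process is up and can answer HTTP.
// It deliberately does not touch the DB.
function liveness(): ProbeResult {
  return { status: 200, body: { ok: true } };
}

// Readiness: 200 only if the DB check succeeds within `timeoutMs`, else 503.
async function readiness(checkDb: DependencyCheck, timeoutMs = 500): Promise<ProbeResult> {
  try {
    await withTimeout(checkDb(), timeoutMs);
    return { status: 200, body: { ok: true } };
  } catch (e) {
    const error = e instanceof Error ? e.message : String(e);
    return { status: 503, body: { ok: false, error } };
  }
}
```

Wiring `readiness` into an HTTP route would just copy `status` and `body` onto the response. The 500 ms default timeout is an arbitrary assumption; it should stay below the probe's `timeoutSeconds` in the pod spec so the service reports the failure itself rather than the kubelet timing out.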

t83714 · Oct 02 '20 04:10

The registry doesn't seem to check the database at all; it simply replies OK for both liveness and readiness: https://github.com/magda-io/magda/blob/f52fcc43380070fb71a06ce5e11427d1aa1412b2/magda-registry-api/src/main/scala/au/csiro/data61/magda/registry/Api.scala#L139

t83714 · Oct 02 '20 04:10

:feature:

soyarsauce · Oct 02 '20 04:10

At a minimum, we should resolve this for the services that utilise the DB, namely authorization-api, content-api and registry-api.

soyarsauce · Oct 02 '20 04:10

correspondence-api handles this well for its SMTP dependency (rather than a DB dependency, as in the services above):

https://github.com/magda-io/magda/blob/master/magda-correspondence-api/src/createApiRouter.ts#L56-L67
https://github.com/magda-io/magda/blob/master/magda-correspondence-api/src/test/createApiRouter.spec.ts#L113-L141

soyarsauce · Oct 02 '20 04:10

Turns out authorization-api and content-api are OK. The registry PR is at https://github.com/magda-io/magda/pull/2997

soyarsauce · Oct 07 '20 00:10

Same problem in storage-api:

https://github.com/magda-io/magda/blob/v0.0.58-rc.3/magda-storage-api/src/createApiRouter.ts#L31-L39

soyarsauce · Oct 27 '20 07:10

Adding #3024 as a blocker: currently there is a performance bottleneck because only the registry-full pod serves the /api/v0/registry endpoint, and we can't scale it up.

Our UI always sends read requests to /api/v0/registry-read-only, but we can't guarantee that third-party software will do the same, especially metadata crawlers.

Querying the DB in the readiness probe may add extra load when registry-full is already running at full capacity.
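One way to cap that extra load, sketched in TypeScript under stated assumptions: cache the result of the DB check for a short TTL and let concurrent probe calls share a single in-flight query, so the DB sees at most one readiness check per TTL window. The `CachedCheck` class, its `ttlMs` parameter and the injectable clock are hypothetical, not existing Magda code.

```typescript
// Hypothetical dependency check: resolves if the DB answered, rejects otherwise.
type DependencyCheck = () => Promise<void>;

// Wraps a dependency check so it runs at most once per `ttlMs`.
class CachedCheck {
  private readonly check: DependencyCheck;
  private readonly ttlMs: number;
  private readonly now: () => number; // injectable clock (ms), handy for tests
  private lastOk = false;
  private lastCheckedAt = -Infinity; // forces a real check on first use
  private inFlight: Promise<boolean> | null = null;

  constructor(check: DependencyCheck, ttlMs: number, now: () => number = () => Date.now()) {
    this.check = check;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  isHealthy(): Promise<boolean> {
    // Result still fresh: answer from the cache, no DB round trip.
    if (this.now() - this.lastCheckedAt < this.ttlMs) {
      return Promise.resolve(this.lastOk);
    }
    // Stale: start one check; callers arriving meanwhile share the same promise.
    if (this.inFlight === null) {
      this.inFlight = this.check()
        .then(() => true, () => false)
        .then((ok) => {
          this.lastOk = ok;
          this.lastCheckedAt = this.now();
          this.inFlight = null;
          return ok;
        });
    }
    return this.inFlight;
  }
}
```

The trade-off is staleness: a DB failure can take up to `ttlMs` to show up in readiness. The kubelet already probes each container only once per `periodSeconds`, so a cache like this mostly pays off when several probes or other callers hit the same check; raising `periodSeconds` in the pod spec is the other knob for the same trade-off.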

t83714 · Nov 03 '20 00:11