We need to revisit the liveness/readiness probe logic of our pods
We have done some work on liveness/readiness probes for zero-downtime deployment here: #1471
We need to revisit it to make sure that, during a k8s rolling update (particularly for the registry API), the liveness/readiness probes report correctly when the DB is not accessible.
The registry does not seem to check the database at all and simply replies OK (for both liveness & readiness): https://github.com/magda-io/magda/blob/f52fcc43380070fb71a06ce5e11427d1aa1412b2/magda-registry-api/src/main/scala/au/csiro/data61/magda/registry/Api.scala#L139
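To make the intent concrete, here is a minimal sketch of probes that separate "process is up" from "DB is reachable". It is not the registry's actual code (which is Scala): `liveness`, `readiness`, `DependencyCheck` and `ProbeResult` are hypothetical names, and the idea that only readiness should depend on the DB is an assumption about how we want the pods to behave during an outage.

```typescript
// Hypothetical sketch: none of these names come from the Magda codebase.
type DependencyCheck = () => Promise<void>;

interface ProbeResult {
    status: number; // HTTP status code the probe endpoint would return
    body: { ok: boolean; error?: string };
}

// Liveness only says "the process is up". It deliberately ignores the DB
// so that a database outage does not make Kubernetes restart the pod.
function liveness(): ProbeResult {
    return { status: 200, body: { ok: true } };
}

// Readiness runs the dependency check (e.g. a `SELECT 1` against the DB)
// and fails if the check throws or does not finish within `timeoutMs`.
async function readiness(
    check: DependencyCheck,
    timeoutMs: number = 1000
): Promise<ProbeResult> {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<never>((_, reject) => {
        timer = setTimeout(
            () => reject(new Error(`check timed out after ${timeoutMs}ms`)),
            timeoutMs
        );
    });
    try {
        await Promise.race([check(), timeout]);
        return { status: 200, body: { ok: true } };
    } catch (e) {
        const message = e instanceof Error ? e.message : String(e);
        return { status: 503, body: { ok: false, error: message } };
    } finally {
        // Always clear the timer so it cannot keep the process alive.
        if (timer !== undefined) clearTimeout(timer);
    }
}
```

With something like this, a pod whose DB connection is broken would be taken out of the service endpoints (readiness 503) without being restarted, which is roughly what we want during a rolling update.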
At a minimum, we should resolve this for the services that use the DB, namely authorization-api, content-api and registry-api.
correspondence-api handles this well for its SMTP dependency (instead of the DB dependency of the services above):
https://github.com/magda-io/magda/blob/master/magda-correspondence-api/src/createApiRouter.ts#L56-L67
https://github.com/magda-io/magda/blob/master/magda-correspondence-api/src/test/createApiRouter.spec.ts#L113-L141
It turns out authorization-api and content-api are OK. The registry PR is at https://github.com/magda-io/magda/pull/2997
Same problem in storage-api
https://github.com/magda-io/magda/blob/v0.0.58-rc.3/magda-storage-api/src/createApiRouter.ts#L31-L39
Adding #3024 as a blocker because there is currently a performance bottleneck: only the registry-full pod serves the /api/v0/registry endpoint, and we can't scale it up.
Our UI always sends read requests to /api/v0/registry-read-only, but we can't guarantee that third-party software will do the same, especially metadata crawlers.
Querying the DB in the readiness probe may add extra load when registry-full is already running at full capacity.
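One possible way to limit that extra load, sketched below, is to cache the result of the DB check so the probe hits the database at most once per interval, however often Kubernetes (or anything else) calls the endpoint. `cachedCheck` and its parameters are hypothetical and not part of Magda; the `now` parameter exists only so the clock can be faked in tests.

```typescript
// Hypothetical helper: wraps an expensive check so that it runs at most
// once per `ttlMs`, no matter how often the probe endpoint is hit.
function cachedCheck(
    check: () => Promise<boolean>,
    ttlMs: number,
    now: () => number = Date.now
): () => Promise<boolean> {
    let lastRun = 0;
    let lastResult: Promise<boolean> | undefined;
    return () => {
        const t = now();
        let result = lastResult;
        if (result === undefined || t - lastRun >= ttlMs) {
            lastRun = t;
            // Cache the promise itself so concurrent callers share one query.
            // A failing check is reported as "not ready" instead of throwing.
            result = check().catch(() => false);
            lastResult = result;
        }
        return result;
    };
}
```

The trade-off is that readiness can lag behind the real DB state by up to `ttlMs`, so the interval would need to be chosen together with the probe's `periodSeconds` and `failureThreshold`.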