modelmesh
feat: Add kv-store connection check to readiness probe
Motivation
When a modelmesh instance cannot connect to the kv store to update its instance record or sync with the other instances, it cannot reliably serve inference requests. A short disconnect can be tolerated, since the cached values can be used to serve requests. After some time, however, the data may become stale and request routing may result in errors. For example, with instances A and B, if A has connection issues while B leaves the mesh, A still has the outdated instance record and continues to route inference requests to B, which then fail. To prevent this, A should be marked unready when it cannot connect to the kv store, so that upstream proxies stop routing traffic to A.
Modifications
Add the existing verifyKvStoreConnection check to the isReady check.
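As a rough illustration of the idea only, and not the actual modelmesh code, the combined check could look like the sketch below. The class name ReadinessSketch is hypothetical, and both the pre-existing readiness conditions and verifyKvStoreConnection are modeled as plain boolean suppliers, which is an assumption; the real method may have a different signature and return type.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch: not the real modelmesh class or method signatures.
final class ReadinessSketch {
    // Stands in for whatever conditions isReady already evaluated (assumption).
    private final BooleanSupplier existingChecks;
    // Stands in for verifyKvStoreConnection (assumption: modeled as a boolean).
    private final BooleanSupplier kvStoreConnectionCheck;

    ReadinessSketch(BooleanSupplier existingChecks, BooleanSupplier kvStoreConnectionCheck) {
        this.existingChecks = existingChecks;
        this.kvStoreConnectionCheck = kvStoreConnectionCheck;
    }

    // Ready only if the existing checks pass AND the kv store is reachable.
    boolean isReady() {
        if (!existingChecks.getAsBoolean()) {
            return false;
        }
        try {
            return kvStoreConnectionCheck.getAsBoolean();
        } catch (RuntimeException e) {
            // A failing connection check is treated the same as "not connected".
            return false;
        }
    }
}
```

In this sketch an exception from the connection check is mapped to "not ready", so the readiness probe reports a clean false rather than an error.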
Result
isReady will return false when the model mesh instance has lost its connection to the kv store, allowing systems such as Kubernetes to react to this condition.