
feat: Add kv-store connection check to readiness probe

Open · Legion2 opened this issue 3 months ago · 3 comments

Motivation

When modelmesh cannot connect to the kv store to update its instance record or to sync with the other instances, it cannot reliably serve inference requests. A short disconnect can be tolerated by serving requests from the cached values. After some time, however, the cached data may be stale, and requests may be routed to instances that no longer exist and fail. For example, with instances A and B: if A has connection issues while B leaves the mesh, A still holds the outdated instance record and keeps routing inference requests to B, which fail. To prevent this, A should be marked unready while it cannot connect to the kv store, so that upstream proxies stop routing traffic to A.

Modifications

Add the existing verifyKvStoreConnection check to the isReady check.
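The change amounts to making readiness a conjunction: the instance is ready only if its previous readiness conditions hold and the kv-store connection check passes. The minimal sketch below illustrates that idea; ReadinessCheck, baseReady and kvStoreConnected are hypothetical names standing in for ModelMesh's real isReady logic and verifyKvStoreConnection, whose actual signatures may differ.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch, not ModelMesh's actual class: readiness combines the
// pre-existing readiness conditions with a kv-store connectivity check.
final class ReadinessCheck {
    // Stands in for whatever isReady already checked before this change.
    private final BooleanSupplier baseReady;
    // Stands in for the existing verifyKvStoreConnection check.
    private final BooleanSupplier kvStoreConnected;

    ReadinessCheck(BooleanSupplier baseReady, BooleanSupplier kvStoreConnected) {
        this.baseReady = baseReady;
        this.kvStoreConnected = kvStoreConnected;
    }

    boolean isReady() {
        // && short-circuits, so the kv store is only checked when the
        // instance is otherwise ready.
        return baseReady.getAsBoolean() && kvStoreConnected.getAsBoolean();
    }
}
```

With this shape, an instance that has lost its kv-store connection reports unready even though its local state (loaded models, cached instance records) still looks healthy.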

Result

isReady returns false when the modelmesh instance has lost its connection to the kv store, allowing systems such as Kubernetes to react to this condition.
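To see how Kubernetes reacts, note that an HTTP readiness probe counts any status code from 200 up to (but not including) 400 as success and anything else as failure, after which the pod is removed from Service endpoints. The sketch below exposes a readiness check over HTTP using the JDK's built-in com.sun.net.httpserver.HttpServer. The ReadinessEndpoint class, the /ready path and the choice of 503 for "not ready" are illustrative assumptions, not how ModelMesh necessarily exposes its readiness.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: serve a readiness check on GET /ready for a Kubernetes HTTP probe.
final class ReadinessEndpoint {
    // 200 is inside the probe's success range [200, 400); 503 is outside it.
    static int statusFor(boolean ready) {
        return ready ? 200 : 503;
    }

    // Starts a server on the given port (0 = any free port) that evaluates
    // isReady on every request, so a lost kv-store connection shows up immediately.
    static HttpServer start(int port, BooleanSupplier isReady) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/ready", exchange -> {
            boolean ready = isReady.getAsBoolean();
            byte[] body = (ready ? "ready" : "not ready").getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(statusFor(ready), body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

Evaluating the check per request, rather than caching a flag, keeps the probe result in step with the kv-store connection state at the cost of one check per probe period.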

Legion2 · Mar 19 '24 13:03