
Improve handling of namespace scoped in-memory resources in the Service Controller

Open tjohnson31415 opened this issue 1 year ago • 0 comments

When the controller is running in cluster-scoped mode, it watches all namespaces in the cluster. For each namespace that is enabled for ModelMesh (i.e. has the modelmesh-enabled: true label), the controller has to create:

  • a Kubernetes Service
  • a gRPC client for the model mesh cluster created by the runtime pods
  • a watch on model mesh's etcd for events related to changes to objects internal to model mesh

The latter two are in-memory objects in the controller. Currently, these objects are kept in sync with the cluster by updates triggered by watches on individual Namespace objects. If there are any issues processing those Namespace events, the in-memory tracking can drift out of sync with the actual state of the namespaces in the cluster. It might be better to have an "all-namespace" reconciliation that checks all current namespaces and ensures that the in-memory resources are completely in sync with the current state of the cluster.
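To make the idea concrete, here is a rough sketch of what such an "all-namespace" pass could look like. It is not the controller's actual code: `tracker`, `nsResources`, `newTracker`, and `reconcileAll` are hypothetical names, the namespace list is passed in as a plain map of labels rather than read from the Kubernetes API, and the `start`/`stop` hooks stand in for creating and tearing down the gRPC client and etcd watch.

```go
package main

import (
	"fmt"
	"sort"
)

// nsResources is a hypothetical stand-in for the per-namespace in-memory
// objects (the gRPC client and the etcd watch).
type nsResources struct {
	stop func() // releases the client/watch and any goroutines behind them
}

// tracker holds the in-memory state keyed by namespace name (hypothetical).
type tracker struct {
	resources map[string]*nsResources
	start     func(ns string) *nsResources
}

func newTracker(start func(ns string) *nsResources) *tracker {
	return &tracker{resources: map[string]*nsResources{}, start: start}
}

// reconcileAll compares the full set of namespaces (name -> labels) against
// the tracked resources and fixes any drift in both directions.
func (t *tracker) reconcileAll(namespaces map[string]map[string]string) (added, removed []string) {
	// Desired state: every namespace carrying the modelmesh-enabled: true label.
	desired := map[string]bool{}
	for ns, labels := range namespaces {
		if labels["modelmesh-enabled"] == "true" {
			desired[ns] = true
		}
	}
	// Tear down resources for namespaces that are gone or no longer enabled.
	for ns, r := range t.resources {
		if !desired[ns] {
			r.stop()
			delete(t.resources, ns)
			removed = append(removed, ns)
		}
	}
	// Create resources for enabled namespaces that are not tracked yet.
	for ns := range desired {
		if _, ok := t.resources[ns]; !ok {
			t.resources[ns] = t.start(ns)
			added = append(added, ns)
		}
	}
	sort.Strings(added)
	sort.Strings(removed)
	return added, removed
}

func main() {
	t := newTracker(func(ns string) *nsResources {
		return &nsResources{stop: func() { fmt.Println("stopped", ns) }}
	})
	enabled := map[string]string{"modelmesh-enabled": "true"}

	// First pass: two enabled namespaces, one without the label.
	fmt.Println(t.reconcileAll(map[string]map[string]string{
		"team-a": enabled, "team-b": enabled, "other": {},
	}))
	// Second pass: team-a is gone, so its resources are cleaned up even if
	// the delete event for it was never processed.
	fmt.Println(t.reconcileAll(map[string]map[string]string{"team-b": enabled}))
}
```

Because each pass starts from the full namespace list instead of a single event, a missed or failed Namespace event would be corrected on the next pass rather than leaving a stale client or watch behind.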

This idea came out of discussion related to the goroutine leak that was fixed in https://github.com/kserve/modelmesh-serving/pull/397. If further leaks or problems turn up in the in-memory tracking in the Service Controller, it might be worth pursuing a refactor that implements the "all-namespace" reconciliation described above.

tjohnson31415 · Sep 19 '23 20:09