modelmesh-serving
Support out of distribution detection metrics
If an OOD-enabled model is deployed, ModelMesh metrics should capture the two additional metrics that these models generate as part of the inferencing metrics.
An OOD-enabled model will produce, in a single output tensor:
- Original model inferencing output
- OOD score
We would need an output transformation to separate the original inference output from the OOD score, and logging to record the inputs/outputs and OOD scores (e.g., in OpenShift Logging and/or Prometheus). These are generic functionalities that should be useful beyond OOD.
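To make the separation concrete, here is a minimal sketch assuming the OOD score is packed as the last element of the flattened output tensor; the actual layout depends on how the OOD-enabled model packs its output:

```python
# Hypothetical sketch: split a combined output tensor into the original
# inference output and the OOD score. Assumes the OOD score is appended
# as the last element of the flattened output tensor.
import numpy as np

def split_ood_output(combined: np.ndarray):
    """Return (original_output, ood_score) from a combined output tensor."""
    flat = combined.reshape(-1)
    original_output = flat[:-1]   # original model inference output
    ood_score = float(flat[-1])   # OOD / certainty score
    return original_output, ood_score

# Example: a 10-class softmax output with an OOD score appended
combined = np.append(np.random.dirichlet(np.ones(10)), 0.87)
probs, score = split_ood_output(combined)
```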
Hi @mudhakar, @taneem-ibrahim, just to add more details on OOD (model certainty) enablement and deployment.
- To get a certainty-enabled model, the certainty container takes in the original model and an in-distribution (normal) dataset; the output is a modified model, which is stored at a user-specified location. The modified model is capable of generating the original model's inference output and a certainty score.
- The modified model can be deployed just like a regular model (right diagram). To take advantage of the certainty score, an output transformation can be used to extract the score; the certainty score can then be logged to Prometheus or other relevant services for model monitoring, dashboarding, etc. (a sketch of this extraction-and-logging step follows below).
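A minimal sketch of the extraction-and-logging step, assuming the standard `prometheus_client` library; the metric name, label, and port are illustrative rather than an agreed convention:

```python
# Minimal sketch of emitting the extracted certainty score to Prometheus.
from prometheus_client import Gauge, start_http_server

certainty_gauge = Gauge(
    "model_certainty_score",
    "Certainty (OOD) score of the most recent inference",
    ["model_name"],
)

def record_certainty(model_name: str, score: float) -> None:
    """Expose the latest certainty score for scraping by Prometheus."""
    certainty_gauge.labels(model_name=model_name).set(score)

# Expose metrics on :8080/metrics as a Prometheus scrape target
start_http_server(8080)
record_certainty("my-ood-model", 0.87)
```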
Per a discussion with @njhill and @ckadner, the best path forward is to have an output transformer (similar to the post-processing transformer in KServe) native to ModelMesh, without requiring the KServe controller.
@nirmdesai @mudhakar After further discussion with @njhill and @ckadner, it sounds like our fastest way to get integrated would be to add a custom post-processor as part of OOD for now, until we have kserve-raw or serverless available in ODH.
A proposal for the post-processing transform

Thanks @daw3rd. @njhill, @taneem-ibrahim, @ckadner: The above "KServe Proxy" is the custom post-processor container you proposed last week. Could you please review and confirm this is what you had in mind? cc: @mudhakar
Hi @nirmdesai, is the KServe Proxy (REST server) here replicating functionality similar to this?
@taneem-ibrahim: Just to be precise, we are not going to use the KServe transformer framework (shown in the link you shared) to implement the KServe Proxy. However, the implementation of our KServe Proxy will look similar to a typical pre-/post-processing function like the one shown in that example. The deployment flow will also differ from the link you shared, where the transformer is deployed as part of InferenceService creation. In our case, you would first create an InferenceService as you normally would, and then deploy the proxy container on top of it. You would then use the Proxy APIs for inferencing instead of the InferenceService APIs. cc: @mudhakar, @daw3rd, @spacew
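To illustrate the flow described above, here is a hedged client-side sketch: the InferenceService is created as usual, but requests are sent to the proxy endpoint instead. The proxy URL, model name, and payload layout are hypothetical placeholders in the KServe v2 REST style:

```python
# Hedged sketch of the client-side flow: the InferenceService exists as usual,
# but the client calls the proxy instead. URL and model name are hypothetical.
import requests

PROXY_URL = "http://kserve-proxy.example.svc:8080"   # hypothetical proxy endpoint
MODEL_NAME = "my-ood-model"                          # hypothetical model name

# KServe v2-style REST payload (illustrative shape/datatype)
payload = {
    "inputs": [
        {"name": "input-0", "shape": [1, 4], "datatype": "FP32",
         "data": [5.1, 3.5, 1.4, 0.2]}
    ]
}

# The proxy forwards the request to the InferenceService, strips and logs the
# certainty score, and returns the original inference output.
resp = requests.post(f"{PROXY_URL}/v2/models/{MODEL_NAME}/infer", json=payload)
resp.raise_for_status()
print(resp.json())
```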
Hello @taneem-ibrahim @nirmdesai @mudhakar @spacew @daw3rd cc: @njhill @ckadner
In regard to a proxy service for transforming model output for a certainty-enabled model, below is a diagram demonstrating the interaction with a modelmesh proxy server deployed on OpenShift in the same cluster where RHODS is hosted. Note that in this deployment we also deploy a Prometheus service for logging the model-certainty metrics generated by the modelmesh proxy service over time. Both are packaged via a Helm install; however, if a Prometheus instance already exists, it can be removed.
Please share feedback or comments on the deployment and sequence steps, as well as the endpoint for reaching the modelmesh proxy.
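For illustration only, here is a rough sketch of what the request handling inside such a modelmesh proxy could look like: forward the request to the ModelMesh REST endpoint, strip the certainty score from the output, record it for Prometheus, and return the original prediction. The endpoint paths, in-cluster URL, and output layout are assumptions, not the actual implementation:

```python
# Rough, hypothetical sketch of the proxy's request handling.
from flask import Flask, request, jsonify
from prometheus_client import Gauge, generate_latest
import requests

app = Flask(__name__)
MODELMESH_URL = "http://modelmesh-serving:8008"  # hypothetical in-cluster endpoint
certainty_gauge = Gauge("model_certainty_score", "Latest certainty score", ["model"])

@app.route("/v2/models/<model>/infer", methods=["POST"])
def infer(model):
    # Forward the request unchanged to the ModelMesh REST endpoint
    upstream = requests.post(
        f"{MODELMESH_URL}/v2/models/{model}/infer", json=request.get_json()
    )
    body = upstream.json()
    # Assume the certainty score is the last value of the first output tensor
    data = body["outputs"][0]["data"]
    certainty_gauge.labels(model=model).set(float(data[-1]))
    body["outputs"][0]["data"] = data[:-1]  # return only the original output
    return jsonify(body), upstream.status_code

@app.route("/metrics")
def metrics():
    # Expose recorded certainty scores for a Prometheus scrape
    return generate_latest(), 200, {"Content-Type": "text/plain; version=0.0.4"}

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```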
Copying discussions I've had on Slack:
I think TrustyAI can provide a lot of the capabilities that the modelmesh-proxy is aiming to provide, with the advantage of not needing to add another component into the mix.
TrustyAI within ODH/RHODS is a service that intercepts ModelMesh input and output payloads and then sends metrics computed on that input/output data (e.g., fairness metrics) to Prometheus. If we defined a metric that simply grabbed the certainty scores from the model output payload and emitted them to Prometheus as a metric, it'd be a really simple way of doing what you're trying to do.
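As a concept-only sketch (not the TrustyAI API, and not the PoC mentioned below): given intercepted output payloads, a metric of this kind could summarize the certainty scores to be emitted to Prometheus, analogous to how fairness metrics are computed on intercepted input/output data. The payload layout and threshold are assumed:

```python
# Concept-only sketch of a certainty metric over intercepted output payloads.
from statistics import mean

def certainty_summary(output_payloads, threshold=0.5):
    """Return mean certainty and the fraction of requests below a threshold."""
    # Assumes the certainty score is the last value of the first output tensor
    scores = [p["outputs"][0]["data"][-1] for p in output_payloads]
    return {
        "mean_certainty": mean(scores),
        "fraction_uncertain": sum(s < threshold for s in scores) / len(scores),
    }
```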
As a PoC, I've done exactly that and got an OOD model deployed in modelmesh and sending the OOD metrics to Prometheus within OpenDataHub: