
Metrics Endpoint does not work with Istio Sidecar

Open domechn opened this issue 4 years ago • 11 comments

Describe the bug

As described in the doc:

Note: we do not use the prometheus.io/port annotation in this configuration.

But the Istio sidecar overwrites the Prometheus-related annotations, so the metrics for the business service are collected internally by the sidecar.

Because prometheus.io/port is missing, the sidecar defaults to http://127.0.0.1:80/prometheus to collect metrics.
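For reference, this is roughly what Istio's metrics merging does to the pod annotations (a sketch; exact values depend on the Istio version and mesh config):

```yaml
# Annotations as set by Seldon (prometheus.io/port intentionally absent):
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: /prometheus
---
# After sidecar injection, Istio rewrites them to point at its merge agent:
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: /stats/prometheus
    prometheus.io/port: "15020"
```

The merge agent then scrapes the application at the original path; with no original prometheus.io/port to go on, it falls back to port 80, producing the connection-refused errors below.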

Environment

  • Cloud Provider:
AlibabaCloud ACK
  • Kubernetes Cluster Version [Output of kubectl version]
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.3", GitCommit:"1e11e4a2108024935ecfcb2912226cedeafd99df", GitTreeState:"clean", BuildDate:"2020-10-14T12:50:19Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.9-aliyun.1", GitCommit:"4f7ea78", GitTreeState:"", BuildDate:"2020-05-08T07:29:59Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
  • Deployed Seldon System Images: [Output of kubectl get --namespace seldon-system deploy seldon-controller-manager -o yaml | grep seldonio]

Alternatively run echo "#### Kubernetes version:\n $(kubectl version) \n\n#### Seldon Images:\n$(kubectl get --namespace seldon-system deploy seldon-controller-manager -o yaml | grep seldonio)"

 value: docker.io/seldonio/engine:1.5.0
 value: docker.io/seldonio/seldon-core-executor:1.5.0
 image: docker.io/seldonio/seldon-core-operator:1.5.0
  • Istio version
1.8.0

Sidecar Log

2020-11-30T13:38:37.446855Z	error	failed scraping application metrics: error scraping http://localhost:80/prometheus: Get "http://localhost:80/prometheus": dial tcp 127.0.0.1:80: connect: connection refused
2020-11-30T13:38:52.446774Z	error	failed scraping application metrics: error scraping http://localhost:80/prometheus: Get "http://localhost:80/prometheus": dial tcp 127.0.0.1:80: connect: connection refused
2020-11-30T13:39:07.446908Z	error	failed scraping application metrics: error scraping http://localhost:80/prometheus: Get "http://localhost:80/prometheus": dial tcp 127.0.0.1:80: connect: connection refused

domechn avatar Nov 30 '20 13:11 domechn

So the problem is that we're requiring the scrape path to be /prometheus and the Istio sidecar doesn't use that.

We normally suggest disabling Istio sidecars by using:

spec:
  predictors:
  - annotations:
      sidecar.istio.io/inject: "false"

Or do you want to be able to use the sidecar? Is it only the sidecar that fails to scrape? Does the scraping work for the main model?

ryandawsonuk avatar Dec 01 '20 16:12 ryandawsonuk

Or do you want to be able to use the sidecar?

Yes, I'd like to use the sidecar for traffic management and other tasks.

Is it only the sidecar that fails to scrape? Does the scraping work for the main model?

I can't guarantee that, but when I ran the example provided by Seldon, I found no other problems.

domechn avatar Dec 03 '20 03:12 domechn

So we need to update the sidecar to allow the correct Prometheus scrape path? Is it an HTTP/HTTPS issue?

ukclivecox avatar Dec 03 '20 10:12 ukclivecox

I think it is an issue with the operator.

Maybe it just needs to set the prometheus.io/port annotation when it creates the Deployment:

https://github.com/SeldonIO/seldon-core/blob/master/operator/controllers/seldondeployment_engine.go#L496

domechn avatar Dec 03 '20 11:12 domechn

@domgoer and @cliveseldon - I can confirm that @domgoer's suggestion is correct.

Previously, we had the following error:

2020-12-07T17:28:21.670730Z	error	failed scraping application metrics: error scraping http://localhost:80/prometheus: Get "http://localhost:80/prometheus": dial tcp 127.0.0.1:80: connect: connection refused

Now we have zero logs of this error. Istio automatically merges all metrics on its own, so this resolves the issue.

i.e., you only need to uncomment the "prometheus.io/port" line, and it's solved.

ericandrewmeadows avatar Dec 07 '20 17:12 ericandrewmeadows

Closing. Please reopen if still an issue.

ukclivecox avatar Mar 25 '21 07:03 ukclivecox

It looks like this issue is not solved yet.

For anyone else struggling with it, setting the following two environment variables in the operator's pod helps:

EXECUTOR_SERVER_PORT: '80'
ENGINE_SERVER_PORT: '80'

The first causes the executor to listen on port 80, so the metrics are available at the port Istio expects (without needing the annotation, which is the main issue here). The second env points Istio's VirtualService at the right port.

This doesn't solve the issue completely, as the annotation would, but at least it allows the metrics to be scraped without recompilation.
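As a sketch, those environment variables go on the operator's container in the seldon-controller-manager Deployment (the container name "manager" is an assumption and may differ in your install):

```yaml
# Fragment of the seldon-controller-manager Deployment spec
spec:
  template:
    spec:
      containers:
      - name: manager  # assumed container name
        env:
        - name: EXECUTOR_SERVER_PORT
          value: "80"
        - name: ENGINE_SERVER_PORT
          value: "80"
```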

If you're using the Helm chart, these are the overrides you need to set: --set executor.user=0 --set executor.port=80 --set engine.port=80.

szczeles avatar Mar 29 '21 06:03 szczeles

If you're using the Helm chart, these are the overrides you need to set: --set executor.user=0 --set executor.port=80 --set engine.port=80.

Thanks mate. And if I installed with Istioctl?

istioctl manifest apply --set meshConfig.enablePrometheusMerge=true --set executor.user=0 --set executor.port=80 --set engine.port=80

didn't work.

Shahard2 avatar Apr 27 '21 17:04 Shahard2

@Shahard2 I don't know for sure, but I think istioctl is used to deploy Istio itself. These three params should be passed to the Seldon setup (no need to touch Istio) via helm install / helm upgrade.

szczeles avatar Apr 29 '21 14:04 szczeles

The issue looks like it is not yet solved. The solution above worked, although it implied a few issues with Istio's overall health, in my case at least. What really solved it was adding the annotation to my SeldonDeployment:

...
spec:
  name: my-model
  ....
  predictors:
  - annotations:
      prometheus.io/port: '8000'

This doesn't affect the operator's overall cooperation with Istio.

I work with seldon-core:1.13.1.

So I believe this solution would be the recommended one.

@cliveseldon might re-opening the issue be beneficial in your opinion?

spi-x-i avatar Mar 30 '22 17:03 spi-x-i

Another alternative is to not use metrics merging - this allows the use of Istio sidecars, but also allows all metrics to be exposed as if the sidecar weren't there:

prometheus.istio.io/merge-metrics: "false"
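For example, following the annotation placement shown earlier in the thread, this could be applied per predictor in a SeldonDeployment (a sketch; the annotation can equally be set on the pod template):

```yaml
spec:
  predictors:
  - annotations:
      prometheus.istio.io/merge-metrics: "false"
```

With merging disabled, Prometheus must be configured to scrape the application containers directly (e.g. via its Kubernetes pod discovery), since the sidecar no longer proxies their metrics.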

The main issue is that Istio's metrics merging assumes a single application endpoint to be exposed for metrics, because that's what the Prometheus annotations approach expects. For pods with multiple containers producing metrics (e.g. a Seldon executor and a Seldon microservice or MLServer), this assumption breaks.

Metrics merging docs for reference.

agrski avatar Mar 30 '22 19:03 agrski