
Metrics Endpoint does not work with Istio Sidecar

Open domechn opened this issue 4 years ago • 11 comments

Describe the bug

As described in the doc:

Note: we do not use the prometheus.io/port annotation in this configuration.

But the Istio sidecar overwrites the Prometheus-related annotations, so the metrics for the business service are collected internally by the sidecar.

Because prometheus.io/port is missing, the sidecar defaults to http://127.0.0.1:80/prometheus to collect metrics.
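For reference, this is roughly what Istio's metrics merging does to the pod annotations (a sketch; exact values depend on the Istio version and mesh config):

```yaml
# Annotations as set by Seldon (prometheus.io/port intentionally absent):
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: /prometheus
---
# After sidecar injection, Istio rewrites them to point at its merge agent:
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: /stats/prometheus
    prometheus.io/port: "15020"
```

The merge agent then scrapes the application at the original path; with no original prometheus.io/port to go on, it falls back to port 80, producing the connection-refused errors below.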

Environment

  • Cloud Provider:
AlibabaCloud ACK
  • Kubernetes Cluster Version [Output of kubectl version]
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.3", GitCommit:"1e11e4a2108024935ecfcb2912226cedeafd99df", GitTreeState:"clean", BuildDate:"2020-10-14T12:50:19Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.9-aliyun.1", GitCommit:"4f7ea78", GitTreeState:"", BuildDate:"2020-05-08T07:29:59Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
  • Deployed Seldon System Images: [Output of kubectl get --namespace seldon-system deploy seldon-controller-manager -o yaml | grep seldonio]

Alternatively run echo "#### Kubernetes version:\n $(kubectl version) \n\n#### Seldon Images:\n$(kubectl get --namespace seldon-system deploy seldon-controller-manager -o yaml | grep seldonio)"

 value: docker.io/seldonio/engine:1.5.0
 value: docker.io/seldonio/seldon-core-executor:1.5.0
 image: docker.io/seldonio/seldon-core-operator:1.5.0
  • Istio version
1.8.0

Sidecar Log

2020-11-30T13:38:37.446855Z	error	failed scraping application metrics: error scraping http://localhost:80/prometheus: Get "http://localhost:80/prometheus": dial tcp 127.0.0.1:80: connect: connection refused
2020-11-30T13:38:52.446774Z	error	failed scraping application metrics: error scraping http://localhost:80/prometheus: Get "http://localhost:80/prometheus": dial tcp 127.0.0.1:80: connect: connection refused
2020-11-30T13:39:07.446908Z	error	failed scraping application metrics: error scraping http://localhost:80/prometheus: Get "http://localhost:80/prometheus": dial tcp 127.0.0.1:80: connect: connection refused

domechn avatar Nov 30 '20 13:11 domechn

So the problem is that we're requiring the scrape path to be /prometheus and the Istio sidecar doesn't use that.

We normally suggest disabling Istio sidecars by using:

spec:
  predictors:
  - annotations:
      sidecar.istio.io/inject: "false"

Or do you want to be able to use the sidecar? Is it only the sidecar that fails to scrape? Does the scraping work for the main model?

ryandawsonuk avatar Dec 01 '20 16:12 ryandawsonuk

Or do you want to be able to use the sidecar?

Yes, I'd like to use the sidecar for traffic management and other tasks.

Is it only the sidecar that fails to scrape? Does the scraping work for the main model?

I can't guarantee that, but when I ran the example provided by Seldon, I found no other problems.

domechn avatar Dec 03 '20 03:12 domechn

So we need to update the sidecar to allow the correct Prometheus scrape path? Is it an HTTP/HTTPS issue?

ukclivecox avatar Dec 03 '20 10:12 ukclivecox

I think it is an issue with the operator.

Maybe it just needs to set the prometheus.io/port annotation when it creates the Deployment:

https://github.com/SeldonIO/seldon-core/blob/master/operator/controllers/seldondeployment_engine.go#L496

domechn avatar Dec 03 '20 11:12 domechn

@domgoer and @cliveseldon - I can confirm that @domgoer's suggestion is correct.

Previously, we had the following error:

2020-12-07T17:28:21.670730Z	error	failed scraping application metrics: error scraping http://localhost:80/prometheus: Get "http://localhost:80/prometheus": dial tcp 127.0.0.1:80: connect: connection refused

Now we have zero logs of this error. Istio automatically merges all metrics on its own, so this resolves the issue.

i.e., you only need to uncomment the "prometheus.io/port" line, and it's solved.

ericandrewmeadows avatar Dec 07 '20 17:12 ericandrewmeadows

Closing. Please reopen if still an issue.

ukclivecox avatar Mar 25 '21 07:03 ukclivecox

It looks like this issue is not solved yet.

For anyone else struggling with it, setting the following two environment variables in the operator's pod helps:

EXECUTOR_SERVER_PORT: '80'
ENGINE_SERVER_PORT: '80'

The first causes the executor to listen on port 80, so the metrics are available at the port Istio expects (without needing the annotation, which is the main issue here). The second env points Istio's VirtualService at the right port.

This doesn't solve the issue completely, as the annotation would, but at least it allows the metrics to be scraped without recompilation.
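As a sketch, those environment variables go on the operator's container in the seldon-controller-manager Deployment (the container name "manager" is an assumption and may differ in your install):

```yaml
# Fragment of the seldon-controller-manager Deployment spec
spec:
  template:
    spec:
      containers:
      - name: manager  # assumed container name
        env:
        - name: EXECUTOR_SERVER_PORT
          value: "80"
        - name: ENGINE_SERVER_PORT
          value: "80"
```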

If you're using the Helm chart, these are the overrides you need to set: --set executor.user=0 --set executor.port=80 --set engine.port=80.

szczeles avatar Mar 29 '21 06:03 szczeles

If you're using the Helm chart, these are the overrides you need to set: --set executor.user=0 --set executor.port=80 --set engine.port=80.

Thanks mate. And if I installed with Istioctl?

istioctl manifest apply --set meshConfig.enablePrometheusMerge=true --set executor.user=0 --set executor.port=80 --set engine.port=80

didn't work.

Shahard2 avatar Apr 27 '21 17:04 Shahard2

@Shahard2 I don't know for sure, but I think istioctl is used to deploy Istio itself. These three params should be passed to the Seldon setup (no need to touch Istio) via helm install / helm upgrade.

szczeles avatar Apr 29 '21 14:04 szczeles

The issue looks like it is not yet solved. The solution above worked, although it implied a few issues with Istio's overall health, in my case at least. What really solved it was adding the annotation to my SeldonDeployment:

...
spec:
  name: my-model
  ....
  predictors:
  - annotations:
      prometheus.io/port: '8000'

This doesn't affect the operator's overall cooperation with Istio.

I work with seldon-core:1.13.1.

So I believe this solution would be the recommended one.

@cliveseldon might re-opening the issue be beneficial in your opinion?

spi-x-i avatar Mar 30 '22 17:03 spi-x-i

Another alternative is to not use metrics merging - this allows the use of Istio sidecars, but also allows all metrics to be exposed as if the sidecar weren't there:

prometheus.istio.io/merge-metrics: "false"
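For example, following the annotation placement shown earlier in the thread, this could be applied per predictor in a SeldonDeployment (a sketch; the annotation can equally be set on the pod template):

```yaml
spec:
  predictors:
  - annotations:
      prometheus.istio.io/merge-metrics: "false"
```

With merging disabled, Prometheus must be configured to scrape the application containers directly (e.g. via its Kubernetes pod discovery), since the sidecar no longer proxies their metrics.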

The main issue is that Istio's metrics merging assumes a single application endpoint to be exposed for metrics, because that's what the Prometheus annotations approach expects. For pods with multiple containers producing metrics (e.g. a Seldon executor and a Seldon microservice or MLServer), this assumption breaks.

Metrics merging docs for reference.

agrski avatar Mar 30 '22 19:03 agrski