Metrics Endpoint does not work with Istio Sidecar
Describe the bug
As described in the docs:

> Note: we do not use prometheus.io/port annotation in this configuration.

However, the Istio sidecar overwrites the Prometheus-related annotations, and the metrics for the business service are collected internally by the sidecar. Because prometheus.io/port is missing, the sidecar defaults to scraping http://127.0.0.1:80/prometheus for metrics.
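To illustrate the rewrite, here is a sketch of the two annotation states. The `prometheus.io/*` names and the `15020`/`/stats/prometheus` values are standard Istio metrics-merging behaviour, not copied from this cluster:

```yaml
# Before injection (sketch; path/port behaviour as described above):
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: /prometheus
    # prometheus.io/port is missing, so the istio-agent falls back to port 80
---
# After injection, Istio's metrics merging rewrites the annotations so that
# Prometheus scrapes the sidecar's merged endpoint instead:
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "15020"
    prometheus.io/path: /stats/prometheus
```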
Environment
- Cloud Provider: Alibaba Cloud ACK
- Kubernetes Cluster Version [Output of `kubectl version`]:
```
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.3", GitCommit:"1e11e4a2108024935ecfcb2912226cedeafd99df", GitTreeState:"clean", BuildDate:"2020-10-14T12:50:19Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.9-aliyun.1", GitCommit:"4f7ea78", GitTreeState:"", BuildDate:"2020-05-08T07:29:59Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
```
- Deployed Seldon System Images [Output of `kubectl get --namespace seldon-system deploy seldon-controller-manager -o yaml | grep seldonio`]:
```
value: docker.io/seldonio/engine:1.5.0
value: docker.io/seldonio/seldon-core-executor:1.5.0
image: docker.io/seldonio/seldon-core-operator:1.5.0
```
- Istio Version: 1.8.0
Sidecar Log
```
2020-11-30T13:38:37.446855Z error failed scraping application metrics: error scraping http://localhost:80/prometheus: Get "http://localhost:80/prometheus": dial tcp 127.0.0.1:80: connect: connection refused
2020-11-30T13:38:52.446774Z error failed scraping application metrics: error scraping http://localhost:80/prometheus: Get "http://localhost:80/prometheus": dial tcp 127.0.0.1:80: connect: connection refused
2020-11-30T13:39:07.446908Z error failed scraping application metrics: error scraping http://localhost:80/prometheus: Get "http://localhost:80/prometheus": dial tcp 127.0.0.1:80: connect: connection refused
```
So the problem is that we're requiring the scrape path to be /prometheus, and the Istio sidecar doesn't use that.
We normally suggest disabling Istio sidecars by using:

```yaml
spec:
  predictors:
  - annotations:
      sidecar.istio.io/inject: "false"
```
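For completeness, a minimal sketch of where that annotation sits in a full resource (the name, server, and model URI are placeholders):

```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: my-model                          # placeholder
spec:
  predictors:
  - name: default
    annotations:
      sidecar.istio.io/inject: "false"    # predictor pods get no Istio sidecar
    graph:
      name: classifier
      implementation: SKLEARN_SERVER      # placeholder prepackaged server
      modelUri: gs://my-bucket/my-model   # placeholder model location
```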
Or do you want to be able to use the sidecar? Is it only the sidecar that fails to scrape? Does the scraping work for the main model?
> Or do you want to be able to use the sidecar?

Yes, I hope to use the sidecar for traffic management and other tasks.
> Is it only the sidecar that fails to scrape? Does the scraping work for the main model?

I can't guarantee it, but when I ran the example provided by Seldon, I found no other problems.
So we need to update the sidecar to allow the correct Prometheus scrape path? Is it an http/https issue?
I think it is an issue in the Operator. Maybe it just needs to set the prometheus.io/port annotation when it creates the Deployment:
https://github.com/SeldonIO/seldon-core/blob/master/operator/controllers/seldondeployment_engine.go#L496
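In other words, the generated pod annotations would then carry the port as well, something like the sketch below ('8000' matches the executor port used later in this thread; verify against your own deployment):

```yaml
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: /prometheus
    prometheus.io/port: "8000"   # lets the istio-agent scrape the executor instead of defaulting to :80
```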
@domgoer and @cliveseldon - I can confirm that @domgoer's suggestion is correct.
Previously, we had the following error:

```
2020-12-07T17:28:21.670730Z error failed scraping application metrics: error scraping http://localhost:80/prometheus: Get "http://localhost:80/prometheus": dial tcp 127.0.0.1:80: connect: connection refused
```

Now we see zero logs with this error. Istio automatically merges all metrics on its own, so this resolves the issue.
i.e., you only need to uncomment the "prometheus.io/port" line, and it's solved.
Closing. Please reopen if still an issue.
It looks like this issue is not solved yet.

For anyone else struggling with it, setting the following two env vars on the operator's pod helps:

```
EXECUTOR_SERVER_PORT: '80'
ENGINE_SERVER_PORT: '80'
```

The first causes the executor to listen on port 80, so the metrics are available at the path Istio expects (without the need for the annotation, which is the main issue here). The second env points Istio's VirtualService at the right port.

This doesn't solve the issue completely, as the annotation would, but it at least allows the metrics to be scraped without recompilation.

If you're using the Helm chart, these are the overrides you need to set: `--set executor.user=0 --set executor.port=80 --set engine.port=80`.
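If you're not using Helm and want to patch the operator Deployment directly, the env vars go on the manager container, e.g. (a sketch; the container name `manager` is assumed from the default seldon-controller-manager install):

```yaml
# Sketch: env section of the seldon-controller-manager Deployment
spec:
  template:
    spec:
      containers:
      - name: manager               # assumed container name; check your install
        env:
        - name: EXECUTOR_SERVER_PORT
          value: "80"
        - name: ENGINE_SERVER_PORT
          value: "80"
```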
Thanks mate. And what if I installed with istioctl?

```
istioctl manifest apply --set meshConfig.enablePrometheusMerge=true --set executor.user=0 --set executor.port=80 --set engine.port=80
```

It didn't work.
@Shahard2 I don't know for sure, but I think istioctl is used to deploy Istio itself. These three params should be passed to the Seldon setup (no need to touch Istio) via `helm install` / `helm upgrade`.
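If it helps, here are the same three overrides expressed as a values file (a sketch; key names are taken from the `--set` flags above, assuming the seldon-core-operator chart):

```yaml
# values-istio.yaml -- equivalent of the --set flags above
executor:
  user: 0
  port: 80
engine:
  port: 80
```

Then something like `helm upgrade seldon-core seldonio/seldon-core-operator -n seldon-system -f values-istio.yaml`, where the release name and namespace are whatever you used at install time.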
The issue doesn't look solved yet. @Shahard2's solution worked, albeit it implies a few issues with Istio's overall health, in my case at least. What really solved it was adding the annotation to my SeldonDeployment:
```yaml
...
spec:
  name: my-model
  ....
  predictors:
  - annotations:
      prometheus.io/port: '8000'
```
This doesn't affect the operator's overall cooperation with Istio.
I work with seldon-core:1.13.1.
So I believe this solution would be the recommended one.
@cliveseldon might re-opening the issue be beneficial in your opinion?
Another alternative is to not use metrics merging - this allows the use of Istio sidecars, but also allows all metrics to be exposed as if the sidecar weren't there:
```yaml
prometheus.istio.io/merge-metrics: "false"
```
The main issue is that Istio's metrics merging assumes a single application endpoint to be exposed for metrics, because that's what the Prometheus annotations approach expects. For pods with multiple containers producing metrics (e.g. a Seldon executor and a Seldon microservice or MLServer), this assumption breaks.
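For reference, that annotation goes on the predictor the same way as the others in this thread (a sketch; note that with merging disabled, your Prometheus must scrape the pod's metrics ports directly, e.g. via its own scrape config or a PodMonitor):

```yaml
spec:
  predictors:
  - annotations:
      prometheus.istio.io/merge-metrics: "false"   # keep the sidecar, skip the single-endpoint merge
```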