jaeger-operator
jaeger-operator 1.19.0 custom serviceaccount
I have Jaeger Operator version 1.19.0 running on a k8s cluster. I'm trying to use a custom ServiceAccount in the Jaeger resource in order to pull images from a private repository. It looks like this:
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: my-jaeger
spec:
  strategy: production
  serviceAccount: my-jaeger
  ingress:
    enabled: false
  agent:
    image: " mylocalresistry.com/jaegertracing/jaeger-agent:1.19.2"
  collector:
    image: " mylocalresistry.com/jaegertracing/jaeger-collector:1.19.2"
  query:
    image: " mylocalresistry.com/jaegertracing/jaeger-query:1.19.2"
    options:
      query:
        base-path: "/jaeger"
  storage:
    serviceAccount: my-jaeger
    type: elasticsearch
    options:
      es:
        server-urls: http://my-elasticsearch:9200
        index-prefix: jaeger-operator-elad
    dependencies:
      image: "artifactory.rnd-hub.com:6543/3rdparties/jaegertracing/spark-dependencies:latest"
      enabled: false
    esIndexCleaner:
      image: "artifactory.rnd-hub.com:6543/3rdparties/jaegertracing/jaeger-es-index-cleaner:1.19.2"
      enabled: false
    esRollover:
      image: "artifactory.rnd-hub.com:6543/3rdparties/jaegertracing/jaeger-es-rollover:1.19.2"
      enabled: false
Here is my custom serviceaccount creation (which is deployed before the kind Jaeger above):
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-jaeger
  labels:
    app.kubernetes.io/name: my-jaeger
    app.kubernetes.io/instance: my-jaeger
imagePullSecrets:
  - name: my-secret
The problem is that the Jaeger collector and query pods are not being created.
Here are the errors from the jaeger-operator pod:
jaeger-operator-74979766c5-wwxj8 jaeger-operator time="2020-11-03T15:07:14Z" level=error msg="failed to apply the changes" error="serviceaccounts \"my-jaeger\" already exists" execution="2020-11-03 15:07:14.947162148 +0000 UTC" instance=my-jaeger namespace=jaeger-test
jaeger-operator-74979766c5-wwxj8 jaeger-operator time="2020-11-03T15:07:16Z" level=error msg="failed to store the failed status into the current CustomResource after the reconciliation" error="jaegers.jaegertracing.io \"my-jaeger\" not found" execution="2020-11-03 15:07:15.976150378 +0000 UTC" instance=my-jaeger namespace=jaeger-test
Could you try another name for the service account? You likely found a bug, but a simple workaround would be to use a different name than the one that Jaeger would itself provision.
Hi @jpkrohling, Thanks for your quick response.
It looks like you are right: when I set the same name for both the ServiceAccount and the Jaeger resource, the issue is reproduced; otherwise it works. It indeed looks like a bug.
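The workaround discussed above can be sketched as follows. The account name `my-jaeger-pull` is hypothetical and chosen only for illustration; any name that differs from the Jaeger instance name should serve the same purpose. The pull secret `my-secret` and the instance name `my-jaeger` come from the original report, and only the fields relevant to the workaround are shown.

```yaml
# ServiceAccount whose name differs from the Jaeger instance name, so it
# does not collide with the account the operator provisions for the instance.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-jaeger-pull   # hypothetical name, not from the thread
imagePullSecrets:
  - name: my-secret
---
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: my-jaeger
spec:
  strategy: production
  serviceAccount: my-jaeger-pull   # refers to the account defined above
```

The storage, agent, collector and query sections from the original resource would stay unchanged.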
Hi @jpkrohling,
I also seem to run into this bug on Docker Desktop Kubernetes, but I was not able to work around it by using a self-named, self-deployed serviceAccount, because the current operator 2.19.1 deployment still creates its own service accounts, one for the operator and one for the Jaeger resource, and the latter has the same name as the Jaeger resource. Unfortunately I was not able to change that by "serviceAccount" entries in the operator helm chart values.yaml. So the operator pod could not update the Jaeger CR to the "running" state.
But the Jaeger CR can be viewed with:
kubectl get jaegers.jaegertracing.io/pau-monitor-jaeger-operator-jaeger -n dev-pau-monitor
NAME                                 STATUS   VERSION
pau-monitor-jaeger-operator-jaeger            1.21.0
As you can see, the STATUS is empty because of the error at the end of this snippet of the operator log:
level=info msg=Versions arch=amd64 identity=dev-pau-monitor.pau-monitor-jaeger-operator jaeger=1.21.0 jaeger-operator=v1.21.3 operator-sdk=v0.18.2 os=linux version=go1.14.15
level=info msg="Consider running the operator in a cluster-wide scope for extra features"
level=info msg="Auto-detected the platform" platform=kubernetes
level=info msg="Auto-detected ingress api" ingress-api=networking
level=info msg="Automatically adjusted the 'es-provision' flag" es-provision=no
level=info msg="Automatically adjusted the 'kafka-provision' flag" kafka-provision=no
level=info msg="Install prometheus-operator in your cluster to create ServiceMonitor objects" error="no ServiceMonitor registered with the API"
level=info msg="No suitable Jaeger instances found to inject a sidecar" deployment=tracegen
level=error msg="failed to store the running status into the current CustomResource" error="jaegers.jaegertracing.io \"pau-monitor-jaeger-operator-jaeger\" not found" execution="2021-03-02 09:24:13.1521423 +0000 UTC" instance=pau-monitor-jaeger-operator-jaeger namespace=dev-pau-monitor
Can you please help me?
Unfortunately I was not able to change that by "serviceAccount" entries in the operator helm chart values.yaml.
Looks like this is a problem with the charts then. Would you mind opening an issue there?
Thanks. Yes, I can surely open a charts issue. I just thought my problem might have something to do with this issue.
Thanks again.
I think this is a similar error, but renaming the Jaeger resource didn't help:
https://github.com/jaegertracing/helm-charts/issues/272
I'm getting something similar as well, see #1655.
The operator says:
time="2021-12-08T18:42:02Z" level=info msg="The service account running this operator does not have the role 'system:auth-delegator', consider granting it for additional capabilities"
I1208 18:42:09.946964 1 request.go:621] Throttling request took 1.046961557s, request: GET:https://10.96.0.1:443/apis/extensions/v1beta1?timeout=32s
time="2021-12-08T18:42:12Z" level=warning msg="could not create ServiceMonitor object" error="unable to retrieve the complete list of server APIs: custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request, metrics.k8s.io/v1beta1: the server could not find the requested resource"
time="2021-12-08T18:44:12Z" level=error msg="failed to store the failed status into the current CustomResource after the reconciliation" error="jaegers.jaegertracing.io \"jaeger-operator-jaeger\" not found" execution="2021-12-08 18:42:12.531941393 +0000 UTC" instance=jaeger-operator-jaeger namespace=observability
time="2021-12-08T18:44:12Z" level=error msg="failed to apply the changes" error="timed out waiting for the condition" execution="2021-12-08 18:42:12.531941393 +0000 UTC" instance=jaeger-operator-jaeger namespace=observability
time="2021-12-08T18:46:13Z" level=error msg="failed to store the failed status into the current CustomResource after the reconciliation" error="jaegers.jaegertracing.io \"jaeger-operator-jaeger\" not found" execution="2021-12-08 18:44:13.672424134 +0000 UTC" instance=jaeger-operator-jaeger namespace=observability
time="2021-12-08T18:46:13Z" level=error msg="failed to apply the changes" error="timed out waiting for the condition" execution="2021-12-08 18:44:13.672424134 +0000 UTC" instance=jaeger-operator-jaeger namespace=observability
time="2021-12-08T18:48:14Z" level=error msg="failed to store the failed status into the current CustomResource after the reconciliation" error="jaegers.jaegertracing.io \"jaeger-operator-jaeger\" not found" execution="2021-12-08 18:46:14.695341926 +0000 UTC" instance=jaeger-operator-jaeger namespace=observability
time="2021-12-08T18:48:14Z" level=error msg="failed to apply the changes" error="timed out waiting for the condition" execution="2021-12-08 18:46:14.695341926 +0000 UTC" instance=jaeger-operator-jaeger namespace=observability
I caught this event:
Error creating: pods "jaeger-operator-jaeger-es-rollover-create-mapping-" is forbidden: error looking up service account observability/jaeger-operator-jaeger: serviceaccount "jaeger-operator-jaeger" not found
There's a job just sitting there:
jaeger-operator-jaeger-es-rollover-create-mapping 0/1 2m54s
But I set up ILM myself, and I'm under the impression that the job shouldn't be running, as I set:
esRollover:
  enabled: false
Hi @jpkrohling , I would like to work on this issue. Could you please assign this to me?
@parauliya done
Hi @iblancasa, @jpkrohling, there are two approaches to resolving this when a service account with the same name already exists at the time the Jaeger resource is created:
- Skip provisioning that service account and move forward with the resource creation.
- Delete the existing service account and provision a new one as per the Jaeger controller.
Each approach has its pros and cons. Please let me know which one you think I should go with.
I would go with the first approach.
Hi @iblancasa, the only issue with this approach is: what if the existing service account doesn't have the permissions required by the Jaeger resource?
We can skip the creation of the account but provision the needed permissions.
Hi @iblancasa, I looked into the code and found that the above logic has already been implemented by you, right? So this issue is no longer about the service account but about something else. Could you please help me understand what it is about: the chart issue, the rollover ILM, or something else?
Hi @iblancasa, I looked into the code and found that the above logic has already been implemented by you, right?
Glad to hear this is no longer an issue. I'm checking the source code but I'm not sure what logic I implemented fixing this issue.
So this issue is no longer about the service account but about something else. Could you please help me understand what it is about: the chart issue, the rollover ILM, or something else?
Since it is a different problem, could you create a new issue for it?
Hi @iblancasa, sorry, I misunderstood this. I played around with Jaeger resources and service accounts a bit more today and found that only the behaviour of the Jaeger resource has changed across the new releases. The root cause is still the same: an existing service account that has the same name as in the Jaeger resource file but does not have the following two labels:
"app.kubernetes.io/instance": <same as jaeger>,
"app.kubernetes.io/managed-by": "jaeger-operator",
If the existing service account has these two labels, creating the Jaeger resource doesn't fail and the existing service account is updated.
Hence the root cause is that these two labels are missing from the existing service accounts. The Jaeger controller looks for existing service accounts that carry these two labels and are in the same namespace as the Jaeger resource. The simplest solution would be to remove the label condition when looking up existing service accounts in the Jaeger namespace. Please let me know what you think.
I'm not sure about this solution. We could end up removing service accounts not related to the Jaeger Operator. The current approach makes more sense, since it looks for the correct signals that the SA is managed by the Jaeger Operator.
I think it would make more sense to fix the upgrade logic to add those labels to the affected service accounts. For people running into this and using the latest version, I would add the labels to their SAs or remove them and allow the operator to recreate everything.
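For anyone applying the manual fix described above, a pre-existing service account carrying the two labels might look like the sketch below. It reuses the names from the original report (`my-jaeger`, `my-secret`), and the label keys and values are taken from the earlier comment. Whether the operator then adopts the account cleanly may depend on the operator version, so treat this as an assumption to verify rather than confirmed behaviour.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-jaeger
  labels:
    # Both labels are the ones the thread reports the operator looks for.
    app.kubernetes.io/instance: my-jaeger          # same as the Jaeger instance name
    app.kubernetes.io/managed-by: jaeger-operator
imagePullSecrets:
  - name: my-secret
```

Since the thread reports that the operator updates an account labelled this way, it is worth checking afterwards that `imagePullSecrets` is still present.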
I did consider that we could end up removing service accounts not related to the Jaeger Operator. But I assumed that all the SAs in the Jaeger namespace will be related to Jaeger only. If my assumption is not correct, then the existing approach makes more sense.
Also, yes, we can add a step to the upgrade logic that adds these labels to all the SAs in the Jaeger namespace. I think this is a neater and cleaner approach.
But I assumed that all the SAs in the Jaeger namespace will be related to Jaeger only.
There is no real restriction about this. That's the reason for the current approach.
Yeah, I got it, thanks. Is there anything else that needs to be done for this issue, or should we just close it?
I would say this:
Also, yes, we can add a step to the upgrade logic that adds these labels to all the SAs in the Jaeger namespace. I think this is a neater and cleaner approach.
Hi @iblancasa , Could you please take a look at the following PR: https://github.com/jaegertracing/jaeger-operator/pull/2283
Agreed in https://github.com/jaegertracing/jaeger-operator/pull/2283 to close the issue and the PR.