jaeger-operator 1.19.0 custom serviceaccount

Open eladhar opened this issue 4 years ago • 7 comments

I have Jaeger Operator version 1.19.0 running on a k8s cluster. I'm trying to use a custom ServiceAccount in the Jaeger resource in order to pull images from a private repository. It looks like this:

apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: my-jaeger
spec:
  strategy: production
  serviceAccount: my-jaeger
  ingress:
    enabled: false
  agent:
    image: "mylocalresistry.com/jaegertracing/jaeger-agent:1.19.2"
  collector:
    image: "mylocalresistry.com/jaegertracing/jaeger-collector:1.19.2"
  query:
    image: "mylocalresistry.com/jaegertracing/jaeger-query:1.19.2"
    options:
      query:
        base-path: "/jaeger"
  storage:
    serviceAccount: my-jaeger
    type: elasticsearch
    options:
      es:
        server-urls: http://my-elasticsearch:9200
        index-prefix: jaeger-operator-elad
    dependencies:
      image: "artifactory.rnd-hub.com:6543/3rdparties/jaegertracing/spark-dependencies:latest"
      enabled: false
    esIndexCleaner:
      image: "artifactory.rnd-hub.com:6543/3rdparties/jaegertracing/jaeger-es-index-cleaner:1.19.2"
      enabled: false
    esRollover:
      image: "artifactory.rnd-hub.com:6543/3rdparties/jaegertracing/jaeger-es-rollover:1.19.2"
      enabled: false

Here is my custom ServiceAccount, which is deployed before the Jaeger resource above:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-jaeger
  labels:
    app.kubernetes.io/name: my-jaeger
    app.kubernetes.io/instance: my-jaeger
imagePullSecrets:
- name: my-secret

The problem is that the Jaeger collector and query pods are not being created.

Here are the errors from the jaeger-operator pod:

jaeger-operator-74979766c5-wwxj8 jaeger-operator time="2020-11-03T15:07:14Z" level=error msg="failed to apply the changes" error="serviceaccounts \"my-jaeger\" already exists" execution="2020-11-03 15:07:14.947162148 +0000 UTC" instance=my-jaeger namespace=jaeger-test
jaeger-operator-74979766c5-wwxj8 jaeger-operator time="2020-11-03T15:07:16Z" level=error msg="failed to store the failed status into the current CustomResource after the reconciliation" error="jaegers.jaegertracing.io \"my-jaeger\" not found" execution="2020-11-03 15:07:15.976150378 +0000 UTC" instance=my-jaeger namespace=jaeger-test

eladhar avatar Nov 03 '20 16:11 eladhar

Could you try another name for the service account? You likely found a bug, but a simple workaround would be to use a different name than the one that Jaeger would itself provision.
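
A minimal sketch of that workaround, reusing the manifests from the report above. The name `my-jaeger-sa` is hypothetical; the only assumption is that it differs from the Jaeger instance name, which is the name the operator provisions for its own service account.

```yaml
# Custom ServiceAccount under a name the operator will not try to create itself.
# "my-jaeger-sa" is a made-up name for illustration.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-jaeger-sa
imagePullSecrets:
- name: my-secret
---
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: my-jaeger
spec:
  strategy: production
  # Point the instance at the custom account instead of one named "my-jaeger".
  serviceAccount: my-jaeger-sa
```

The rest of the Jaeger spec (images, storage, ingress) stays as in the original manifest.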

jpkrohling avatar Nov 03 '20 16:11 jpkrohling

Hi @jpkrohling, Thanks for your quick response.

It looks like you are right: when I set the same name for both the ServiceAccount and the Jaeger resource, the issue is reproduced; otherwise it works. It does indeed look like a bug.

eladhar avatar Nov 04 '20 14:11 eladhar

Hi @jpkrohling,

I also seem to run into this bug on Docker Desktop Kubernetes, but I was not able to work around the issue by using a self-named and self-deployed serviceAccount, because the current operator 2.19.1 deployment still creates its own service accounts, one for the operator and one for the Jaeger resource, and the latter has the same name as the Jaeger resource. Unfortunately I was not able to change that by "serviceAccount" entries in the operator helm chart values.yaml. So the operator pod could not update the Jaeger CR to the "running" state.

But the Jaeger CR can be viewed with:

kubectl get jaegers.jaegertracing.io/pau-monitor-jaeger-operator-jaeger -n dev-pau-monitor
NAME                                 STATUS   VERSION
pau-monitor-jaeger-operator-jaeger            1.21.0

As you can see, the STATUS is empty because of the error at the end of this snippet of the operator log:

level=info msg=Versions arch=amd64 identity=dev-pau-monitor.pau-monitor-jaeger-operator jaeger=1.21.0 jaeger-operator=v1.21.3 operator-sdk=v0.18.2 os=linux version=go1.14.15
level=info msg="Consider running the operator in a cluster-wide scope for extra features"
level=info msg="Auto-detected the platform" platform=kubernetes
level=info msg="Auto-detected ingress api" ingress-api=networking
level=info msg="Automatically adjusted the 'es-provision' flag" es-provision=no
level=info msg="Automatically adjusted the 'kafka-provision' flag" kafka-provision=no
level=info msg="Install prometheus-operator in your cluster to create ServiceMonitor objects" error="no ServiceMonitor registered with the API"
level=info msg="No suitable Jaeger instances found to inject a sidecar" deployment=tracegen
level=error msg="failed to store the running status into the current CustomResource" error="jaegers.jaegertracing.io \"pau-monitor-jaeger-operator-jaeger\" not found" execution="2021-03-02 09:24:13.1521423 +0000 UTC" instance=pau-monitor-jaeger-operator-jaeger namespace=dev-pau-monitor

Can you please help me?

JohnFrampton avatar Mar 02 '21 09:03 JohnFrampton

Unfortunately I was not able to change that by "serviceAccount" entries in the operator helm chart values.yaml.

Looks like this is a problem with the charts then. Would you mind opening an issue there?

jpkrohling avatar Mar 02 '21 10:03 jpkrohling

Thanks. Yes, I can surely open an issue in the charts repository. I just thought my problem might have something to do with this issue.

Thanks again.

JohnFrampton avatar Mar 02 '21 10:03 JohnFrampton

I think this is a similar error, but renaming the Jaeger resource didn't help:

https://github.com/jaegertracing/helm-charts/issues/272

advissor avatar Jul 26 '21 10:07 advissor

I'm getting something similar as well; see #1655.

The operator says:

time="2021-12-08T18:42:02Z" level=info msg="The service account running this operator does not have the role 'system:auth-delegator', consider granting it for additional capabilities"
I1208 18:42:09.946964       1 request.go:621] Throttling request took 1.046961557s, request: GET:https://10.96.0.1:443/apis/extensions/v1beta1?timeout=32s
time="2021-12-08T18:42:12Z" level=warning msg="could not create ServiceMonitor object" error="unable to retrieve the complete list of server APIs: custom.metrics.k8s.io/v1beta1: the server is currently unable to handle the request, metrics.k8s.io/v1beta1: the server could not find the requested resource"
time="2021-12-08T18:44:12Z" level=error msg="failed to store the failed status into the current CustomResource after the reconciliation" error="jaegers.jaegertracing.io \"jaeger-operator-jaeger\" not found" execution="2021-12-08 18:42:12.531941393 +0000 UTC" instance=jaeger-operator-jaeger namespace=observability
time="2021-12-08T18:44:12Z" level=error msg="failed to apply the changes" error="timed out waiting for the condition" execution="2021-12-08 18:42:12.531941393 +0000 UTC" instance=jaeger-operator-jaeger namespace=observability
time="2021-12-08T18:46:13Z" level=error msg="failed to store the failed status into the current CustomResource after the reconciliation" error="jaegers.jaegertracing.io \"jaeger-operator-jaeger\" not found" execution="2021-12-08 18:44:13.672424134 +0000 UTC" instance=jaeger-operator-jaeger namespace=observability
time="2021-12-08T18:46:13Z" level=error msg="failed to apply the changes" error="timed out waiting for the condition" execution="2021-12-08 18:44:13.672424134 +0000 UTC" instance=jaeger-operator-jaeger namespace=observability
time="2021-12-08T18:48:14Z" level=error msg="failed to store the failed status into the current CustomResource after the reconciliation" error="jaegers.jaegertracing.io \"jaeger-operator-jaeger\" not found" execution="2021-12-08 18:46:14.695341926 +0000 UTC" instance=jaeger-operator-jaeger namespace=observability
time="2021-12-08T18:48:14Z" level=error msg="failed to apply the changes" error="timed out waiting for the condition" execution="2021-12-08 18:46:14.695341926 +0000 UTC" instance=jaeger-operator-jaeger namespace=observability

I caught this event:

Error creating: pods "jaeger-operator-jaeger-es-rollover-create-mapping-" is forbidden: error looking up service account observability/jaeger-operator-jaeger: serviceaccount "jaeger-operator-jaeger" not found

There's a job just sitting there:

jaeger-operator-jaeger-es-rollover-create-mapping 0/1 2m54s

But I set up ILM myself, and I'm under the impression that this job shouldn't be running, since I set:

esRollover:
    enabled: false

perezjasonr avatar Dec 08 '21 18:12 perezjasonr

Hi @jpkrohling , I would like to work on this issue. Could you please assign this to me?

parauliya avatar Jul 25 '23 04:07 parauliya

@parauliya done

iblancasa avatar Jul 25 '23 09:07 iblancasa

Hi @iblancasa, @jpkrohling, there are two approaches to resolve this when a service account with the same name already exists during creation of the Jaeger resource:

  1. Skip provisioning that service account and move forward with the resource creation.
  2. Delete the existing service account and provision a new one as the Jaeger controller defines it.

Each approach has its pros and cons. Please let me know which one you think I should go with.

parauliya avatar Jul 27 '23 08:07 parauliya

I would go with the first approach.

iblancasa avatar Jul 27 '23 09:07 iblancasa

Hi @iblancasa, the only issue with this approach is: what if the existing service account doesn't have the permissions required by a Jaeger resource?

parauliya avatar Jul 28 '23 05:07 parauliya

We can skip the creation of the account but provision the needed permissions.

iblancasa avatar Jul 28 '23 10:07 iblancasa

Hi @iblancasa, I looked into the code and found that the above logic has already been implemented by you, right? So this issue is no longer about the service account but about something else. Could you please help me understand what it is about: a chart issue, rollover ILM, or something else?

parauliya avatar Jul 31 '23 11:07 parauliya

Glad to hear this is no longer an issue. I'm checking the source code but I'm not sure what logic I implemented fixing this issue.

So this issue is no longer about the service account but about something else. Could you please help me understand what it is about: a chart issue, rollover ILM, or something else?

Since it is a different problem, could you create a new issue for it?

iblancasa avatar Jul 31 '23 14:07 iblancasa

Hi @iblancasa, sorry, I misunderstood this. I played around with Jaeger resources and service accounts a bit more today and found that only the behaviour of the Jaeger resource has changed in the newer releases. The root cause is still the same: an existing service account with the same name as the one in the Jaeger resource file that lacks the following two labels:

"app.kubernetes.io/instance":   <same as jaeger>,
"app.kubernetes.io/managed-by": "jaeger-operator", 

If the existing service account has these two labels, creating the Jaeger resource does not fail, and the operator updates the existing service account.

Hence the root cause is that these two labels are missing from the existing service accounts. The Jaeger controller looks for existing service accounts that carry these two labels and are in the same namespace as the Jaeger resource. The simplest solution would be to remove the label condition when looking up the existing service account in the Jaeger namespace. Please let me know what you think.

parauliya avatar Aug 01 '23 12:08 parauliya

I'm not sure about this solution. We could end up removing service accounts not related to the Jaeger Operator. The current approach makes more sense, since it looks for the correct signals that the SA is operated by the Jaeger Operator.

I think it would make more sense to fix the upgrade logic to add those labels to the affected service accounts. For people running into this and using the latest version, I would add the labels to their SAs or remove them and allow the operator to recreate everything.
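
A minimal sketch of the first option, reusing the ServiceAccount from the original report and assuming, as found earlier in this thread, that the operator recognizes an existing account by these two labels. Whether the operator preserves `imagePullSecrets` when it updates the account is not confirmed here.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-jaeger
  labels:
    # Must match the name of the Jaeger instance.
    app.kubernetes.io/instance: my-jaeger
    # Marks the account as operated by the Jaeger Operator.
    app.kubernetes.io/managed-by: jaeger-operator
imagePullSecrets:
- name: my-secret
```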

iblancasa avatar Aug 01 '23 14:08 iblancasa

I did consider that we could end up removing service accounts not related to the Jaeger Operator. But I assumed that all the SAs in the Jaeger namespace will be related to Jaeger only. If my assumption is not correct, then the existing approach makes more sense.

Also, yes, we can add a step to the upgrade logic that adds these labels to all the SAs in the Jaeger namespace. I think this is a neater and cleaner approach.

parauliya avatar Aug 01 '23 18:08 parauliya

But I assumed that all the SAs in the Jaeger namespace will be related to Jaeger only.

There is no real restriction about this. That's the reason for the current approach.

iblancasa avatar Aug 02 '23 10:08 iblancasa

Yeah, I got it, thanks. Is there anything else that needs to be done for this issue, or should we just close it?

parauliya avatar Aug 02 '23 10:08 parauliya

I would say this:

Also, yes, we can add a step to the upgrade logic that adds these labels to all the SAs in the Jaeger namespace. I think this is a neater and cleaner approach.

iblancasa avatar Aug 02 '23 10:08 iblancasa

Hi @iblancasa , Could you please take a look at the following PR: https://github.com/jaegertracing/jaeger-operator/pull/2283

parauliya avatar Aug 03 '23 18:08 parauliya

Agreed in https://github.com/jaegertracing/jaeger-operator/pull/2283 to close the issue and the PR.

iblancasa avatar Feb 07 '24 12:02 iblancasa