skywalking icon indicating copy to clipboard operation
skywalking copied to clipboard

[Feature] [SWCK] Add Instrument CRD to config and operate instruments.

Open Duncan-tree-zhou opened this issue 2 years ago • 20 comments

Search before asking

  • [X] I had searched in the issues and found no similar feature requirement.

Description

To config agents, upgrade agents, enable/disable plugins globally(or by batch) are common senarios for operation engineer. The default configmap can not meet all the requirements. To define a CRD for instruments and let the skwaylking operator inject instruments by CRs, may solve these problems.

Use case

No response

Related issues

No response

Are you willing to submit a PR?

  • [X] Yes I am willing to submit a PR!

Code of Conduct

Duncan-tree-zhou avatar Jun 01 '22 12:06 Duncan-tree-zhou

Thanks for this proposal. There are some tips for you.

The CRD should contain the following configuration:

  1. The sidecar configuration. As you said, the user can create a sidecar template to upgrade the agent.
  2. The annotation configuration. Users can create an annotation template.
  3. The labelselector to define which target object should be updated, like deployment/statefulset etc.

The Controller should do the following things:

  1. When the user creates/updates a CR, the controller can get the target object with the labelselector of the CR and inject the annotation to the target object.

dashanji avatar Jun 01 '22 13:06 dashanji

Hi @dashanji,

I looked into the implements of SWCK, and saw the javaagent CRD is already exists. The javaagent CR is created by javaagent-controller when pods get annotations with "sidecar.skywalking.apache.org/succeed=true". It looks like the javaagent CR is a copy of agent configs.

If a controller was created to watch the change of javaagent CR and update target pod when CR changed. Then there will be two agent configs going their own block of code to operate agent injection/removal. And it will be diffecult to maintain the code in the future.

Besides, the annotation configs is pod specific, while the javaagent CR config could effect a batch of pods. And what more is that the pods have it's own Skywalking env configs. There are multi way for uses to config the agent.

For these two problems, I got some points to workaround.

  1. to avoid increasing the difficulty of code maintainance, there should be a common method to do the agent injection/removal. Actually, I think there are only two places who would invoke the common method: pod webhook handler and agent CR reconciler.
  2. to solve the multi way of configs problem. we should define the override precedence for CR configs, annotations configs, pod env configs. And finally, the configs should be composed by override precedence into a patch to update the pod. (the precedence of default configmap is defined in skywalking, so it can not be change it in SWCK. )
  3. The common method use the composed configs as its input.

what do you think about this?

Duncan-tree-zhou avatar Jun 04 '22 16:06 Duncan-tree-zhou

And here are questions to be clarified:

  1. extend JavaAgent CRD or create new one? and the same question for controller.
  2. where and when to operate agent injection/removal? (should be webhook handler and controller reconciler. webhook for annotation change, controller reconciler for agent CR change. i don't kown if there are any other situation to be operate?)
  3. how many ways can user config agents? and the precedence? (CR configs, annotation configs, pod env. i don't know if there are any other ways to config.)

Duncan-tree-zhou avatar Jun 04 '22 17:06 Duncan-tree-zhou

FYI @hanahmily PTAL

wu-sheng avatar Jun 04 '22 22:06 wu-sheng

Hi @dashanji,

I looked into the implements of SWCK, and saw the javaagent CRD is already exists. The javaagent CR is created by javaagent-controller when pods get annotations with "sidecar.skywalking.apache.org/succeed=true". It looks like the javaagent CR is a copy of agent configs.

If a controller was created to watch the change of javaagent CR and update target pod when CR changed. Then there will be two agent configs going their own block of code to operate agent injection/removal. And it will be diffecult to maintain the code in the future.

Besides, the annotation configs is pod specific, while the javaagent CR config could effect a batch of pods. And what more is that the pods have it's own Skywalking env configs. There are multi way for uses to config the agent.

For these two problems, I got some points to workaround.

  1. to avoid increasing the difficulty of code maintainance, there should be a common method to do the agent injection/removal. Actually, I think there are only two places who would invoke the common method: pod webhook handler and agent CR reconciler.
  2. to solve the multi way of configs problem. we should define the override precedence for CR configs, annotations configs, pod env configs. And finally, the configs should be composed by override precedence into a patch to update the pod. (the precedence of default configmap is defined in skywalking, so it can not be change it in SWCK. )
  3. The common method use the composed configs as its input.

what do you think about this?

In my mind, because we have defined the precedence order to configure the agent, the Javaagent CR is created for
observing the final injected agent configuration. It is not a CR for configuration.

Yep, your points are right, now the java agent injector is a pod webhook handler, so I think the agent CR reconciler is better. To keep the code compatible and make it easy to code, I recommend defining a precedence order like Agent Configuration CR > Annotations > Configmap > Default configmap.

dashanji avatar Jun 05 '22 03:06 dashanji

And here are questions to be clarified:

  1. extend JavaAgent CRD or create new one? and the same question for controller.
  2. where and when to operate agent injection/removal? (should be webhook handler and controller reconciler. webhook for annotation change, controller reconciler for agent CR change. i don't kown if there are any other situation to be operate?)
  3. how many ways can user config agents? and the precedence? (CR configs, annotation configs, pod env. i don't know if there are any other ways to config.)
  1. Creating a new CR is good for me.
  2. I think a simple Controller can solve the problem.
  3. Currently, the java agent is a sidecar, so the annotation, Configmap, Default configmap is the only way to set the final injected configmap.

dashanji avatar Jun 05 '22 03:06 dashanji

Is a new CR and/or operator breaking anything deployed? We should consider forward compatibility as we are in 4.x iteration, which means no breakin change.

wu-sheng avatar Jun 05 '22 04:06 wu-sheng

I think there is no breaking change, as the new CR could be considered as a Set of annotation.

dashanji avatar Jun 05 '22 05:06 dashanji

From deployment, breaking doesn't have to mean conflicting, such as whether legacy configs would be ignored? This should be considered a breaking change for an operator.

wu-sheng avatar Jun 05 '22 08:06 wu-sheng

@wu-sheng @dashanji, from our discussion, i got the idea. we can create a new CR and operator. CR have the same annotations as pod level annotation for agents. And add label selector and pod selector attribute in CR, so that CR operator can inject the correct batch of pods.

finally the CR may looks like:

apiVersion: skywalking.apache.org/v1
kind: SwAgent
metadata:
  annotations:
    sidecar.skywalking.apache.org/initcontainer.Image: "apache/skywalking-java-agent:8.8.0-java8"
    sidecar.skywalking.apache.org/configmapVolume.ConfigMap.Name: "skywalking-swck-java-agent-configmap"
    ...
    agent.skywalking.apache.org/logging.level: "DEBUG"
    agent.skywalking.apache.org/logging.output: "CONSOLE"
    ...
  name: skywalking-agent
  namespace: skywalking
spec:
  objectSelector:
    skywalking.apache.org/java-inject: true
    skywalking.apache.org/java-inject-batch: "batch1"
  containerMatcher: "*" (regular expression)

Duncan-tree-zhou avatar Jun 05 '22 09:06 Duncan-tree-zhou

From deployment, breaking doesn't have to mean conflicting, such as whether legacy configs would be ignored? This should be considered a breaking change for an operator.

In my mind, the feature is to provide a new way to configure the agent for the targeted object, and will not affect the original configuration.

When users configure the agent in this way, they need to follow a precedence order such as Agent Configuration CR > Annotations > Configmap > Default configmap. If users don't need to configure the batch of applications, the original configuration is enough, so they don't need to create a new Agent Configuration CR.

dashanji avatar Jun 05 '22 10:06 dashanji

@wu-sheng @dashanji, from our discussion, i got the idea. we can create a new CR and operator. CR have the same annotations as pod level annotation for agents. And add label selector and pod selector attribute in CR, so that CR operator can inject the correct batch of pods.

finally the CR may looks like:

apiVersion: skywalking.apache.org/v1
kind: SwAgent
metadata:
  annotations:
    sidecar.skywalking.apache.org/initcontainer.Image: "apache/skywalking-java-agent:8.8.0-java8"
    sidecar.skywalking.apache.org/configmapVolume.ConfigMap.Name: "skywalking-swck-java-agent-configmap"
    ...
    agent.skywalking.apache.org/logging.level: "DEBUG"
    agent.skywalking.apache.org/logging.output: "CONSOLE"
    ...
  name: skywalking-agent
  namespace: skywalking
spec:
  objectSelector:
    skywalking.apache.org/java-inject: true
    skywalking.apache.org/java-inject-batch: "batch1"
  containerMatcher: "*" (regular expression)

Overall, the CR is good for me.

What do you think about moving the annotation to the Spec field? In this way, we can validate whether the annotation key is right.

dashanji avatar Jun 05 '22 10:06 dashanji

How about removing the config map to embrace the CRD? We could simplify the code base by supporting a single configuration system. That will bring some benefits like:

  • If users want to set up a batch or all agents, they could set up a CR.
  • If users have to debug a deployment, they could leverage annotation to override some items defined in CR.

The controller onboarding will create a default CR. Users have a chance to modify it later. And the most critical is to determine the precedence between CRs once they match the same agents.

hanahmily avatar Jun 06 '22 02:06 hanahmily

Agree! Configuration through CR is indeed clearer than configmap. However, the current agent configuration is mounted through configmap. We can use CR to manage the original configmap configuration, which makes the agent configuration clear.

dashanji avatar Jun 06 '22 12:06 dashanji

@dashanji, When moving the annotation to CR spec, its not a good way to hardcode the mapping from CR spec fields to skywalking app env keys. Because the app level env keys might be added or removed in any version update, it may cause many compatibility problem. What do you think about to move to a flexible fields like map, kv list, etc. Actually I am working on a version of it.

Agree with @hanahmily +1, CR is more earily to be used, and with CR, it's more flexible for us to control injection behaviour when configs changed.

Duncan-tree-zhou avatar Jun 06 '22 14:06 Duncan-tree-zhou

Because the app level env keys might be added or removed in any version update, it may cause many compatibility problem.

A little context, config keys and system environment keys would not be changed easily, and no side effects for agents if they are expired. I think the key question is, what keys would you add as hard codes? If all keys in agent config docs, then it is nearly impossible you could list all. Users could load their private plugins with new configurationa in it.

wu-sheng avatar Jun 06 '22 15:06 wu-sheng

One another, the current swck doesn't work for optional plugins properly. The plugin.mount is being used, but only set for default values, then it works for nothing. swck should create a new folder (activated-plugins) for activated optional plugins, and set it to plugin.mount Such as plugin.mount=plugins,activations,activated-plugins

This is the current version

image

image

It could work, but too complex and confusing.

wu-sheng avatar Jun 06 '22 15:06 wu-sheng

is there a guide to debug SWCK operator locally? i had tried the common steps, but still getting error. did i miss some steps?

# setup ~/.kube/config to a kubernetes cluster
cd operator
make generate
make manifests
make install
go run main.go
......
2022-06-07T23:25:50.418+0800    INFO    controller.ui   Starting EventSource    {"reconciler group": "operator.skywalking.apache.org", "reconciler kind": "UI", "source": "kind source: /, Kind="}
2022-06-07T23:25:50.418+0800    INFO    controller.ui   Starting EventSource    {"reconciler group": "operator.skywalking.apache.org", "reconciler kind": "UI", "source": "kind source: /, Kind="}
2022-06-07T23:25:50.413+0800    INFO    controller.configmap    Starting Controller     {"reconciler group": "", "reconciler kind": "ConfigMap"}
2022-06-07T23:25:50.413+0800    ERROR   controller.storage      Could not wait for Cache to sync        {"reconciler group": "operator.skywalking.apache.org", "reconciler kind": "Storage", "error": "failed to wait for storage caches to sync: timed out waiting for cache to be synced"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start
        /Users/duncan/gitRepo/skywalking-swck/operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:234
sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).startRunnable.func1
        /Users/duncan/gitRepo/skywalking-swck/operator/vendor/sigs.k8s.io/controller-runtime/pkg/manager/internal.go:696
2022-06-07T23:25:50.417+0800    INFO    controller.pod  Starting Controller     {"reconciler group": "", "reconciler kind": "Pod"}
2022-06-07T23:25:50.418+0800    ERROR   controller.configmap    Could not wait for Cache to sync        {"reconciler group": "", "reconciler kind": "ConfigMap", "error": "failed to wait for configmap caches to sync: Timeout: failed waiting for *v1.ConfigMap Informer to sync"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start
        /Users/duncan/gitRepo/skywalking-swck/operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:234
sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).startRunnable.func1
        /Users/duncan/gitRepo/skywalking-swck/operator/vendor/sigs.k8s.io/controller-runtime/pkg/manager/internal.go:696
......

do you have any ideas?

Duncan-tree-zhou avatar Jun 07 '22 15:06 Duncan-tree-zhou

Agree with @wu-sheng. Now is a good opportunity to configure the specific value of the optional plugin through a new CR.

@Duncan-tree-zhou There are two ways to debug SWCK operator locally.

  • You can refer the guide to disable the webhook then you can debug the controller locally.
  • If you find it hard to debug the webhook, you can refer the blog to deploy the SWCK and cert-manager in the kind cluster, then you could test your implementation.

dashanji avatar Jun 08 '22 07:06 dashanji

Hi, here is the PR:

https://github.com/apache/skywalking-swck/pull/66

It looks like the PR in SWCK repo cannot be associated to this issue?

here are the changes:

  1. add SwAgent CRD
    • inlcuding label selector and container matcher.
    • the sidecar configs variable is explicitly declared.
    • JavaSideCar.Env will be pass to target pods, in which way use can set agent configs.
  2. update pod in SwAgent Reconcilor is not applicable, so nothing is done in SwAgent Reconciler
  3. SwAgent CRD is take effect in admission webhook, the config precedence is overlay-annotations-config > swAgent-config > swAgent-default-config > overlay-default-config.
  4. for batch updating agents in target pods, two steps to be execute:
    • add or update swAgent CR, and use the correct label selector
    • delete the pods with correct label selector, then k8s will rolling update pods to the corrects.

Duncan-tree-zhou avatar Jun 16 '22 16:06 Duncan-tree-zhou