
Trait apply should use update instead of patch

Open hongchaodeng opened this issue 5 years ago • 27 comments

Problem Statement

Currently, when oam-runtime applies traits, it does a MergePatch (code link1 and link2). Basically, it compares the trait to apply (A) with the existing trait object in the cache (B), and computes the difference between A and B to create a patch to send to the APIServer.

This will lead to the following problem:

  1. Trait initialized as:
auto: false
rules: [a, b, c]
  2. Trait updated as:
auto: true

In the above scenario, we expect the `rules` field to be removed, but after the patch is applied, the trait actually becomes:

auto: true
rules: [a, b, c]
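
A minimal, self-contained sketch of the merge-patch semantics behind this, using the evanphx/json-patch library (this is not the oam-runtime code path itself; it only illustrates RFC 7386 merge-patch behaviour, which is my reading of why the field survives):

```go
package main

import (
	"fmt"

	jsonpatch "github.com/evanphx/json-patch"
)

func main() {
	// Live trait object on the cluster (errors elided for brevity).
	live := []byte(`{"auto": false, "rules": ["a", "b", "c"]}`)

	// A merge patch that simply omits `rules` leaves the field untouched...
	kept, _ := jsonpatch.MergePatch(live, []byte(`{"auto": true}`))
	fmt.Println(string(kept)) // {"auto":true,"rules":["a","b","c"]}

	// ...only an explicit null actually deletes it.
	removed, _ := jsonpatch.MergePatch(live, []byte(`{"auto": true, "rules": null}`))
	fmt.Println(string(removed)) // {"auto":true}
}
```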

Proposed Solution

To solve the problem, we should update the trait spec instead of patching it. Regarding field-safety concerns: traits are operational data that is managed only by the OAM platform, so a full update is OK in this case. Workloads, on the contrary, can be changed by end users, so we need to think more carefully about how to update them.
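
A rough sketch of what an update-based apply could look like with the controller-runtime client; the function name and the unstructured handling here are illustrative assumptions, not the actual oam-runtime change:

```go
package apply

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// applyTraitByUpdate replaces the live trait with the desired one instead of
// merge-patching it, so fields dropped from the desired trait are removed.
func applyTraitByUpdate(ctx context.Context, c client.Client, desired *unstructured.Unstructured) error {
	existing := &unstructured.Unstructured{}
	existing.SetGroupVersionKind(desired.GroupVersionKind())

	key := client.ObjectKey{Namespace: desired.GetNamespace(), Name: desired.GetName()}
	if err := c.Get(ctx, key, existing); err != nil {
		if apierrors.IsNotFound(err) {
			return c.Create(ctx, desired)
		}
		return err
	}

	// Carry over the live resourceVersion so the update passes the
	// optimistic-concurrency check against the object we just read.
	desired.SetResourceVersion(existing.GetResourceVersion())
	return c.Update(ctx, desired)
}
```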

hongchaodeng avatar Oct 28 '20 04:10 hongchaodeng

Maybe I can help. Could you please assign this issue to me?

captainroy-hy avatar Oct 31 '20 05:10 captainroy-hy

@captainroy-hy sure, thanks!

wonderflow avatar Oct 31 '20 05:10 wonderflow

I wonder why patch can't do "remove"?

ryanzhang-oss avatar Oct 31 '20 09:10 ryanzhang-oss

@ryanzhang-oss 'patch' will not remove fields that are absent from the patch

wonderflow avatar Nov 01 '20 01:11 wonderflow

Use APIUpdatingApplicator instead of APIPatchingApplicator: https://github.com/crossplane/crossplane-runtime/blob/v0.10.0/pkg/resource/api.go#L178-L208

crossplane-runtime 0.9.0 -> 0.10.0
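
If I read the linked code correctly, the switch could be roughly the sketch below; this assumes the Applicator interface and constructors exported by crossplane-runtime v0.10.0 (the file linked above), and the surrounding wiring is simplified:

```go
package setup

import (
	"github.com/crossplane/crossplane-runtime/pkg/resource"
	ctrl "sigs.k8s.io/controller-runtime"
)

// newApplicator returns the applicator used to apply rendered traits.
// Previously this was resource.NewAPIPatchingApplicator(mgr.GetClient()),
// which merge-patches and therefore never removes absent fields; the
// updating applicator replaces the object with the desired state instead.
func newApplicator(mgr ctrl.Manager) resource.Applicator {
	return resource.NewAPIUpdatingApplicator(mgr.GetClient())
}
```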

sak0 avatar Nov 03 '20 14:11 sak0

Use APIUpdatingApplicator instead of APIPatchingApplicator: https://github.com/crossplane/crossplane-runtime/blob/v0.10.0/pkg/resource/api.go#L178-L208

crossplane-runtime 0.9.0 -> 0.10.0

Thanks for your kind suggestion!

captainroy-hy avatar Nov 06 '20 08:11 captainroy-hy

ref https://github.com/oam-dev/kubevela/issues/876

hongchaodeng avatar Jan 12 '21 03:01 hongchaodeng

Just to be clear, the real problem is not just removing the fields that are not present in the new CR, but also preserving the fields that are not controlled by the trait.

If this is the real goal, then I don't think anyone can solve this unless the trait declares which fields it owns.

ryanzhang-oss avatar Jan 12 '21 03:01 ryanzhang-oss

@ryanzhang-oss Right.

Let me add another example:

# user applies
auto: false
rules: [a, b, c]

# controller adds `others`
auto: false
rules: [a, b, c]
others: "abc"

# user applies again
auto: true

# user wants to see
auto: true
others: "abc"

hongchaodeng avatar Jan 12 '21 04:01 hongchaodeng

Per our discussion, this is a well-known two-way merge problem: https://medium.com/flant-com/3-way-merge-patches-helm-werf-beb7eccecdfe

The correct way to solve this problem is via a three-way merge, which is what kubectl apply does. Basically, we need to emulate kubectl apply in our Apply() function; even though it is called Apply(), it does not behave that way yet.
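
To make that concrete, here is a hedged sketch using the JSON merge-patch helpers in k8s.io/apimachinery, reusing the auto/rules/others example from this thread; it is a simplification of what kubectl apply does, not a drop-in implementation:

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/util/jsonmergepatch"
)

func main() {
	// What the platform applied last time.
	original := []byte(`{"auto": false, "rules": ["a", "b", "c"]}`)
	// What the platform wants to apply now.
	modified := []byte(`{"auto": true}`)
	// What is live on the cluster (another controller added "others").
	current := []byte(`{"auto": false, "rules": ["a", "b", "c"], "others": "abc"}`)

	// Three-way merge: fields we dropped (rules) become null in the patch,
	// while fields owned by others (others) are left untouched.
	patch, err := jsonmergepatch.CreateThreeWayJSONMergePatch(original, modified, current)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(patch)) // expected: {"auto":true,"rules":null}
}
```

The resulting bytes could then be sent as a plain JSON merge patch, e.g. via client.RawPatch(types.MergePatchType, patch) in controller-runtime.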

Nowadays k8s supports server-side apply, which drastically simplifies the client-side implementation. Here are some related materials I could find, though the examples in them are scattered:

  • https://github.com/kubernetes/kubernetes/issues/88091
  • https://godoc.org/sigs.k8s.io/controller-runtime/pkg/client
  • https://ymmt2005.hatenablog.com/entry/2020/04/14/An_example_of_using_dynamic_client_of_k8s.io/client-go#Using-the-dynamic-client-to-implement-SSA
  • https://medium.com/swlh/break-down-kubernetes-server-side-apply-5d59f6a14e26

hongchaodeng avatar Jan 12 '21 05:01 hongchaodeng

Server-side apply can solve this problem as of k8s v1.19. Before 1.19, it does not remove fields in the case shown here.

captainroy-hy avatar Jan 12 '21 06:01 captainroy-hy

Before 1.19, it does not remove fields in the case shown here.

I don't quite follow this comment. Could you elaborate in more detail and provide an example?

hongchaodeng avatar Jan 12 '21 18:01 hongchaodeng

@ryanzhang-oss Right.

Let me add another example:

# user applies
auto: false
rules: [a, b, c]

# controller adds `others`
auto: false
rules: [a, b, c]
others: "abc"

# user applies again
auto: true

# user wants to see
auto: true
others: "abc"

About this case:

Because the resource already exists, if the user wants to remove `rules`, the user should send the following request:

# user applies again
auto: true
rules: null
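
In controller-runtime terms, sending that explicit-null request could look like the hedged sketch below; the flattened field layout mirrors the simplified example above (a real trait would nest these fields under spec), and the function name is illustrative:

```go
package apply

import (
	"context"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// removeRulesByNullPatch removes `rules` by explicitly setting it to null in
// a merge patch; a patch that merely omits the field would leave it in place.
func removeRulesByNullPatch(ctx context.Context, c client.Client, trait *unstructured.Unstructured) error {
	patch := client.RawPatch(types.MergePatchType, []byte(`{"auto": true, "rules": null}`))
	return c.Patch(ctx, trait, patch)
}
```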

If we switch from patch to update:

# user updates again
auto: true
# result
auto: true

This is the semantics of update.

Whether to update or to patch depends on the business logic of the traits.

Is there no uniform standard?

allenhaozi avatar Jan 13 '21 01:01 allenhaozi

Server-side apply this Deployment:

apiVersion: apps/v1
kind: Deployment
...
spec:
  replicas: 3
  template:
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2

Then server-side apply the following with the same field manager:

apiVersion: apps/v1
kind: Deployment
...
spec:
  # replicas: 3 <===== unset
  template:
    spec:
      containers:
      - name: nginx
        # image: nginx:1.14.2 <===== unset

In api-server v1.19, we will get

apiVersion: apps/v1
kind: Deployment
...
spec:
  replicas: 1 # <===== reset to default value
  template:
    spec:
      containers:
      - name: nginx
        # image: nginx:1.14.2 # <===== removed because it has no default value

But in api-server v1.16-v1.18, we will get

apiVersion: apps/v1
kind: Deployment
...
spec:
  replicas: 3 # <===== unchanged
  template:
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2 # <===== unchanged

Related to this fix merged into k8s v1.19
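
For anyone who wants to reproduce this from a controller, a hedged controller-runtime sketch of the server-side apply call with a fixed field manager (the manager name is a placeholder):

```go
package apply

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// serverSideApply applies the desired Deployment under a fixed field manager.
// On api-server >= v1.19, fields this manager set earlier but leaves unset now
// are removed or reset to their defaults; on v1.16-v1.18 they stay unchanged,
// which is the behaviour described above.
func serverSideApply(ctx context.Context, c client.Client, desired *appsv1.Deployment) error {
	return c.Patch(ctx, desired, client.Apply,
		client.FieldOwner("oam-trait-controller"),
		client.ForceOwnership)
}
```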

captainroy-hy avatar Jan 13 '21 02:01 captainroy-hy

Is there no uniform standard?

@allenhaozi This is what a three-way merge patch handles, and it is how kubectl apply is implemented. We are going to align with upstream to fix the problem correctly.

hongchaodeng avatar Jan 13 '21 02:01 hongchaodeng

@captainroy-hy I see. If this is a well-known upstream issue, we don't need to worry about it. All we need to do is implement kubectl apply on the client side. On the server side, it is the users who choose which versions (and thus which behaviors) they want.

hongchaodeng avatar Jan 13 '21 02:01 hongchaodeng

@ryanzhang-oss Right.

Let me add another example:

# user applies
auto: false
rules: [a, b, c]

# controller adds `others`
auto: false
rules: [a, b, c]
others: "abc"

# user applies again
auto: true

# user wants to see
auto: true
others: "abc"

I don't think any patch method can solve this problem without what @allenhaozi described. The fact that the trait owns the "auto" and "rules" fields but not "others" has to be explicitly stated somewhere. This is not k8s-related.

ryanzhang-oss avatar Jan 13 '21 03:01 ryanzhang-oss

@ryanzhang-oss

I don't think any patch method can solve this problem without what @allenhaozi described.

That's a wrong assumption. I believe this is exactly what kubectl apply addresses.

hongchaodeng avatar Jan 13 '21 08:01 hongchaodeng

I'm working on this PR, which refactors the apply mechanism to compute a three-way diff on the client side, just like kubectl apply does under the hood; the last-applied state is stored in an annotation.

captainroy-hy avatar Jan 18 '21 04:01 captainroy-hy

Some related issues w.r.t. the client-side implementation of kubectl apply:

  • https://github.com/kubernetes/kubernetes/issues/15894
  • https://github.com/kubernetes/kubernetes/issues/17333
  • https://github.com/kubernetes/enhancements/blob/master/keps/sig-api-machinery/555-server-side-apply/README.md#motivation

Note that this is an issue with kubectl since it is controlled by users directly. We can avoid and control the risk by isolating it from end users.

For now, we can implement the three-way merge on the client side by recording the last-applied state under our own annotation key, app.oam.dev/last-applied.
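
A rough sketch of how that annotation-driven three-way merge could be wired up (the helper name and overall flow are illustrative assumptions, not the final implementation; writing the new last-applied value back onto the object is omitted for brevity):

```go
package apply

import (
	"context"
	"encoding/json"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/apimachinery/pkg/util/jsonmergepatch"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

const lastAppliedAnnotation = "app.oam.dev/last-applied"

// threeWayApply patches the live object toward the desired one, using the
// state recorded at the previous apply (kept in an annotation on the live
// object) as the "original" input of the three-way merge, like kubectl apply.
func threeWayApply(ctx context.Context, c client.Client, live, desired *unstructured.Unstructured) error {
	original := []byte(live.GetAnnotations()[lastAppliedAnnotation])
	if len(original) == 0 {
		original = []byte("{}") // first apply: nothing recorded yet
	}

	modified, err := json.Marshal(desired.Object)
	if err != nil {
		return err
	}
	current, err := json.Marshal(live.Object)
	if err != nil {
		return err
	}

	// Fields we applied before but dropped now become null in the patch;
	// fields added by other controllers on the live object are preserved.
	patch, err := jsonmergepatch.CreateThreeWayJSONMergePatch(original, modified, current)
	if err != nil {
		return err
	}
	return c.Patch(ctx, live, client.RawPatch(types.MergePatchType, patch))
}
```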

Once k8s 1.19 is more widely adopted, we can switch the implementation to make use of server-side apply.

hongchaodeng avatar Jan 20 '21 03:01 hongchaodeng

I ran into a problem where restarting oam-addon-sage causes the Pods belonging to the Deployment controlled by a ServerWorkload to restart.

oam-addon-sage contains ServerWorkload, TaskWorkload, ManualScaler, ...

https://github.com/kubernetes-sigs/controller-runtime/issues/1311

allenhaozi avatar Jan 21 '21 06:01 allenhaozi

I ran into a problem where restarting oam-addon-sage causes the Pods belonging to the Deployment controlled by a ServerWorkload to restart.

@allenhaozi Could you clarify the problem in more detail, and explain why it is related to this issue, or maybe create another issue?

hongchaodeng avatar Jan 21 '21 19:01 hongchaodeng

In simple words: with the current implementation mechanism, it is difficult for two controllers that control one resource to patch that resource.

The workload renders the Deployment.

Restarting the controllers leads to the resource's Pods restarting.

For example: ServerWorkload creates a Deployment and sets replicas to 0, then ManualScalerTrait patches the replicas of this Deployment from 0 to 3.

When the controllers restart, if the Deployment already exists, we see the following events:

Events:
  Type    Reason             Age                  From                   Message
  ----    ------             ----                 ----                   -------
  Normal  ScalingReplicaSet  13s (x12 over 5h5m)  deployment-controller  Scaled down replica set simple-web-33-5d759bd4cf to 0
  Normal  ScalingReplicaSet  12s (x15 over 5h7m)  deployment-controller  Scaled up replica set simple-web-33-5d759bd4cf to 3

It is a question of how a Workload and a Trait can work together to update a resource when we cannot patch it, which is why I mention it here.

allenhaozi avatar Jan 22 '21 00:01 allenhaozi

@allenhaozi A possible workaround: maybe the ServerWorkload controller should express no opinion on .spec.replicas when creating/reconciling a Deployment, leaving .spec.replicas to the ManualScalerTrait, just like the ContainerizedWorkload controller does.

But of course, dealing with "field management by multiple controllers" is still a problem, as @ryanzhang-oss said in this issue.

captainroy-hy avatar Jan 22 '21 01:01 captainroy-hy

Fixed in https://github.com/oam-dev/kubevela/pull/857. Any plan to port it to this repo too?

hongchaodeng avatar Jan 22 '21 21:01 hongchaodeng

Hello! How can I help? I am new here but very curious to learn and explore this. Can anyone guide me on where I should start?

Ananya-1106 avatar Mar 19 '21 20:03 Ananya-1106

Hi @Ananya-1106, welcome! This repo is more like a library for the OAM implementation. We suggest you get started with KubeVela; the oam-kubernetes-runtime lib is contained there.

wonderflow avatar Mar 19 '21 22:03 wonderflow