crossplane-runtime icon indicating copy to clipboard operation
crossplane-runtime copied to clipboard

Late initialization overrides desired value when it is zero-value

Open muvaf opened this issue 5 years ago • 12 comments

What happened?

We have implemented late-initialization pattern across the resources. However, it relies upon the field's value being nil or zero-value. Example late-init functions from stack-gcp:

// LateInitializeInt64 implements late initialization for int64 type.
func LateInitializeInt64(i *int64, from int64) *int64 {
	if i != nil || from == 0 {
		return i
	}
	return &from
}

// LateInitializeBool implements late initialization for bool type.
func LateInitializeBool(b *bool, from bool) *bool {
	if b != nil || !from {
		return b
	}
	return &from
}

This implementation is buggy because it overrides when the user actually wants the zero-value for that field. For example, the value of field foo *bool is true in both spec and cloud provider and the user wants to make it false. In that case LateInitializeBool will override that value with true because it will assume that zero-value is a case where user doesn't have an opinion about that field and the value from cloud provider will be used. This seems easy to fix for pointer types by converting the function into:

// LateInitializeBool implements late initialization for bool type.
func LateInitializeBool(b *bool, from bool) *bool {
	if b != nil {
		return b
	}
	return &from
}

But there is a catch. We follow Kubernetes pattern for optional fields in our API design, which means that optional fields have the following jsontag:

// +optional
foo *string `json:"foo,omitempty"`

The tag omitempty actually causes the value to be nil in the Go code if assigned value is zero-value, in this case false. So, it comes as nil even if user wanted it to be false. This works for Kubernetes late-inited fields like pod.spec.nodeName because those fields will be eventually filled but that's not the case for us. The Go zero-value for a field could actually be the final value that the user wants for that field.

I'd like us to discuss not using omitempty tag at all, so that zero-values come to the controller as is instead of being converted to nil. In terms of api-server validation, nothing changes as // +optional tag is used to mark the fields as required or not in the CRD. Though not using omitempty is divergence from the optional/required design pattern that Kubernetes suggests.

muvaf avatar Jan 06 '20 12:01 muvaf

/cc @negz @hasheddan

muvaf avatar Jan 06 '20 12:01 muvaf

This is tough because I agree that the solution here is just not using omitempty at all, but that would be a pretty big divergence. However, I feel that the way it is used even in Kubernetes native types is a bit of an anti-pattern. If I understand correctly, our code would not have to change except for literally removing the omitempty tags because we still want to handle nil optional values in the same manner, but right now we are being given nil values in some cases where we do not intend them (i.e. providing a zero-values explicitly to a pointer type). Is this your understanding as well?

hasheddan avatar Jan 06 '20 16:01 hasheddan

It doesn't sound like omitting omitempty goes against conventions:

In most cases, optional fields should also have the omitempty struct tag (the omitempty option specifies that the field should be omitted from the json encoding if the field has an empty value). However, If you want to have different logic for an optional field which is not provided vs. provided with empty values, do not use omitempty (e.g. https://github.com/kubernetes/kubernetes/issues/34641).

negz avatar Jan 25 '20 00:01 negz

@muvaf @negz do we feel good about removing omitempty across the board? If so, I would like to go ahead and move on this while implementing more v1beta1 resources.

hasheddan avatar Jan 28 '20 15:01 hasheddan

@hasheddan Are you confident it won't have any negative side-effects/behaviour changes? If so I'm okay with it.

negz avatar Jan 29 '20 07:01 negz

I think I'll need to do some experiments to make sure we're able to capture all three: nil value, zero value, actual value. Or @hasheddan could also play with it and report here.

muvaf avatar Jan 29 '20 07:01 muvaf

@muvaf yep I'll pull some of our real-world examples and show how the change would impact :+1:

hasheddan avatar Jan 29 '20 22:01 hasheddan

https://github.com/kubernetes-sigs/cluster-api/issues/707

Some more prior art here. Having swapped back in the context of this issue I think we're on the right path. I'm not sure if we want to completely remove omitempty everywhere, but we should remove it for fields for which we want to distinguish between:

  1. Spec author has no opinion, propagate the default value back from the cloud.
  2. Spec author wants the field to be the zero value.
  3. Spec author wants the field to be a non-zero value.

So we'd represent these cases (using an *int field as example) as:

  1. Spec author omits the field, which gets serialized as a nil pointer.
  2. Spec author writes 0, which gets serialised as a pointer to int 0.
  3. Spec author writes 8, which gets serialised as a pointer to int 8.

negz avatar Feb 07 '20 21:02 negz

https://crossplane.slack.com/archives/CKXQHM7U3/p1581113468101800 https://play.golang.org/p/NnWsj9BaTeJ https://play.golang.org/p/oxQgQxtWK62

Cross-linking some discussion from Slack. I actually now think our late init code is ~working as expected~ a little broken when it comes to late-initializing an unspecified spec field from a cloud provider's zero value. 🤔 I believe:

func LateInitializeBool(b *bool, from bool) *bool {
	if b != nil || !from {
		return b
	}
	return &from
}

In the above case b is the user provided value and from is the value we get back from the cloud provider. If we walk through some cases, assuming the following struct:

type Example struct {
    Field *bool `json:"field,omitempty"`
}
  1. The user writes field: false. This is unmarshalled as Field: *false. Therefore b != nil, so we return b, thus keeping the user provided value without ever considering what from (the cloud provider value) was set to.
  2. The user writes field: true. Same as above - it's not nil so we keep it.
  3. The user omits field from their spec. This is unmarshalled as Field: nil, so b == nil. We deem the user to have no opinion about the value of this field and proceed to late initialisation from the cloud provider value.
    1. If the provider value was true, we late initialize Field: *true.
    2. If the provider value was false, we return b, which is nil.

So actually in the case of *bool fields it seems like the alternative logic @muvaf proposed would actually be better:

func LateInitializeBool(b *bool, from bool) *bool {
	if b != nil {
		return b
	}
	return &from
}

With this logic if the cloud provider specified false and the user had no opinion (omitted the field) we'd late initialize the field to *false, rather than nil.

negz avatar Feb 07 '20 23:02 negz

I think the two external issues relating to omitempty that were linked in this issue were red herrings, because they deal with fields that are slices (not pointer to slices).

negz avatar Feb 07 '20 23:02 negz

Another option would be to add a flag to the CRDs that disallow late init stuff? We may need to validate that all late init fields has a value if this new flag is set.

This will allow us to have complete control using the Crossplane CRDs.

mogensen avatar Jan 04 '22 11:01 mogensen

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Aug 13 '22 07:08 stale[bot]

Crossplane does not currently have enough maintainers to address every issue and pull request. This issue has been automatically marked as stale because it has had no activity in the last 90 days. It will be closed in 14 days if no further activity occurs. Leaving a comment starting with /fresh will mark this issue as not stale.

github-actions[bot] avatar Sep 05 '24 01:09 github-actions[bot]