fluent-operator icon indicating copy to clipboard operation
fluent-operator copied to clipboard

Incorrect Value for KeyDoesNotExist

Open frankgreco opened this issue 3 years ago • 21 comments

Describe the bug

Condition.KeyDoesNotExist takes map[string]string. However, the docs only have one param. I think this should be similar to KeyExists?

Is this a bug? If so I'll submit a PR. If not, what is the proper value for the map value? Also, can I work around it by setting the value to the empty string?

Note there are tests asserting what I believe to be the wrong thing here. Or at least they contradict the FluentBit docs.

To Reproduce

n/a

Expected behavior

n/a

Your Environment

- Fluent Operator version: master
- Container Runtime: n/a
- Operating system: n/a
- Kernel version: n/a

How did you install fluent operator?

No response

What happened?

No response

Your Error Log

n/a

Additional context

No response

frankgreco avatar Jun 02 '22 19:06 frankgreco

@frankgreco Thanks for reporting this, and we'll take a look at this

benjaminhuo avatar Jun 05 '22 12:06 benjaminhuo

This is because fluentbit 1.7 and earlier versions had two values, with the upgrade of fluentbit the value of this parameter is only one value. You can see here.

wenchajun avatar Jun 06 '22 03:06 wenchajun

@wenchajun It makes sense then for backwards compatibility. However, anyone that is isn't familiar with FluentBit v1.7 or earlier (which I assume is most people) will be confused. Perhaps there could be a KeyDoesNotExistLegacy or something that makes it less confusing (I guess this would be a breaking change so not ideal)? At a minimum a code comment describing what to do for earlier versions (empty string value).

Also, what is the end of life timeline for FluentBit versions. When will 1.7 be EOL?

frankgreco avatar Jun 06 '22 16:06 frankgreco

@wenchajun what do you think?

frankgreco avatar Jun 08 '22 22:06 frankgreco

1.7 and 1.8 are already EOL, I like the suggestion of KeyDoesNotExistLegacy or an alternative. Do we need to raise in Fluent Bit main repo?

agup006 avatar Jun 08 '22 23:06 agup006

1.7 and 1.8 are already EOL, I like the suggestion of KeyDoesNotExistLegacy or an alternative. Do we need to raise in Fluent Bit main repo?

The Condition.KeyDoesNotExist has been changed since v1.8. I think it makes sense to change it accordingly in fluent operator because now fluent operator supports v1.8+ and we're going to release a new version to support v1.9.

We needn't to KeyDoesNotExistLegacy for now because 1.7 is EOL. @frankgreco would you create a PR for this?

benjaminhuo avatar Jun 09 '22 01:06 benjaminhuo

Yes. Btw I think there are a few other instances of this. We can apply the same fix as they are found. Btw, since this is a breaking change. I'll let you guys figure out how you want to handle the semver

frankgreco avatar Jun 09 '22 01:06 frankgreco

We're going to release 1.10

benjaminhuo avatar Jun 09 '22 02:06 benjaminhuo

I meant breaking change for this repo, the operator (breaking change for the CRD types). We should probs audit the rest of the process in apis/fluentbit where this same issues applies and fix them before bumping the version.

For example, if someone running fluent operatorv<before this change> they cannot update to fluent operator v<after this change> unless they update the CRDS.

frankgreco avatar Jun 09 '22 02:06 frankgreco

@frankgreco we might need to add CRD conversion for changes like this https://book.kubebuilder.io/multiversion-tutorial/conversion-concepts.html https://book.kubebuilder.io/multiversion-tutorial/conversion.html if there're quite a lot changes like this.

An example is here https://github.com/OpenFunction/OpenFunction/blob/main/apis/core/v1alpha2/function_webhook.go https://github.com/OpenFunction/OpenFunction/blob/main/apis/core/v1alpha2/function_conversion.go

I mean we may introduce a new CRD version v1beta1 and support conversion between v1beta1 and current v1alpha2 if there're quite a few changes like this

If there're not many changes like this, yes we can ask users to update CRDs together with images with a helm upgrade command

benjaminhuo avatar Jun 09 '22 03:06 benjaminhuo

@benjaminhuo don't we need to decide on what the plan is to your comment before that PR is merged? Anyone that pulls latest will now run into errors and must upgrade all CRDs using that filter.

I mean we may introduce a new CRD version v1beta1 and support conversion between v1beta1 and current v1alpha2 if there're quite a few changes like this

I think we need to go with this change. Regardless of whether or not there are quite a few changes like this, I think the disruption just this change alone causes is bad.

I think we should track v1beta1 or v1alpha3 and aggregate a batch of changes to go in.

frankgreco avatar Jun 09 '22 04:06 frankgreco

I think each version update will add or modify a part of the plug-in (including some bugfix). So each upgrade will result in a redeployment of crd. And with fluentbit 1.7 eol, lower versions of fluentbit are no longer supported.

wenchajun avatar Jun 09 '22 08:06 wenchajun

@frankgreco the current k8s log processing pipeline doesn't use this functionality of the modify filter plugin and we haven't received an issue-reporting of his bug(The current FB version is v1.8.x+, which is incompatible with the old modify filter setting). Furthermore, v1.7.x has reached its EOL and the latest operator release will ship with recent FB version

So we may not upgrade the CRD version this time.

benjaminhuo avatar Jun 09 '22 11:06 benjaminhuo

With this change being merged, the following will happen (please correct me if I am wrong)

1. User has existing CRDs using this plugin
2. User updates the version of the FluentBit Operator
3. The Operator throws errors saying it can't unmarshal the CRD
4. User must take unexpected manual action because of this breaking change

How do we think this is acceptable? Users, including myself are running this in production. Why are we so eager to push this breaking change through when there is a viable workaround.

In my strong opinion, we should not be so eager to push through a breaking change when there is a workaround. Rather, we should back these changes up in some branch representing the next CRD version and push them on some longer schedule.

But basically what we're doing here is making breaking changes and giving end users pain.

frankgreco avatar Jun 09 '22 14:06 frankgreco

OK, I understood your concern now. Let's revert this PR and prepare to upgrade the CRD version and add CRD conversion to it.

benjaminhuo avatar Jun 09 '22 14:06 benjaminhuo

The PR has been reverted and let's reopen this issue to continue to work on it.

benjaminhuo avatar Jun 09 '22 14:06 benjaminhuo

Thanks so much! My recommendation is that we use the next CRD version for this v1alpha3 (I think). However, I think we batch up some more changes into that version. Some questions then would be

- let's groom through the types and fix all instances where we have 1.7 types that don't need them anymore
- are there other changes that you guys want to introduce if we use a new crd version?

frankgreco avatar Jun 09 '22 17:06 frankgreco

We want to include your previous PR because the current filter plugin setting KeyDoesNotExist is still for fluent bit v1.7. Although the default image shipped with the operator is already v1.8.

We might release v1alpha3 CRDs maybe after adding the custom plugin CRD mentioned in https://github.com/fluent/fluent-operator/issues/301#issuecomment-1152362843

benjaminhuo avatar Jun 10 '22 14:06 benjaminhuo

We might release v1alpha3 CRDs maybe after adding the custom plugin CRD mentioned in https://github.com/fluent/fluent-operator/issues/301#issuecomment-1152362843

Perhaps then to get started we can do this

- you create a branch called v1alpha3 (I don't believe I have access)
- we merge PRs into that branch until we'll ready to release it
- I'll reopen my PR against that branch

frankgreco avatar Jun 10 '22 19:06 frankgreco

https://github.com/fluent/fluent-operator/tree/v1alpha3 @frankgreco v1alpha3 branch created

benjaminhuo avatar Jun 13 '22 10:06 benjaminhuo

@benjaminhuo Thank you! I'm at Snowflake Summit this week but I'll PR against that branch (along with all of the new v1alpha3 boilerplate) when I get a chance.

frankgreco avatar Jun 14 '22 17:06 frankgreco