
Add ability to configure cluster-wide Kafka configurations when using multiple broker groups

Open · amuraru opened this issue 5 years ago · 7 comments

Is your feature request related to a problem? Please describe.

Currently kafka-operator supports rack awareness in the sense that the broker rack is inferred from the underlying node's failure domain. It would be useful to also support the ability to spread brokers across racks that are defined upfront.

This can be achieved today by using different brokerConfigGroups, one per AZ, in combination with nodeAffinity, but it has the drawback that the whole Kafka broker configuration needs to be repeated for each broker group. Hence I am proposing we add a new defaultBrokerConfig field of type BrokerConfig at the cluster.spec level that can be used to define common values reused across all broker groups.

Describe the solution you'd like to see

defaultBrokerConfig:
  kafkaHeapOpts: ...
  kafkaJvmPerfOpts: ...
  storageConfigs: ...
  ...
brokerConfigGroups:
  rack1:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: failure-domain.beta.kubernetes.io/zone
                operator: In
                values:
                  - us-east1
  rack2:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: failure-domain.beta.kubernetes.io/zone
                operator: In
                values:
                  - us-east2
  rack3:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: failure-domain.beta.kubernetes.io/zone
                operator: In
                values:
                  - us-east3
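
Reading the sketch above, the intended semantics (an assumption here, since the proposal doesn't spell them out) would be that the operator merges defaultBrokerConfig into every group, with group-level values winning on conflict. The effective configuration for brokers in rack1 would then look roughly like this:

# Hypothetical merged result for rack1; values are placeholders.
kafkaHeapOpts: ...       # inherited from defaultBrokerConfig
kafkaJvmPerfOpts: ...    # inherited from defaultBrokerConfig
storageConfigs: ...      # inherited from defaultBrokerConfig
nodeAffinity:            # contributed by the rack1 group itself
  requiredDuringSchedulingIgnoredDuringExecution:
    nodeSelectorTerms:
      - matchExpressions:
          - key: failure-domain.beta.kubernetes.io/zone
            operator: In
            values:
              - us-east1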

Describe alternatives you've considered

Using different brokerConfigGroups, but then I have to repeat all of the Kafka broker configuration.

amuraru · Jan 27 '20

Renamed the issue title to better capture the feature request

amuraru · Feb 02 '20

@baluchicken @stoader what are your thoughts on adding a new field to the KafkaCluster CRD to define a clusterDefaultBrokerConfig? Something similar to clusterImage, but for all broker configs?

amuraru · Feb 02 '20

@amuraru what do you think about relying on YAML anchors and aliases (https://www.educative.io/blog/advanced-yaml-syntax-cheatsheet#anchors) for composing parts of the YAML from existing parts (to be more in line with YAML standards)?

Something like:

brokerConfigGroups:
  default: &default-config-group
    storageConfigs: ...
    ...
  rack1:
    parent: *default-config-group
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: failure-domain.beta.kubernetes.io/zone
                operator: In
                values:
                  - us-east1
  rack2:
    parent: *default-config-group
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: failure-domain.beta.kubernetes.io/zone
                operator: In
                values:
                  - us-east2
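
For comparison, standard YAML merge keys (<<:, a YAML 1.1 convention supported by go-yaml/yaml) could express the same composition at parse time, without the operator having to implement a parent field at all; whether kubectl apply expands them is worth verifying the same way anchor support is tested later in this thread. A sketch of the same two groups (the kafkaHeapOpts value is an invented example, not from this thread):

brokerConfigGroups:
  default: &default-config-group
    kafkaHeapOpts: "-Xmx4G -Xms4G"   # example value
    storageConfigs: ...
  rack1:
    <<: *default-config-group        # merges the anchored mapping's keys into rack1
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: failure-domain.beta.kubernetes.io/zone
                operator: In
                values:
                  - us-east1
  rack2:
    <<: *default-config-group
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: failure-domain.beta.kubernetes.io/zone
                operator: In
                values:
                  - us-east2

Note that a merge key performs a shallow merge: top-level keys defined in rack1 override same-named keys pulled in from the anchor, but nested mappings are replaced wholesale rather than deep-merged.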

cc @pregnor @hi-im-aren @bartam1 @panyuenlau @amberimam @Kuvesz

stoader · Jul 06 '22

Hadn't heard about this anchor feature before, but that's pretty cool. 👍

hi-im-aren · Jul 06 '22

It also seems to work with our existing YAML handling libraries, so I would support implementing it this way.

pregnor · Jul 06 '22

YAML anchors sound like a neat idea. Do you know whether kubectl apply understands them, though, for example?

amuraru · Sep 04 '22

Based on my quick search and manual test:

Anchors are a YAML 1.0 feature. There is a blog post about K8s supporting YAML anchors in resource descriptors, but according to the community there is a limitation: only in-file anchors are supported, not cross-file ones.

I did a quick manual test on an EKS 1.23 cluster of mine and I can confirm that at least in-file YAML anchors are supported by K8s (I don't think the K8s version is relevant, for the reasons below). However, it is important to note that when saving resource descriptors, K8s stores the expanded version of the YAML, not the original one. Because of this, K8s probably doesn't handle anchors explicitly; rather, it lets its YAML parsers deal with parsing and expanding at creation/submission time.

# Manually written yaml_anchor_test_configmap.yaml.
apiVersion: v1
data:
  test1: &content |
    this is a test
  test2: *content
  test3: *content
kind: ConfigMap
metadata:
  name: test

kubectl apply -f ~/yaml_anchor_test_configmap.yaml

# kubectl get configmap test -o yaml
apiVersion: v1
data:
  test1: |
    this is a test
  test2: |
    this is a test
  test3: |
    this is a test
kind: ConfigMap
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","data":{"test1":"this is a test\n","test2":"this is a test\n","test3":"this is a test\n"},"kind":"ConfigMap","metadata":{"annotations":{},"name":"test","namespace":"default"}}
  creationTimestamp: "2022-09-06T09:18:02Z"
  name: test
  namespace: default
  resourceVersion: "591146"
  uid: 7aabfd63-5af5-40e1-9f2e-27712949e77f

Also, I expect this to remain a supported feature for a long time because (a) the K8s repo/release notes don't mention support for YAML anchors explicitly, and (b) the related YAML processing libraries, such as go-yaml/yaml (and thus sigs.k8s.io/yaml transitively), have supported this feature for a long time (5+ years based on past issues); see the go-yaml/yaml issue from 2015 and sigs.k8s.io/yaml's use of go-yaml/yaml.

Edit: It also seems to work with structured data, not just scalars (I expect it would work for deeply structured data as well, I just don't have a CRD at hand ATM; see the sketch after the example below).

apiVersion: v1
data: &structured
  test1: "1"
  test2: "2"
  test3: "3"
kind: ConfigMap
metadata:
  name: test
  annotations: *structured

->

apiVersion: v1
data:
  test1: "1"
  test2: "2"
  test3: "3"
kind: ConfigMap
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","data":{"test1":"1","test2":"2","test3":"3"},"kind":"ConfigMap","metadata":{"annotations":{"test1":"1","test2":"2","test3":"3"},"name":"test","namespace":"default"}}
    test1: "1"
    test2: "2"
    test3: "3"
  creationTimestamp: "2022-09-06T09:51:31Z"
  name: test
  namespace: default
  resourceVersion: "608257"
  uid: 6e0f9c8b-f449-403e-8fee-085d5eaf7e49
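
On the deeply structured case: since expansion happens entirely in the YAML parser, nesting depth shouldn't matter. An untested sketch of a hypothetical KafkaCluster fragment anchoring a nested nodeAffinity block; note that a plain alias reproduces the anchored structure verbatim (zone included), so overriding individual values per rack would still require merge keys as sketched earlier:

brokerConfigGroups:
  rack1:
    nodeAffinity: &zone-affinity     # anchor a deeply nested structure
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: failure-domain.beta.kubernetes.io/zone
                operator: In
                values:
                  - us-east1
  rack2:
    nodeAffinity: *zone-affinity     # expands to the identical structure, us-east1 included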

pregnor · Sep 06 '22