fluentd-kubernetes-sumologic

Source categories in Sumo appending numbers

jeffwroblewski opened this issue 6 years ago • 12 comments

For 2.1 and beyond, a user is seeing an issue where source categories are populating like this:

app_prod/some_app_name
app_prod/some_app_name_62
app_prod/some_app_name_63
...

Can provide more info offline as needed.

Thanks! Jeff W. TAM, Sumo

jeffwroblewski avatar Feb 13 '19 15:02 jeffwroblewski

cc @bendrucker: I'll look into this as soon as I can, but it seems likely related to the fix for #78.

frankreno avatar Feb 13 '19 15:02 frankreno

What version of Kubernetes are you running?

bendrucker avatar Feb 13 '19 20:02 bendrucker

We're running OpenShift v3.3 which includes Kubernetes v1.3.

andrews32 avatar Feb 13 '19 21:02 andrews32

Gotcha, seems like there's probably no test coverage for that pod name format anymore. I can look into it in a few.

bendrucker avatar Feb 13 '19 21:02 bendrucker

We are in the middle of upgrading to OpenShift v3.9, which includes Kubernetes v1.9, but the symbolic links from which I believe the code retrieves the pod name use the same format as in K8s v1.3.

For example, the first one is docker-registry-2-mqe0f..., where docker-registry is the pod_name, 2 is the deployment config counter, and mqe0f is the hash.

The problem is that it's inconsistent in what it retrieves as the _sourceCategory: sometimes it'll be "docker-registry", sometimes "docker-registry-2".

```
[root@infra01-devtest-vxbyr ~]# ls /var/log/containers/
docker-registry-2-mqe0f_default_POD-9171d6915e911a532fb6048191e9713ed36a14ccd1a9057624ece298f08b350a.log
docker-registry-2-mqe0f_default_registry-6c077b90f9592770d1e63b0444551d331167551e035f3d25b5922d0b4ec05325.log
hawkular-cassandra-1-bd5m3_openshift-infra_hawkular-cassandra-1-86f7ee9fddbff12f935a88b0b54e7b46e82a73657000ff41b209076ed7fcc657.log
hawkular-cassandra-1-bd5m3_openshift-infra_POD-a6d27b85d71c4e0c56e600b8d3666e39da8d360515c75d20282318b56f50be47.log
hawkular-metrics-5ovsm_openshift-infra_hawkular-metrics-e70c33b6df41717ad12ccfc1d55b462603ac6a79adbae45c7cad0d363bfccd74.log
hawkular-metrics-5ovsm_openshift-infra_POD-02596538d07c076a7c2447bd364981296ed2584aea29e1158eca643d78359953.log
registry-console-1-6iu26_default_POD-7ce336ef1bb0442e3d93c642c7b63c523e23adf924e4ed1f4f26bd7db6e17c64.log
registry-console-1-6iu26_default_registry-console-a3b0ad6a98b33001ef205bd1bb83d027d19d67013f701d66ec05183314a04e3c.log
router-25-2gpi9_default_POD-ec02bb74c544cc201fea714ce52b3f7bef9adb7d6c47b03e7b63df4cb8df6819.log
router-25-2gpi9_default_router-6cea4aef80d33705c275d19392de0608accec18243e7ef2a7a773103735ee510.log
```

andrews32 avatar Feb 13 '19 22:02 andrews32
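For context, those filenames follow the `<pod_name>_<namespace>_<container_name>-<container_id>.log` convention. A rough Python sketch of that decomposition (assuming the standard kubernetes_metadata filename layout; this is not the plugin's actual Ruby regex):

```python
import re

# <pod_name>_<namespace>_<container_name>-<64-hex container id>.log
LOG_NAME = re.compile(
    r"^(?P<pod_name>[^_]+)_"
    r"(?P<namespace>[^_]+)_"
    r"(?P<container_name>.+)-"
    r"(?P<container_id>[0-9a-f]{64})\.log$"
)

m = LOG_NAME.match(
    "docker-registry-2-mqe0f_default_registry-"
    "6c077b90f9592770d1e63b0444551d331167551e035f3d25b5922d0b4ec05325.log"
)
assert m and m.group("pod_name") == "docker-registry-2-mqe0f"
```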

So the actual pod name is docker-registry-2-mqe0f if I'm reading right? Would be a huge help if you could get an entire pod (kubectl get pod <name> -o yaml) for confirmation.

bendrucker avatar Feb 13 '19 22:02 bendrucker

I suspect the regression that's affecting you from #78 has to do with the pod template hash. Rather than hardcode error-prone patterns based on string formatting (i.e. strip this part if it's all numbers), we switched to actually detecting the pod template hash and deterministically stripping the dynamic parts. I'm trying to get a 1.3 cluster up on minikube, but that doesn't seem viable, so a full pod from your cluster would be helpful.

bendrucker avatar Feb 13 '19 22:02 bendrucker

```
[svc-vxby-ose@master01-devtest-vxbyr ~]$ oc get po docker-registry-3-9n3bx -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/created-by: |
      {"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"default","name":"docker-registry-3","uid":"3bf3e47f-fad7-11e8-8df7-005056848c95","apiVersion":"v1","resourceVersion":"1205030190"}}
    openshift.io/deployment-config.latest-version: "3"
    openshift.io/deployment-config.name: docker-registry
    openshift.io/deployment.name: docker-registry-3
    openshift.io/scc: restricted
  creationTimestamp: 2018-12-08T10:52:07Z
  generateName: docker-registry-3-
  labels:
    deployment: docker-registry-3
    deploymentconfig: docker-registry
    docker-registry: default
  name: docker-registry-3-9n3bx
  namespace: default
  resourceVersion: "1205031022"
  selfLink: /api/v1/namespaces/default/pods/docker-registry-3-9n3bx
  uid: 4e5362ec-fad7-11e8-8df7-005056848c95
spec:
  containers:
  - env:
    - name: REGISTRY_HTTP_ADDR
      value: :5000
    - name: REGISTRY_HTTP_NET
      value: tcp
    - name: REGISTRY_HTTP_SECRET
      value: vN4FVfWmHghp7shKhjZadrA6HLg+9FAPqEORak7+VFQ=
    - name: REGISTRY_MIDDLEWARE_REPOSITORY_OPENSHIFT_ENFORCEQUOTA
      value: "false"
    - name: REGISTRY_HTTP_TLS_CERTIFICATE
      value: /etc/secrets/registry.crt
    - name: REGISTRY_HTTP_TLS_KEY
      value: /etc/secrets/registry.key
    image: openshift3/ose-docker-registry:v3.3.1.46.45
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /healthz
        port: 5000
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 5
    name: registry
    ports:
    - containerPort: 5000
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /healthz
        port: 5000
        scheme: HTTPS
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 5
    resources:
      requests:
        cpu: 100m
        memory: 256Mi
    securityContext:
      capabilities:
        drop:
        - KILL
        - MKNOD
        - SETGID
        - SETUID
        - SYS_CHROOT
      privileged: false
      runAsUser: 1000000000
      seLinuxOptions:
        level: s0:c1,c0
    terminationMessagePath: /dev/termination-log
    volumeMounts:
    - mountPath: /registry
      name: registry-storage
    - mountPath: /etc/secrets
      name: volume-zx4oi
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: registry-token-sbvmg
      readOnly: true
  dnsPolicy: ClusterFirst
  host: infra04-devtest-vxbyr.xxxx.com
  imagePullSecrets:
  - name: registry-dockercfg-8xfow
  nodeName: infra04-devtest-vxbyr.xxxx.com
  nodeSelector:
    region: infra
  restartPolicy: Always
  securityContext:
    fsGroup: 1000000000
    seLinuxOptions:
      level: s0:c1,c0
  serviceAccount: registry
  serviceAccountName: registry
  terminationGracePeriodSeconds: 30
  volumes:
  - emptyDir: {}
    name: registry-storage
  - name: volume-zx4oi
    secret:
      secretName: registry-certificates
  - name: registry-token-sbvmg
    secret:
      secretName: registry-token-sbvmg
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: 2018-12-08T10:52:07Z
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: 2018-12-08T10:52:37Z
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: 2018-12-08T10:52:07Z
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://a1ce41ffcfcebd69e7f8887493db8e0e7636467862e62a63f9c6823f996fef2a
    image: openshift3/ose-docker-registry:v3.3.1.46.45
    imageID: docker-pullable://registry.access.redhat.com/openshift3/ose-docker-registry@sha256:7b429aa43daf2a2d63c968f685a1c42481055fb14dd68678467f8d0de94d89eb
    lastState: {}
    name: registry
    ready: true
    restartCount: 0
    state:
      running:
        startedAt: 2018-12-08T10:52:28Z
  hostIP: 10.224.210.31
  phase: Running
  podIP: 10.221.76.3
  startTime: 2018-12-08T10:52:07Z
```

andrews32 avatar Feb 14 '19 19:02 andrews32

From what you posted, the deployment name is docker-registry-3. This feature was meant to remove the random segments that Deployments/ReplicaSets append to pod names, not just any numeric ID. It seems it was a bug that it matched and deleted part of your deployment name from the pod_name. You could consider using the OpenShift labels directly for your source categories.

bendrucker avatar Feb 14 '19 19:02 bendrucker

I'm guessing deployment name wasn't always where the _sourceCategory got its values from. This is new behavior.

Also, as Frank mentioned above, #78 was only fixed/closed in December 2018, which matches the first reports of this new behavior.

What changed in #78 and why? How do we undo it without manually using an old version that will not be maintained?

andrews32 avatar Feb 19 '19 16:02 andrews32

> I'm guessing deployment name wasn't always where the _sourceCategory got its values from.

I don't see any reason to assume that.

#78 was closed by #100. #78 identified bugs in the original naive implementation of replica pod sanitization. The original implementation would remove the second-to-last segment of the pod name if it was a number. This was unnecessarily naive.
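A rough Python sketch of that old behavior (illustrative only; the plugin itself is Ruby):

```python
def sanitize_pod_name_naive(pod_name: str) -> str:
    # Pre-#100 behavior, roughly: drop the trailing random suffix,
    # then also drop the second-to-last segment if it is all digits
    # (assumed to be a numeric pod template hash).
    parts = pod_name.split("-")
    if len(parts) > 1:
        parts.pop()                     # random suffix, e.g. "mqe0f"
    if len(parts) > 1 and parts[-1].isdigit():
        parts.pop()                     # assumed template hash, e.g. "2"
    return "-".join(parts)

# Misfires on an OpenShift deployment config pod, where "2" is the
# deployment version rather than a template hash:
#   sanitize_pod_name_naive("docker-registry-2-mqe0f") == "docker-registry"
```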

This numeric value was the pod template hash, which is included as a label on the pods. In later versions of k8s, that numeric value was mapped to an alphanumeric encoding, breaking the naive name sanitization. #100 takes the template hash from the label, looks for its numeric or alphanumeric form in the pod name, and removes that segment by exact match. Anything else in your pod names, including numbers, is left in place.
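A simplified sketch of the #100 approach (the real matching also has to handle the newer alphanumeric encoding of the hash):

```python
def sanitize_pod_name(pod_name: str, labels: dict) -> str:
    # Post-#100 behavior, roughly: only strip what can be positively
    # identified. The template hash comes from the pod's own labels
    # and is matched exactly against a segment of the name.
    template_hash = labels.get("pod-template-hash")
    if not template_hash:
        return pod_name        # e.g. OpenShift RC pods: left untouched
    parts = pod_name.split("-")
    if template_hash in parts:
        # drop the hash segment plus the random suffix after it
        parts = parts[:parts.index(template_hash)]
    return "-".join(parts)

# sanitize_pod_name("nginx-3066347652-abcde",
#                   {"pod-template-hash": "3066347652"}) == "nginx"
```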

This feature was meant to target Kubernetes ReplicaSets, and the plugin was stripping bits of your pod name due to a bug. It sucks, but sometimes bug fixes are breaking changes if you were depending on the buggy behavior.

I made some suggestions above on how to provide a specific metadata template with labels; that would let you define conventions that match your stack (see the sketch below). I don't think it would be a good idea to re-introduce behavior that parses pod name conventions beyond what's present in k8s core.
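For instance, a hypothetical helper along those lines, keying off the stable deploymentconfig label visible in the pod YAML above (illustrative only, not part of the plugin):

```python
def source_category(namespace: str, labels: dict, pod_name: str) -> str:
    # Hypothetical: prefer a stable OpenShift label such as
    # "deploymentconfig" over anything parsed out of the pod name,
    # falling back to the raw pod name when the label is absent.
    return f"{namespace}/{labels.get('deploymentconfig', pod_name)}"

# source_category("default", {"deploymentconfig": "docker-registry"},
#                 "docker-registry-3-9n3bx") == "default/docker-registry"
```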

bendrucker avatar Feb 19 '19 18:02 bendrucker

I was looking for something else and came across this ticket and noticed it was still open. I think we can close this now. Thanks for explaining it @bendrucker.

andrews32 avatar Jul 26 '19 05:07 andrews32