kube-fluentd-operator icon indicating copy to clipboard operation
kube-fluentd-operator copied to clipboard

new pod logstream not getting created with mounted-file option

Open vkadi opened this issue 4 years ago • 51 comments

I am seeing the issue with CW logstreams not getting created when using the "mounted-file" to ingest the logs from file in the container. I picked the sample config from examples folder and able to reproduce it , below is the demo config what I have used.

apiVersion: apps/v1 kind: Deployment metadata: name: nginx-deployment namespace: default spec: selector: matchLabels: app: nginx replicas: 1 template: metadata: labels: app: nginx spec: containers: - image: ubuntu name: test-container command: - bash - -c - while true; do echo date -R [INFO] "Random welcome number $((var++)) to file"; sleep 2; [[ $(($var % 100)) == 0 ]] && :> /var/log/welcome.log ;done > /var/log/welcome.log volumeMounts: - mountPath: /var/log name: logs - mountPath: /host-root name: root volumes: - name: logs emptyDir: {} - name: root hostPath: path: /

Sometime when we refresh the fluent then we will see the logstreams but its not consistent either. Anyone using the "mounted-file" option heavily and saw this issue. Any guidance will be a great help around this area. Note : Using the latest KFO version (1.16.1)

vkadi avatar Nov 15 '21 17:11 vkadi

@Cryptophobia Can you please help with the issue , seems like the new container information is not updated in "fluent.conf" until the fluent is refreshed manually.

vkadi avatar Nov 16 '21 20:11 vkadi

@vkadi , can you share the configMap or CRD fluentdconfig that you are using for the above testing container?

Are you using type @mounted-file? Are you following the README here: https://github.com/vmware/kube-fluentd-operator#i-have-a-legacy-container-that-logs-to-varloghttpdaccesslog

Also it is possible, when your configuration changes the fluentd container needs to be restarted as gracefulReload may not be enough. Let me know if you can provide the detail above and I can test with a similar configmap or CRD.

Cryptophobia avatar Nov 17 '21 16:11 Cryptophobia

@Cryptophobia Yes, I am using the "mounted-file" and this is how my configmap looks for the testing pod -

<source>
   @type mounted-file
   path /var/log/welcome.log
   labels msg= nginx, _container=test-container
</source>
<match **>
   @type cloudwatch_logs
   auto_create_stream true
   log_stream_name_key stream_name
   remove_log_stream_name_key true
   retention_in_days 90
   region "#{ENV['AWS_REGION']}"
   <buffer>
     @type file
     path /var/log/fluentd
     flush_interval 5s
     flush_thread_count 8
     chunk_limit_size 2m
     queued_chunks_limit_size 32
     retry_forever true
   </buffer>
   log_group_name /aws/eks/<cluster_name>/cluster/namespace/default
</match>

vkadi avatar Nov 17 '21 20:11 vkadi

@Cryptophobia I am having a similar problem with mounted-file as @vkadi. I believe the problem isn't in configMap changes, this part works as expected, reloader kicks in and invokes controller::RunOnce() loop and re-generates the proper ConfigMaps in addition to fluent.conf. However, the problem seems to be when the pod in a deployment is replaced/restarted, the reloader seems to be unaware and thus it does not re-generate or update the proper fluent.conf and configmap files again and hence it never executes the mountedfile::Process() and also it won't create the cloudwatch logstream with the new pod name as a result.

Note that fluentd logs does show the new pod such as: 2021-11-17 16:51:15 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/nginx-deployment-f697bcfb-8b2jm_default_test-container-a13490af26e0e63a6eeafa884c66b6e7891835fc204fa2c259f5e3105a946496.log 2021-11-17 16:51:40 +0000 [info]: #0 detected rotation of /var/lib/kubelet/pods/601ac67a-0639-4f28-979b-cf21151257bd/volumes/kubernetes.io~empty-dir/logs/welcome.log; waiting 5 seconds 2021-11-17 16:51:41 +0000 [info]: #0 [in_tail_container_logs] detected rotation of /var/log/containers/nginx-deployment-f697bcfb-clqgw_default_test-container-664c91264cbc2b835b9a2ab1260b9c1c6d398f1ff419907cc5b243273b5efa97.log; waiting 5 seconds

sabdalla80 avatar Nov 17 '21 20:11 sabdalla80

Thank you @sabdalla80 , that description helps a lot! I think in this case maybe we are not doing it correctly. Since the config file has not change, we only force fluentd reload when config hashes change. We need to force gracefulReload on fluentd even when configmap does not change but the mountedFile pod is restarted/replaced.

This sounds like the issue here. This also explains why KFO sometimes does not work with job type pods.

Cryptophobia avatar Nov 17 '21 22:11 Cryptophobia

I think this is related to the issues in #253

Cryptophobia avatar Nov 17 '21 22:11 Cryptophobia

Thank you @sabdalla80 , that description helps a lot! I think in this case maybe we are not doing it correctly. Since the config file has not change, we only force fluentd reload when config hashes change. We need to force gracefulReload on fluentd even when configmap does not change but the mountedFile pod is restarted/replaced.

This sounds like the issue here. This also explains why KFO sometimes does not work with job type pods.

@Cryptophobia Thank you for the response and feedback. It sounds like others having same issue, I know this issue is currently impacting me and I can use a quick fix. Do you know if you have an ETA for the fix? I would be glad to assist as I already went down this path of learning the insides of the code and got hung up on how to detect pod changes (I am not 100% sure, but it sounds like it's done in KubeInformer). Please let me know as I can help with testing the fix as well since I already have an environment with this issue ready for testing.

sabdalla80 avatar Nov 18 '21 13:11 sabdalla80

@sabdalla80 , yes I think we need to find some place in kubeInformer.go (config-reloader/datasource/kube_informer.go) where we can do this check via the sharedInformers for the pods. If pods are getting restarted in a namespaces, config-reloader needs to be smart enough to issue new reload to fluentd. This is where we can get a list of pods from namespace config struct that we build.

@sabdalla80 Before we start looking at code fix can you verify if this is not a regression for me? There are a couple lines of code were deleted that were thought to be useless. Can you run the v1.15.0 docker image and verify you are seeing the same problems?

The line deleted in question is this commit: https://github.com/vmware/kube-fluentd-operator/pull/227/files

Cryptophobia avatar Nov 18 '21 17:11 Cryptophobia

@sabdalla80 , yes I think we need to find some place in kubeInformer.go (config-reloader/datasource/kube_informer.go) where we can do this check via the sharedInformers for the pods. If pods are getting restarted in a namespaces, config-reloader needs to be smart enough to issue new reload to fluentd. This is where we can get a list of pods from namespace config struct that we build.

@sabdalla80 Before we start looking at code fix can you verify if this is not a regression for me? There are a couple lines of code were deleted that were thought to be useless. Can you run the v1.15.0 docker image and verify you are seeing the same problems?

The line deleted in question is this commit: https://github.com/vmware/kube-fluentd-operator/pull/227/files

@Cryptophobia I tried it against 1.15.0 and 1.13.0, it has the same issue.

sabdalla80 avatar Nov 18 '21 17:11 sabdalla80

I just looked at the code and I have an idea. We can use an hasher to get a unique has for the nsConfig struct each time RunOnce() loop runs. This will avoid us having to reload each time new pods restart or using kubeInformer at all. We just build namespace representation using our current way.

If the podlist slice representation in nsConfig.MiniContainers changes then the hash value of the whole nsConfig struct will change and we can issue reload.

We can use this for our hasher: https://github.com/mitchellh/hashstructure

EDIT: The true problem stems from the fact that mount-file names change based on the container names, but forcing more reloads will make KFO more resilient and this is the best way to solve the problem I think.

Cryptophobia avatar Nov 18 '21 17:11 Cryptophobia

@sabdalla80, thank you for testing. After I do some local testing, I will build a new test image today.

Cryptophobia avatar Nov 18 '21 20:11 Cryptophobia

@sabdalla80, thank you for testing. After I do some local testing, I will build a new test image today.

This would be spectacular. @Cryptophobia If you can share any code/branch you have started, I can contribute to it if it helps to bring to completion.

sabdalla80 avatar Nov 18 '21 20:11 sabdalla80

@sabdalla80 , When I do a simple test with 3 pods in a deployment. I cannot reproduce this. You are saying the mountedfile:Process() is not being called. Can you be more specific and in which part of the code?

Cryptophobia avatar Nov 19 '21 21:11 Cryptophobia

@Cryptophobia Are you able to see main control loop re-run when you restart/replace a pod in the deployment? Does it re-generate fluentd.conf file?

sabdalla80 avatar Nov 19 '21 21:11 sabdalla80

No, the main control loop does not rerun. But I see fluentd picks up new pods and new container names when the containers restarts/scale-up or whatever. This seems like a race-condition with logstream for CW?

Cryptophobia avatar Nov 19 '21 21:11 Cryptophobia

@sabdalla80 , do you think if we force fluentd to reload this will be fixed? This seems like CW logstreams expect to find new names of pods and it is not being forced because of race condition?

Cryptophobia avatar Nov 19 '21 21:11 Cryptophobia

@Cryptophobia In intial deployment things work fine, the control loop generates fluentd.conf and proper CM files for a service. On pod restart, even though fluentd indeed sees the new pod, the fluend.conf and CM files are not re-generated, they become stale and don't point correctly to the new pods. To answer your question, from the behavior I have seen thus far, I do believe forcing to reload will fix the problem. As a matter of fact, I tested this theory by restarting fluentd pod manually and it seems to update things correctly. Is there a way we can code this and test it? I would have tried it if I have good understanding of the code that is initiating the detection of pod changes.

For example: this is what's generated in fluent.conf file in deployment for mounted-file pod. When mounted-file pod restarts, this file isn't changed.

#################
# Namespace pre-processing
#################
<source>
  @type tail
  path /var/lib/kubelet/pods/05fa8d22-a910-4edc-8004-6a1ca18a252e/volumes/kubernetes.io~empty-dir/logs/welcome.log
  pos_file /var/log/kfotail-23b81aaedcbe0520c93c4396c956c89b8136408d.pos
  read_from_head true
  tag kube.default.nginx-deployment-f697bcfb-ct5k5.test-container-23b81aaedcbe0520c93c4396c956c89b8136408d

  <parse>
    @type none
  </parse>
</source>

<filter kube.default.nginx-deployment-f697bcfb-ct5k5.test-container-23b81aaedcbe0520c93c4396c956c89b8136408d>
  @type record_modifier
  remove_keys dummy_

  <record>
    dummy_ ${record['stream']='/var/log/welcome.log'; record['kubernetes']={'container_image'=>'ubuntu','container_name'=>'test-container','host'=>'ip-10-0-29-130.ec2.internal','namespace_name'=>'default','pod_id'=>'05fa8d22-a910-4edc-8004-6a1ca18a252e','pod_name'=>'nginx-deployment-f697bcfb-ct5k5'}; record['docker']={'container_id'=>'docker://fd034ffb705c6be7a6d726139622955e4d999b2d532a3497d7c97233c62ccbea'}; record['container_info']='3039a456dcf33176ead2f0e1223137d0ac6a2e3e'; record['kubernetes']['labels']={'app'=>'nginx','pod-template-hash'=>'f697bcfb'}; record['kubernetes']['namespace_labels']={}}
  </record>
</filter>

sabdalla80 avatar Nov 19 '21 21:11 sabdalla80

@sabdalla80 , thank you for this. I think I understand the problem. The CW plugin requires that the container name be known ahead of time so even though we make correct fluentd configs from reloader when fluentd does not restart then the CW plugin configuration is not the correct one loaded. I am building a container for you to test this theory. The fix is basically to reload anytime we have new pods. I think this idea will make KFO more robust anyways. We will see in practice if this is desired.

Cryptophobia avatar Nov 19 '21 22:11 Cryptophobia

@sabdalla80 , thank you for this. I think I understand the problem. The CW plugin requires that the container name be known ahead of time so even though we make correct fluentd configs from reloader when fluentd does not restart then the CW plugin configuration is not the correct one loaded. I am building a container for you to test this theory. The fix is basically to reload anytime we have new pods. I think this idea will make KFO more robust anyways. We will see in practice if this is desired.

It would be awesome. You are spot on in your description.

sabdalla80 avatar Nov 19 '21 22:11 sabdalla80

@sabdalla80 , I still can't produce a valid test with the similar preprocessing block as your config. Can you post the fluentd-config you are using for the namespace and the deployment yaml that you are deploying for this mount-file test?

Cryptophobia avatar Nov 19 '21 22:11 Cryptophobia

Can you try testing with this image when you have some time?

vmwaresaas.jfrog.io/vdp-public/upstream/kube-fluentd-operator:sabdalla80-test-1

EDIT: Image is still being pushed up, give it 2 mins.

Cryptophobia avatar Nov 19 '21 22:11 Cryptophobia

@sabdalla80 , I am able to reproduce the preprocessing block now. I see when I scale up from 3 to 6 pods, the preprocessing still stay to 3 blocks...

Cryptophobia avatar Nov 19 '21 22:11 Cryptophobia

@Cryptophobia

with vmwaresaas.jfrog.io/vdp-public/upstream/kube-fluentd-operator:sabdalla80-test-1

I did see reloader log this event this time around on mounted-file pod restart. But, I did not see fluentd.conf file update. Are you not running the whole chain of events such as generator::renderToDisk in your new changes?

time="2021-11-19T22:49:05Z" level=debug msg="Config hashes returned in RunOnce loop: map[abdas1-pega-58:8c3d1db1c72be76ae2299c726880b349ee7ee82d calico-system:7b0343ff7ee57ab27aef853cbfab80c49bcc9643 default:468b29a540388037bedc0ba2c9324d8e0240695b ext-secrets:8fc4450b5f01d9eb4a007f82aa1013df9a56aa1a gloo-system:155cb45bff838838d4514b6680b27a1aae5bec32 istio-system:d345dfc9f351f907c562df9fd0a537d27605b57b k8s-infra-components:6b89ba488815961fc3446e61848d09d5dc5b9f1c k8s-infra-operator:95862c8f02da78a629e30e44a1e8f4093b00a6bd kube-public:c8e51186853ef96e15db4c46ebbe0b6b5e0c4a3f kube-system:8ecffb9f57c8a0c48eeadbc36c9f247d669fc5d2 logging:71cf57b46a837f2b5f0bd616cdf12b2f245d2f0a metacontroller:0d91bcef8e1189aa3d10d0656d471268c039ec03 srs:1bedbb48a6f3eddab391c51c906e6355b4307656 tigera-operator:30b3bcb04653fc02d66c5579a320448010c942c2]"

sabdalla80 avatar Nov 19 '21 23:11 sabdalla80

I think I solved it @sabdalla80 !

By making a function with a channel in the Run() loop (moving it out of RunOnce()) so that it does the check now and does not rely on the KubeInformer update channels, but does it outside of the informers. This was to save replicating the code across all of the different type of informers. It's a hacky solution, but if it works, it will solve all the preprocessing race conditions problems we may have in future too. 😄

Please test when you can:

vmwaresaas.jfrog.io/vdp-public/upstream/kube-fluentd-operator:sabdalla80-test-2

Here was the solution:

	// checkIfNeedsReload will verify if new containers neeed to preprocess
	checkIfNeedsRunOnce := func() (needsRun <-chan bool) {
		needsForceRunOnce := make(chan bool, 1)
		allConfigNamespaces, _ := c.Datasource.GetNamespaces(ctx)
		for _, nsConfig := range allConfigNamespaces {
			// if there are new/restarted pods in the configed namespace, we should reload just in case:
			// mounted-file fix, mounted file path changes based on container names:
			// this may fix race condition in logstream plugins that depend on container names?
			if !util.AreStructureHashEqual(nsConfig.MiniContainers, nsConfig.MiniContainers) {
				needsForceRunOnce <- true
			} else {
				needsForceRunOnce <- false
			}
		}
		return needsForceRunOnce
	}
	```
 

Cryptophobia avatar Nov 20 '21 00:11 Cryptophobia

@sabdalla80 I will wait for your test with container before I make PR:

vmwaresaas.jfrog.io/vdp-public/upstream/kube-fluentd-operator:sabdalla80-test-2

Cryptophobia avatar Nov 20 '21 00:11 Cryptophobia

@sabdalla80 I will wait for your test with container before I make PR:

vmwaresaas.jfrog.io/vdp-public/upstream/kube-fluentd-operator:sabdalla80-test-2

@Cryptophobia I am still seeing the same behavior. It seems as if /fluentd/etc/fluent.conf and the .conf files for other namespaces remain stale when mount-file pods are replaced. Is it possible kubeInformer isn't detecting the replacement change of pods to begin with? The reason I say this is I would have expected to see controller and Generator events to show up in reloader logs when I bounce a pod. Furthermore, I have tried adding log messages in the kubeInformer functions such as discoverNamespaces() and GetNameSpaces() and I did not see them run when mounted_file pod is replaced.

In contrast to configMap::detectConfigMapName() which seems to be listening all the time, the kubeInformer::GetNamespaces() does not seem to be listening. It appears as if KubeInformer needs some listening logic similar to Configmap.go. The latter seems to have updateCh logic and the former doesn't. Could that be the issue?

sabdalla80 avatar Nov 20 '21 00:11 sabdalla80

@sabdalla80 , have you tried running KFO in --datasource=crd mode ? Seems to work for me in the crd mode.

Cryptophobia avatar Nov 20 '21 04:11 Cryptophobia

@sabdalla80 , have you tried running KFO in --datasource=crd mode ? Seems to work for me in the crd mode.

@Cryptophobia No, I have not tried. I suppose I can give it a try, what changes do I need to make?

In your sample fix above where you compare the hash based on containers, I believe that may work if you cause the controller to run on pod/container restarts. The problem currently(from what I have seen) is that whatever event triggers the controller to run is not happening. It seems as if the code changes you made need to happen in the kube_informer and not at the controller?

sabdalla80 avatar Nov 20 '21 19:11 sabdalla80

@sabdalla80 , have you tried running KFO in --datasource=crd mode ? Seems to work for me in the crd mode.

@Cryptophobia I have got a chance to test vmwaresaas.jfrog.io/vdp-public/upstream/kube-fluentd-operator:sabdalla80-test-2 using CRD and I am still seeing the same issue where the Namespace pre-processing block in fluent.conf does not get updated for the new containers in the deployment until I bounce the fluent pod .

vkadi avatar Nov 23 '21 04:11 vkadi

I think I solved it @sabdalla80 !

By making a function with a channel in the Run() loop (moving it out of RunOnce()) so that it does the check now and does not rely on the KubeInformer update channels, but does it outside of the informers. This was to save replicating the code across all of the different type of informers. It's a hacky solution, but if it works, it will solve all the preprocessing race conditions problems we may have in future too. 😄

Please test when you can:

vmwaresaas.jfrog.io/vdp-public/upstream/kube-fluentd-operator:sabdalla80-test-2

Here was the solution:

	// checkIfNeedsReload will verify if new containers neeed to preprocess
	checkIfNeedsRunOnce := func() (needsRun <-chan bool) {
		needsForceRunOnce := make(chan bool, 1)
		allConfigNamespaces, _ := c.Datasource.GetNamespaces(ctx)
		for _, nsConfig := range allConfigNamespaces {
			// if there are new/restarted pods in the configed namespace, we should reload just in case:
			// mounted-file fix, mounted file path changes based on container names:
			// this may fix race condition in logstream plugins that depend on container names?
			if !util.AreStructureHashEqual(nsConfig.MiniContainers, nsConfig.MiniContainers) {
				needsForceRunOnce <- true
			} else {
				needsForceRunOnce <- false
			}
		}
		return needsForceRunOnce
	}

@Cryptophobia if !util.AreStructureHashEqual(nsConfig.MiniContainers, nsConfig.MiniContainers) { needsForceRunOnce <- true } else { needsForceRunOnce <- false } you are intentionally comparing same obj's here? Seems a typo ...

Also, I tried to upscale the pods and with that also I am seeing the same behaviour , can you please brief the steps how your are testing this.

vkadi avatar Nov 23 '21 15:11 vkadi