legacy-kubernetes-app
snap-kubestate gets recycled on a medium-sized cluster
We have an issue where the kubestate pod gets recycled every couple of minutes and cluster metrics are not being sent. The cluster has roughly 30 machines and ~1000 pods.
This is what I see in the log file:
time="2017-10-16T23:20:10Z" level=warning msg="This plugin is using a deprecated RPC protocol. Find more information here: https://github.com/intelsdi-x/snap/issues/1289 " _block=newAvailablePlugin _module=control-aplugin plugin_name=df
time="2017-10-16T23:20:10Z" level=warning msg="This plugin is using a deprecated RPC protocol. Find more information here: https://github.com/intelsdi-x/snap/issues/1289 " _block=newAvailablePlugin _module=control-aplugin plugin_name=iostat
time="2017-10-16T23:20:12Z" level=warning msg="This plugin is using a deprecated RPC protocol. Find more information here: https://github.com/intelsdi-x/snap/issues/1289 " _block=newAvailablePlugin _module=control-aplugin plugin_name=load
time="2017-10-16T23:20:12Z" level=warning msg="Ignoring JSON/Yaml file: core.json" _block=start _module=control autodiscoverpath="/opt/snap/tasks_startup"
time="2017-10-16T23:20:14Z" level=error msg="collector run error" _module=scheduler-job block=run error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4657834 vs. 4194304)" job-type=collector
time="2017-10-16T23:20:14Z" level=warning msg="Task failed" _block=spin _module=scheduler-task consecutive failure limit=10 consecutive failures=1 error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4657834 vs. 4194304)" task-id=c27715d5-6964-43ca-8eb4-4b656373e38c task-name=Task-c27715d5-6964-43ca-8eb4-4b656373e38c
time="2017-10-16T23:20:24Z" level=error msg="collector run error" _module=scheduler-job block=run error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4753318 vs. 4194304)" job-type=collector
time="2017-10-16T23:20:24Z" level=warning msg="Task failed" _block=spin _module=scheduler-task consecutive failure limit=10 consecutive failures=2 error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4753318 vs. 4194304)" task-id=c27715d5-6964-43ca-8eb4-4b656373e38c task-name=Task-c27715d5-6964-43ca-8eb4-4b656373e38c
time="2017-10-16T23:20:34Z" level=error msg="collector run error" _module=scheduler-job block=run error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4755429 vs. 4194304)" job-type=collector
time="2017-10-16T23:20:34Z" level=warning msg="Task failed" _block=spin _module=scheduler-task consecutive failure limit=10 consecutive failures=3 error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4755429 vs. 4194304)" task-id=c27715d5-6964-43ca-8eb4-4b656373e38c task-name=Task-c27715d5-6964-43ca-8eb4-4b656373e38c
time="2017-10-16T23:20:44Z" level=error msg="collector run error" _module=scheduler-job block=run error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4756272 vs. 4194304)" job-type=collector
time="2017-10-16T23:20:44Z" level=warning msg="Task failed" _block=spin _module=scheduler-task consecutive failure limit=10 consecutive failures=4 error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4756272 vs. 4194304)" task-id=c27715d5-6964-43ca-8eb4-4b656373e38c task-name=Task-c27715d5-6964-43ca-8eb4-4b656373e38c
time="2017-10-16T23:20:54Z" level=error msg="collector run error" _module=scheduler-job block=run error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4754734 vs. 4194304)" job-type=collector
time="2017-10-16T23:20:54Z" level=warning msg="Task failed" _block=spin _module=scheduler-task consecutive failure limit=10 consecutive failures=5 error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4754734 vs. 4194304)" task-id=c27715d5-6964-43ca-8eb4-4b656373e38c task-name=Task-c27715d5-6964-43ca-8eb4-4b656373e38c
I can't figure out a way to configure the maximum message size. Maybe you can shed some light on that? Thanks.
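For context, 4194304 bytes is the 4 MiB default maximum receive message size in gRPC-go, which is what produces the ResourceExhausted errors above. In plain gRPC-go the client-side limit is normally raised with a dial option, roughly like the sketch below (generic gRPC-go with a placeholder address and size, not a documented Snap setting):

// A minimal sketch, not Snap code: raising the 4 MiB default client-side
// receive limit in gRPC-go with a dial option.
package main

import (
	"log"

	"google.golang.org/grpc"
)

func main() {
	const maxMsgSize = 16 * 1024 * 1024 // 16 MiB instead of the 4 MiB (4194304 byte) default

	conn, err := grpc.Dial(
		"127.0.0.1:50051", // placeholder plugin address
		grpc.WithInsecure(),
		// Apply a larger receive limit to every call made on this connection.
		grpc.WithDefaultCallOptions(grpc.MaxCallRecvMsgSize(maxMsgSize)),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	_ = conn // a real caller would issue its collector RPCs over conn here
}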
kubectl exec -it snap-kubestate-deployment-3536784749-k0q9s -- /opt/snap/bin/snaptel task list
ID NAME STATE HIT MISS FAIL CREATED LAST FAILURE
6b6dacb3-8b53-458c-9cba-629ade4e7a65 Task-6b6dacb3-8b53-458c-9cba-629ade4e7a65 Running 6 0 6 4:57PM 10-17-2017 rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4283236 vs. 4194304)
We have it running on our dev/qa cluster, which is much smaller, and it works there without any problem.
There is no way to change the limit, unfortunately. We forked snap to get around this and just hacked in a higher limit.
The proper way to fix it would be to send a PR that fixes this issue: https://github.com/intelsdi-x/snap-plugin-lib-go/issues/43
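To illustrate the kind of change described above, raising the limits where the plugin's gRPC server is created would look roughly like this; a sketch against plain gRPC-go, not the actual diff from the fork or from the linked PR:

// A minimal sketch, not the actual fork or PR: raising the gRPC limits on the
// plugin (server) side so larger metric batches can be sent back to snapteld.
package main

import (
	"log"
	"net"

	"google.golang.org/grpc"
)

func main() {
	const maxMsgSize = 16 * 1024 * 1024 // placeholder; pick a limit that fits the cluster size

	lis, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		log.Fatal(err)
	}

	srv := grpc.NewServer(
		grpc.MaxRecvMsgSize(maxMsgSize), // larger task/config payloads in
		grpc.MaxSendMsgSize(maxMsgSize), // larger metric batches out
	)
	// ...register the collector service here, then serve...
	if err := srv.Serve(lis); err != nil {
		log.Fatal(err)
	}
}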
There is a PR open for this: https://github.com/intelsdi-x/snap-plugin-lib-go/pull/89
@daniellee Can you point me to the forked repository that you hacked? Or is it private?