legacy-kubernetes-app
snap-kubestate gets recycled on a medium-sized cluster
We have an issue where the kubestate pod gets recycled every couple of minutes and cluster metrics are not being sent. The cluster has roughly 30 machines and ~1000 pods.
This is what I see in the log file:
time="2017-10-16T23:20:10Z" level=warning msg="This plugin is using a deprecated RPC protocol. Find more information here: https://github.com/intelsdi-x/snap/issues/1289 " _block=newAvailablePlugin _module=control-aplugin plugin_name=df
time="2017-10-16T23:20:10Z" level=warning msg="This plugin is using a deprecated RPC protocol. Find more information here: https://github.com/intelsdi-x/snap/issues/1289 " _block=newAvailablePlugin _module=control-aplugin plugin_name=iostat
time="2017-10-16T23:20:12Z" level=warning msg="This plugin is using a deprecated RPC protocol. Find more information here: https://github.com/intelsdi-x/snap/issues/1289 " _block=newAvailablePlugin _module=control-aplugin plugin_name=load
time="2017-10-16T23:20:12Z" level=warning msg="Ignoring JSON/Yaml file: core.json" _block=start _module=control autodiscoverpath="/opt/snap/tasks_startup"
time="2017-10-16T23:20:14Z" level=error msg="collector run error" _module=scheduler-job block=run error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4657834 vs. 4194304)" job-type=collector
time="2017-10-16T23:20:14Z" level=warning msg="Task failed" _block=spin _module=scheduler-task consecutive failure limit=10 consecutive failures=1 error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4657834 vs. 4194304)" task-id=c27715d5-6964-43ca-8eb4-4b656373e38c task-name=Task-c27715d5-6964-43ca-8eb4-4b656373e38c
time="2017-10-16T23:20:24Z" level=error msg="collector run error" _module=scheduler-job block=run error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4753318 vs. 4194304)" job-type=collector
time="2017-10-16T23:20:24Z" level=warning msg="Task failed" _block=spin _module=scheduler-task consecutive failure limit=10 consecutive failures=2 error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4753318 vs. 4194304)" task-id=c27715d5-6964-43ca-8eb4-4b656373e38c task-name=Task-c27715d5-6964-43ca-8eb4-4b656373e38c
time="2017-10-16T23:20:34Z" level=error msg="collector run error" _module=scheduler-job block=run error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4755429 vs. 4194304)" job-type=collector
time="2017-10-16T23:20:34Z" level=warning msg="Task failed" _block=spin _module=scheduler-task consecutive failure limit=10 consecutive failures=3 error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4755429 vs. 4194304)" task-id=c27715d5-6964-43ca-8eb4-4b656373e38c task-name=Task-c27715d5-6964-43ca-8eb4-4b656373e38c
time="2017-10-16T23:20:44Z" level=error msg="collector run error" _module=scheduler-job block=run error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4756272 vs. 4194304)" job-type=collector
time="2017-10-16T23:20:44Z" level=warning msg="Task failed" _block=spin _module=scheduler-task consecutive failure limit=10 consecutive failures=4 error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4756272 vs. 4194304)" task-id=c27715d5-6964-43ca-8eb4-4b656373e38c task-name=Task-c27715d5-6964-43ca-8eb4-4b656373e38c
time="2017-10-16T23:20:54Z" level=error msg="collector run error" _module=scheduler-job block=run error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4754734 vs. 4194304)" job-type=collector
time="2017-10-16T23:20:54Z" level=warning msg="Task failed" _block=spin _module=scheduler-task consecutive failure limit=10 consecutive failures=5 error="rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4754734 vs. 4194304)" task-id=c27715d5-6964-43ca-8eb4-4b656373e38c task-name=Task-c27715d5-6964-43ca-8eb4-4b656373e38c
I can't figure out a way to configure the maximum message size. Maybe you can shed some light on that? Thanks.
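For context, 4194304 bytes is the 4 MiB default maximum receive message size in gRPC-go, which is what produces the ResourceExhausted errors above. In plain gRPC-go the client-side limit is normally raised with a dial option, roughly like the sketch below (generic gRPC-go with a placeholder address and size, not a documented Snap setting):

// A minimal sketch, not Snap code: raising the 4 MiB default client-side
// receive limit in gRPC-go with a dial option.
package main

import (
	"log"

	"google.golang.org/grpc"
)

func main() {
	const maxMsgSize = 16 * 1024 * 1024 // 16 MiB instead of the 4 MiB (4194304 byte) default

	conn, err := grpc.Dial(
		"127.0.0.1:50051", // placeholder plugin address
		grpc.WithInsecure(),
		// Apply a larger receive limit to every call made on this connection.
		grpc.WithDefaultCallOptions(grpc.MaxCallRecvMsgSize(maxMsgSize)),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	_ = conn // a real caller would issue its collector RPCs over conn here
}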
kubectl exec -it snap-kubestate-deployment-3536784749-k0q9s -- /opt/snap/bin/snaptel task list
ID NAME STATE HIT MISS FAIL CREATED LAST FAILURE
6b6dacb3-8b53-458c-9cba-629ade4e7a65 Task-6b6dacb3-8b53-458c-9cba-629ade4e7a65 Running 6 0 6 4:57PM 10-17-2017 rpc error: code = ResourceExhausted desc = grpc: received message larger than max (4283236 vs. 4194304)
We have it running on our dev/qa cluster, which is much smaller, and it works there without any problem.
There is no way to change the limit, unfortunately. We forked snap to get around this and just hacked in a higher limit.
The proper way to fix it would be to send a PR that fixes this issue: https://github.com/intelsdi-x/snap-plugin-lib-go/issues/43
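To illustrate the kind of change described above, raising the limits where the plugin's gRPC server is created would look roughly like this; a sketch against plain gRPC-go, not the actual diff from the fork or from the linked PR:

// A minimal sketch, not the actual fork or PR: raising the gRPC limits on the
// plugin (server) side so larger metric batches can be sent back to snapteld.
package main

import (
	"log"
	"net"

	"google.golang.org/grpc"
)

func main() {
	const maxMsgSize = 16 * 1024 * 1024 // placeholder; pick a limit that fits the cluster size

	lis, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		log.Fatal(err)
	}

	srv := grpc.NewServer(
		grpc.MaxRecvMsgSize(maxMsgSize), // larger task/config payloads in
		grpc.MaxSendMsgSize(maxMsgSize), // larger metric batches out
	)
	// ...register the collector service here, then serve...
	if err := srv.Serve(lis); err != nil {
		log.Fatal(err)
	}
}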
There is a PR open for this: https://github.com/intelsdi-x/snap-plugin-lib-go/pull/89
@daniellee Can you point me to the forked repository that you hacked? Or is it private?