krane
krane copied to clipboard
Krane fails with stack trace on some deployments
Bug report
krane fails with a deadlock during some deploys:
Traceback (most recent call last):
--
| 14: from /usr/local/bin/kubernetes-deploy:23:in `<main>'
| 13: from /usr/local/bin/kubernetes-deploy:23:in `load'
| 12: from /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/exe/kubernetes-deploy:71:in `<top (required)>'
| 11: from /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/options_helper.rb:28:in `with_processed_template_paths'
| 10: from /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/exe/kubernetes-deploy:84:in `block in <top (required)>'
| 9: from /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/deploy_task.rb:187:in `run!'
| 8: from /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/statsd.rb:55:in `block in measure_method'
| 7: from /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/deploy_task.rb:451:in `deploy_all_resources'
| 6: from /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/deploy_task.rb:446:in `deploy_resources'
| 5: from /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/resource_watcher.rb:34:in `run'
| 4: from /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/statsd.rb:55:in `block in measure_method'
| 3: from /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/resource_watcher.rb:56:in `sync_resources'
| 2: from /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/concurrency.rb:15:in `split_across_threads'
| 1: from /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/concurrency.rb:15:in `each'
| /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/concurrency.rb:15:in `join': No live threads left. Deadlock? (fatal)
| 2 threads, 2 sleeps current:0x000055b01a322530 main thread:0x000055b018dc3570
| * #<Thread:0x000055b018e24b68 sleep_forever>
| rb_thread_t:0x000055b018dc3570 native:0x00007faaf316a740 int:1
| /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/concurrency.rb:15:in `join'
| /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/concurrency.rb:15:in `each'
| /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/concurrency.rb:15:in `split_across_threads'
| /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/resource_watcher.rb:56:in `sync_resources'
| /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/statsd.rb:55:in `block in measure_method'
| /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/resource_watcher.rb:34:in `run'
| /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/deploy_task.rb:446:in `deploy_resources'
| /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/deploy_task.rb:451:in `deploy_all_resources'
| /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/statsd.rb:55:in `block in measure_method'
| /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/deploy_task.rb:187:in `run!'
| /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/exe/kubernetes-deploy:84:in `block in <top (required)>'
| /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/options_helper.rb:28:in `with_processed_template_paths'
| /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/exe/kubernetes-deploy:71:in `<top (required)>'
| /usr/local/bin/kubernetes-deploy:23:in `load'
| /usr/local/bin/kubernetes-deploy:23:in `<main>'
| * #<Thread:0x000055b01ac99f08@/var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/concurrency.rb:13 sleep_forever>
| rb_thread_t:0x000055b01a322530 native:0x00007faaed6ce700 int:0
| depended by: tb_thread_id:0x000055b018dc3570
| /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/resource_cache.rb:46:in `synchronize'
| /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/resource_cache.rb:46:in `use_or_populate_cache'
| /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/resource_cache.rb:28:in `get_all'
| /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/kubernetes_resource/deployment.rb:168:in `find_latest_rs'
| /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/kubernetes_resource/deployment.rb:15:in `sync'
| /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/resource_watcher.rb:56:in `block in sync_resources'
| /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/concurrency.rb:13:in `each'
| /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/concurrency.rb:13:in `block (2 levels) in split_across_threads'
Expected behavior:
It should deploy.
Actual behavior:
The above stacktrace. A second deploy fixes the issue.
Version(s) affected:
0.30.0
Steps to Reproduce
This happens frequently on a specific internal deployment and cluster, I can provide more details outside of the public issue tracker.
Before we spend a ton of time investigating this I'd like to see if https://github.com/Shopify/kubernetes-deploy/pull/603 fixes it.
I am also able to get a failure to occur pretty deterministically:
Traceback (most recent call last):
--
| 14: from /usr/local/bin/kubernetes-deploy:23:in `<main>'
| 13: from /usr/local/bin/kubernetes-deploy:23:in `load'
| 12: from /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/exe/kubernetes-deploy:71:in `<top (required)>'
| 11: from /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/options_helper.rb:28:in `with_processed_template_paths'
| 10: from /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/exe/kubernetes-deploy:84:in `block in <top (required)>'
| 9: from /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/deploy_task.rb:187:in `run!'
| 8: from /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/statsd.rb:55:in `block in measure_method'
| 7: from /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/deploy_task.rb:451:in `deploy_all_resources'
| 6: from /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/deploy_task.rb:446:in `deploy_resources'
| 5: from /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/resource_watcher.rb:34:in `run'
| 4: from /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/statsd.rb:55:in `block in measure_method'
| 3: from /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/resource_watcher.rb:56:in `sync_resources'
| 2: from /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/concurrency.rb:15:in `split_across_threads'
| 1: from /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/concurrency.rb:15:in `each'
| /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/concurrency.rb:15:in `join': No live threads left. Deadlock? (fatal)
| 2 threads, 2 sleeps current:0x00005635f66c70e0 main thread:0x00005635f4ce2570
| * #<Thread:0x00005635f4d14b80 sleep_forever>
| rb_thread_t:0x00005635f4ce2570 native:0x00007f3b31615740 int:1
| /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/concurrency.rb:15:in `join'
| /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/concurrency.rb:15:in `each'
| /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/concurrency.rb:15:in `split_across_threads'
| /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/resource_watcher.rb:56:in `sync_resources'
| /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/statsd.rb:55:in `block in measure_method'
| /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/resource_watcher.rb:34:in `run'
| /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/deploy_task.rb:446:in `deploy_resources'
| /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/deploy_task.rb:451:in `deploy_all_resources'
| /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/statsd.rb:55:in `block in measure_method'
| /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/deploy_task.rb:187:in `run!'
| /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/exe/kubernetes-deploy:84:in `block in <top (required)>'
| /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/options_helper.rb:28:in `with_processed_template_paths'
| /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/exe/kubernetes-deploy:71:in `<top (required)>'
| /usr/local/bin/kubernetes-deploy:23:in `load'
| /usr/local/bin/kubernetes-deploy:23:in `<main>'
| * #<Thread:0x00005635f70bc740@/var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/concurrency.rb:13 sleep_forever>
| rb_thread_t:0x00005635f66c70e0 native:0x00007f3b1eeee700 int:0
| depended by: tb_thread_id:0x00005635f4ce2570
| /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/resource_cache.rb:46:in `synchronize'
| /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/resource_cache.rb:46:in `use_or_populate_cache'
| /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/resource_cache.rb:28:in `get_all'
| /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/kubernetes_resource/pod_set_base.rb:54:in `find_pods'
| /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/kubernetes_resource/replica_set.rb:73:in `fetch_pods_if_needed'
| /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/kubernetes_resource/replica_set.rb:20:in `sync'
| /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/kubernetes_resource/deployment.rb:186:in `find_latest_rs'
| /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/kubernetes_resource/deployment.rb:15:in `sync'
| /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/resource_watcher.rb:56:in `block in sync_resources'
| /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/concurrency.rb:13:in `each'
| /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/concurrency.rb:13:in `block (2 levels) in split_across_threads'
Also on 0.30.0
@actgardner if you re-deploy do you still get the same failure, or does it sometimes pass?
A quick read of https://bugs.ruby-lang.org/issues/14725 indicates this might be a bug specific to ruby 2.5
bumping our deploy agent to ruby 2.6.5 I am able to get my previously failing deploy to succeed. @actgardner can you try again?
This obviously isn't conclusive, but i'd say this adds a point to the column of "this is a ruby 2.5" issue, though we'd need to verify the actual patch versions to add confidence.