krane icon indicating copy to clipboard operation
krane copied to clipboard

Krane fails with stack trace on some deployments

Open actgardner opened this issue 5 years ago • 5 comments

Bug report

krane fails with a deadlock during some deploys:

Traceback (most recent call last):
--
  | 14: from /usr/local/bin/kubernetes-deploy:23:in `<main>'
  | 13: from /usr/local/bin/kubernetes-deploy:23:in `load'
  | 12: from /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/exe/kubernetes-deploy:71:in `<top (required)>'
  | 11: from /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/options_helper.rb:28:in `with_processed_template_paths'
  | 10: from /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/exe/kubernetes-deploy:84:in `block in <top (required)>'
  | 9: from /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/deploy_task.rb:187:in `run!'
  | 8: from /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/statsd.rb:55:in `block in measure_method'
  | 7: from /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/deploy_task.rb:451:in `deploy_all_resources'
  | 6: from /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/deploy_task.rb:446:in `deploy_resources'
  | 5: from /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/resource_watcher.rb:34:in `run'
  | 4: from /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/statsd.rb:55:in `block in measure_method'
  | 3: from /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/resource_watcher.rb:56:in `sync_resources'
  | 2: from /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/concurrency.rb:15:in `split_across_threads'
  | 1: from /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/concurrency.rb:15:in `each'
  | /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/concurrency.rb:15:in `join': No live threads left. Deadlock? (fatal)
  | 2 threads, 2 sleeps current:0x000055b01a322530 main thread:0x000055b018dc3570
  | * #<Thread:0x000055b018e24b68 sleep_forever>
  | rb_thread_t:0x000055b018dc3570 native:0x00007faaf316a740 int:1
  | /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/concurrency.rb:15:in `join'
  | /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/concurrency.rb:15:in `each'
  | /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/concurrency.rb:15:in `split_across_threads'
  | /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/resource_watcher.rb:56:in `sync_resources'
  | /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/statsd.rb:55:in `block in measure_method'
  | /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/resource_watcher.rb:34:in `run'
  | /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/deploy_task.rb:446:in `deploy_resources'
  | /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/deploy_task.rb:451:in `deploy_all_resources'
  | /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/statsd.rb:55:in `block in measure_method'
  | /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/deploy_task.rb:187:in `run!'
  | /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/exe/kubernetes-deploy:84:in `block in <top (required)>'
  | /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/options_helper.rb:28:in `with_processed_template_paths'
  | /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/exe/kubernetes-deploy:71:in `<top (required)>'
  | /usr/local/bin/kubernetes-deploy:23:in `load'
  | /usr/local/bin/kubernetes-deploy:23:in `<main>'
  | * #<Thread:0x000055b01ac99f08@/var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/concurrency.rb:13 sleep_forever>
  | rb_thread_t:0x000055b01a322530 native:0x00007faaed6ce700 int:0
  | depended by: tb_thread_id:0x000055b018dc3570
  | /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/resource_cache.rb:46:in `synchronize'
  | /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/resource_cache.rb:46:in `use_or_populate_cache'
  | /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/resource_cache.rb:28:in `get_all'
  | /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/kubernetes_resource/deployment.rb:168:in `find_latest_rs'
  | /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/kubernetes_resource/deployment.rb:15:in `sync'
  | /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/resource_watcher.rb:56:in `block in sync_resources'
  | /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/concurrency.rb:13:in `each'
  | /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/concurrency.rb:13:in `block (2 levels) in split_across_threads'

Expected behavior:

It should deploy.

Actual behavior:

The above stacktrace. A second deploy fixes the issue.

Version(s) affected:

0.30.0

Steps to Reproduce

This happens frequently on a specific internal deployment and cluster, I can provide more details outside of the public issue tracker.

actgardner avatar Oct 25 '19 13:10 actgardner

Before we spend a ton of time investigating this I'd like to see if https://github.com/Shopify/kubernetes-deploy/pull/603 fixes it.

dturn avatar Oct 25 '19 15:10 dturn

I am also able to get a failure to occur pretty deterministically:

Traceback (most recent call last):
--
  | 14: from /usr/local/bin/kubernetes-deploy:23:in `<main>'
  | 13: from /usr/local/bin/kubernetes-deploy:23:in `load'
  | 12: from /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/exe/kubernetes-deploy:71:in `<top (required)>'
  | 11: from /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/options_helper.rb:28:in `with_processed_template_paths'
  | 10: from /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/exe/kubernetes-deploy:84:in `block in <top (required)>'
  | 9: from /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/deploy_task.rb:187:in `run!'
  | 8: from /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/statsd.rb:55:in `block in measure_method'
  | 7: from /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/deploy_task.rb:451:in `deploy_all_resources'
  | 6: from /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/deploy_task.rb:446:in `deploy_resources'
  | 5: from /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/resource_watcher.rb:34:in `run'
  | 4: from /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/statsd.rb:55:in `block in measure_method'
  | 3: from /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/resource_watcher.rb:56:in `sync_resources'
  | 2: from /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/concurrency.rb:15:in `split_across_threads'
  | 1: from /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/concurrency.rb:15:in `each'
  | /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/concurrency.rb:15:in `join': No live threads left. Deadlock? (fatal)
  | 2 threads, 2 sleeps current:0x00005635f66c70e0 main thread:0x00005635f4ce2570
  | * #<Thread:0x00005635f4d14b80 sleep_forever>
  | rb_thread_t:0x00005635f4ce2570 native:0x00007f3b31615740 int:1
  | /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/concurrency.rb:15:in `join'
  | /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/concurrency.rb:15:in `each'
  | /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/concurrency.rb:15:in `split_across_threads'
  | /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/resource_watcher.rb:56:in `sync_resources'
  | /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/statsd.rb:55:in `block in measure_method'
  | /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/resource_watcher.rb:34:in `run'
  | /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/deploy_task.rb:446:in `deploy_resources'
  | /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/deploy_task.rb:451:in `deploy_all_resources'
  | /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/statsd.rb:55:in `block in measure_method'
  | /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/deploy_task.rb:187:in `run!'
  | /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/exe/kubernetes-deploy:84:in `block in <top (required)>'
  | /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/options_helper.rb:28:in `with_processed_template_paths'
  | /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/exe/kubernetes-deploy:71:in `<top (required)>'
  | /usr/local/bin/kubernetes-deploy:23:in `load'
  | /usr/local/bin/kubernetes-deploy:23:in `<main>'
  | * #<Thread:0x00005635f70bc740@/var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/concurrency.rb:13 sleep_forever>
  | rb_thread_t:0x00005635f66c70e0 native:0x00007f3b1eeee700 int:0
  | depended by: tb_thread_id:0x00005635f4ce2570
  | /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/resource_cache.rb:46:in `synchronize'
  | /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/resource_cache.rb:46:in `use_or_populate_cache'
  | /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/resource_cache.rb:28:in `get_all'
  | /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/kubernetes_resource/pod_set_base.rb:54:in `find_pods'
  | /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/kubernetes_resource/replica_set.rb:73:in `fetch_pods_if_needed'
  | /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/kubernetes_resource/replica_set.rb:20:in `sync'
  | /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/kubernetes_resource/deployment.rb:186:in `find_latest_rs'
  | /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/kubernetes_resource/deployment.rb:15:in `sync'
  | /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/resource_watcher.rb:56:in `block in sync_resources'
  | /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/concurrency.rb:13:in `each'
  | /var/lib/gems/2.5.0/gems/kubernetes-deploy-0.30.0/lib/kubernetes-deploy/concurrency.rb:13:in `block (2 levels) in split_across_threads'

Also on 0.30.0

dalehamel avatar Oct 31 '19 14:10 dalehamel

@actgardner if you re-deploy do you still get the same failure, or does it sometimes pass?

dalehamel avatar Oct 31 '19 15:10 dalehamel

A quick read of https://bugs.ruby-lang.org/issues/14725 indicates this might be a bug specific to ruby 2.5

dalehamel avatar Oct 31 '19 15:10 dalehamel

bumping our deploy agent to ruby 2.6.5 I am able to get my previously failing deploy to succeed. @actgardner can you try again?

This obviously isn't conclusive, but i'd say this adds a point to the column of "this is a ruby 2.5" issue, though we'd need to verify the actual patch versions to add confidence.

dalehamel avatar Nov 01 '19 16:11 dalehamel