
Entering infinite loop during run

Open litch opened this issue 7 years ago • 12 comments

Cookbook version

1.7.8 and 3.0.0

Chef-client version

12.5.1

Platform Details

Ubuntu 14.04, running on Softlayer VPS

Scenario:

During the chef run, the node enters an infinite loop when it hits one of my runit_service configurations. I ran into this problem a bit earlier and seemed to get around it with a different version of runit, but it has re-emerged.

Steps to Reproduce:

This is the entire contents of the recipe in question:

health_check_port = 2702

package 'ruby2.3'

package 'obsidian-account-service' do
  action :upgrade
  notifies :restart, 'runit_service[account-service]'
end

package 'libpq-dev'

file "/opt/obsidian/account-service/settings/account.json" do
  content JSON.pretty_generate(
    'externalMessaging' => {
      'eventStore' => {
        'host' => node[:event_store_message_bus][:host],
        'port' => node[:event_store_message_bus][:http_port]
      }
    },

    'eventPublishing' => {
      'eventStore' => {
        'host' => node[:event_store_message_bus][:host],
        'port' => node[:event_store_message_bus][:http_port]
      }
    }
  )
end

file "/opt/obsidian/account-service/settings/account_client.json" do
  content JSON.pretty_generate(
    'eventStore' => {
      'host' => node[:event_store_message_bus][:host],
      'port' => node[:event_store_message_bus][:http_port]
    }
  )
end

file "/opt/obsidian/account-service/settings/error_telemetry_component_client.json" do
  content JSON.pretty_generate(
    'eventStore' => {
      'host' => node[:event_store_message_bus][:host],
      'port' => node[:event_store_message_bus][:http_port]
    }
  )
end

file "/opt/obsidian/account-service/settings/event_store_client_http.json" do
  content JSON.pretty_generate(
    'host' => node[:event_store][:host],
    'port' => node[:event_store][:http_port]
  )
end

file "/opt/obsidian/account-service/settings/read_model.json" do
  content JSON.pretty_generate(
    'postgresConnection' => {
      'database' => node[:read_model][:database],
      'host' => node[:read_model][:host],
      'password' => node[:read_model][:password],
      'username' => node[:read_model][:username]
    }
  )
end

file "/opt/obsidian/account-service/settings/health.json" do
  content JSON.pretty_generate(
    'port' => health_check_port
  )
end

runit_service "account-service" do
  default_logger true
end

obsidian_component_health_check "account-service" do
  port health_check_port
end

Expected Result:

Historically this has worked fine. We have used this strategy to deploy our services for over a year, but as the number of services has grown, we're starting to see this loop. This recipe should have just configured a runit_service to run.

Actual Result:

Logs are here: https://gist.github.com/litch/a4b1c7a7d4cbc57c58c0ee503811fa45

The resource just keeps invoking itself recursively it seems. =/

litch avatar Nov 22 '16 14:11 litch

Oh, I should also note that this only happens the first time each recipe is run on the machine. So in theory, to get this machine that is hosting 9 of the services to work, I could start the chef run and abort once it starts to recurse, and then the machine would converge as expected on future runs. But that is clearly problematic.

litch avatar Nov 22 '16 15:11 litch

It looks like reverting to runit 1.6.0 fixed this problem.

litch avatar Nov 22 '16 19:11 litch

Any news about this bug?

robsonpeixoto avatar Feb 17 '17 13:02 robsonpeixoto

Anyone? We are having this issue also.

daxgames avatar Aug 16 '17 18:08 daxgames

@tas50 - ???

daxgames avatar Aug 16 '17 18:08 daxgames

@jtimberman - ??? Any ideas here?

daxgames avatar Aug 16 '17 18:08 daxgames

The only reason I need this is for Chef push-jobs, but I do need it. Having to manually run chef-client, manually interrupt it, and run it again is unacceptable.

daxgames avatar Aug 16 '17 18:08 daxgames

Sorry, I'm not actively involved in maintaining this cookbook and don't have cycles to dig into this.

@tas50 @iennae @cheeseplus halp?

jtimberman avatar Aug 16 '17 22:08 jtimberman

@jtimberman thanks for the response. I set my push-jobs wrapper cookbook's dependency to runit = 1.6.0 based on an earlier comment. I have not yet verified whether it works, and I'm not sure what I'm losing by doing this.
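For readers who want to try the same workaround, a pin like the one described would typically go in the wrapper cookbook's metadata.rb. This is only a sketch: the cookbook name and version below are hypothetical placeholders, and the thread does not confirm that the pin resolves the loop for push-jobs.

```ruby
# metadata.rb of a hypothetical wrapper cookbook (name and version are placeholders)
name    'my_push_jobs_wrapper'
version '0.1.0'

# Pin the runit cookbook to the release reported earlier in this thread as unaffected.
depends 'runit', '= 1.6.0'
```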

daxgames avatar Aug 17 '17 17:08 daxgames

I'm currently working on the push jobs cookbook to clean up some of the old recipes and create new resources for managing things. I'll carve out some time to make sure the runit logic works. On Ubuntu 14.04, I would highly recommend that you use Upstart instead. It's far more reliable and simpler to set up.

tas50 avatar Dec 03 '17 03:12 tas50

I'm running into this same problem and am having a lot of trouble building an isolated test case. Even when I directly copy/paste my code from the place where I'm having the problem into a separate chef environment and run it under test kitchen, it doesn't seem to be reproducible :(

I will say that, at least in my problem code, commenting out all of the notifies that notify restart_service and restart_log_service stops the loop, but I'm not sure why leaving them in causes the loop. The output doesn't show any changes being made in the subsequent passes through the restart_service, create, and enable bits, yet it just keeps restarting over and over again.

Frustrating.

kitchen avatar Aug 06 '18 23:08 kitchen

I hit a similar situation with v4.3.0 when creating a config file related to logs. In my case, the file gets created in a directory on an NFS mount with root squash enabled, so Chef is unable to change the owner/group of the file. From my understanding, this is what happens:

  1. Creation of the config happens in the create action of the resource.
  2. During this, it notifies a restart of the service using a ruby block.
  3. This ruby block calls the enable action of runit_service.
  4. The first thing the enable action does is call the create action of the custom resource, thus forming a cycle.
  5. In normal scenarios, this create action will not call enable again, because the config file resource is already up to date and no restart of the service happens, thus breaking the loop.
  6. But in this scenario, the config file resource is never up to date, because its ownership is never the intended one, so the two actions keep calling each other, forming an infinite loop.
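The steps above can be sketched as a toy model in plain Ruby. This is not the cookbook's code: the class `ToyRunitService`, the `ownership_can_converge` flag, and the `MAX_DEPTH` guard are all hypothetical names introduced only to show why the cycle ends when the config file converges and never ends when it cannot.

```ruby
# Toy model of the create/enable cycle described above (hypothetical, not cookbook code).
class ToyRunitService
  MAX_DEPTH = 10 # hypothetical guard so this sketch terminates instead of recursing forever

  attr_reader :calls

  def initialize(ownership_can_converge:)
    # false models the root-squashed NFS mount, where owner/group can never be set
    @ownership_can_converge = ownership_can_converge
    @config_up_to_date = false
    @calls = []
  end

  def create(depth = 0)
    raise "loop detected after #{depth} nested calls" if depth > MAX_DEPTH

    @calls << :create
    # Step 5: config already up to date, so nothing is notified and the cycle stops.
    return if @config_up_to_date

    # Step 1: the config is written; it only counts as converged if ownership can be applied.
    @config_up_to_date = @ownership_can_converge
    # Steps 2-3: the update notifies a restart, which calls the enable action.
    enable(depth + 1)
  end

  def enable(depth)
    @calls << :enable
    # Step 4: enable begins by calling create again.
    create(depth + 1)
  end
end

healthy = ToyRunitService.new(ownership_can_converge: true)
healthy.create
p healthy.calls

stuck = ToyRunitService.new(ownership_can_converge: false)
begin
  stuck.create
rescue RuntimeError => e
  puts e.message
end
```

In the converging case the second create finds nothing to change and returns; in the non-converging case only the artificial depth guard stops the recursion, which mirrors the behavior reported in this thread.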

Is there any specific reason the functionality is split between create and enable? Why can't all of it just live under enable? At the least, that would prevent enable and create from calling each other.

balasankarc avatar Mar 11 '19 06:03 balasankarc