runit
Entering infinite loop during run
Cookbook version
1.7.8 and 3.0.0
Chef-client version
12.5.1
Platform Details
Ubuntu 14.04, running on a SoftLayer VPS
Scenario:
During the Chef run, the node enters an infinite loop when it reaches one of my runit_service configurations. I had run into this problem a bit earlier and seemed to get around it with a different version of runit, but it has re-emerged.
Steps to Reproduce:
This is the entire contents of the recipe in question:
```ruby
health_check_port = 2702

package 'ruby2.3'

package 'obsidian-account-service' do
  action :upgrade
  notifies :restart, 'runit_service[account-service]'
end

package 'libpq-dev'

file "/opt/obsidian/account-service/settings/account.json" do
  content JSON.pretty_generate(
    'externalMessaging' => {
      'eventStore' => {
        'host' => node[:event_store_message_bus][:host],
        'port' => node[:event_store_message_bus][:http_port]
      }
    },
    'eventPublishing' => {
      'eventStore' => {
        'host' => node[:event_store_message_bus][:host],
        'port' => node[:event_store_message_bus][:http_port]
      }
    }
  )
end

file "/opt/obsidian/account-service/settings/account_client.json" do
  content JSON.pretty_generate(
    'eventStore' => {
      'host' => node[:event_store_message_bus][:host],
      'port' => node[:event_store_message_bus][:http_port]
    }
  )
end

file "/opt/obsidian/account-service/settings/error_telemetry_component_client.json" do
  content JSON.pretty_generate(
    'eventStore' => {
      'host' => node[:event_store_message_bus][:host],
      'port' => node[:event_store_message_bus][:http_port]
    }
  )
end

file "/opt/obsidian/account-service/settings/event_store_client_http.json" do
  content JSON.pretty_generate(
    'host' => node[:event_store][:host],
    'port' => node[:event_store][:http_port]
  )
end

file "/opt/obsidian/account-service/settings/read_model.json" do
  content JSON.pretty_generate(
    'postgresConnection' => {
      'database' => node[:read_model][:database],
      'host' => node[:read_model][:host],
      'password' => node[:read_model][:password],
      'username' => node[:read_model][:username]
    }
  )
end

file "/opt/obsidian/account-service/settings/health.json" do
  content JSON.pretty_generate(
    'port' => health_check_port
  )
end

runit_service "account-service" do
  default_logger true
end

obsidian_component_health_check "account-service" do
  port health_check_port
end
```
Expected Result:
Historically this has worked fine: we have used this strategy to deploy our services for over a year, but as the number of services has grown, we have started to see this loop. This recipe should simply have configured a runit_service to run.
Actual Result:
Logs are here: https://gist.github.com/litch/a4b1c7a7d4cbc57c58c0ee503811fa45
The resource seems to just keep invoking itself recursively. =/
Oh, I should also note that this only happens the first time each recipe is run on the machine. So in theory, to get this machine that hosts 9 of the services to work, I could start the Chef run, abort once it starts to recurse, and the machine would then converge as expected on future runs. But that is clearly problematic.
It looks like reverting to runit 1.6.0 fixed this problem.
Any news about this bug?
Anyone? We are having this issue also.
@tas50 - ???
@jtimberman - ??? Any ideas here?
The only reason I need this is for Chef push-jobs, but I do need it. Having to manually run chef-client, manually interrupt it, and run it again is unacceptable.
Sorry, I'm not actively involved in maintaining this cookbook and don't have cycles to dig into this.
@tas50 @iennae @cheeseplus halp?
@jtimberman thanks for the response. I set my push-jobs wrapper cookbook's dependency to runit = 1.6.0 based on an earlier comment. I have not yet verified that it works, and I'm not sure what I'm losing by doing this.
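For reference, an exact-version pin like the one described above is usually written in the wrapper cookbook's metadata.rb, or in a Berksfile if Berkshelf resolves the dependencies. This is a config fragment, not runnable on its own; the wrapper cookbook is assumed, and an exact pin means giving up any fixes made in later runit cookbook releases.

```ruby
# metadata.rb of the (assumed) push-jobs wrapper cookbook: exact pin to runit 1.6.0
depends 'runit', '= 1.6.0'

# Equivalent constraint if dependencies are resolved through a Berksfile instead
# cookbook 'runit', '= 1.6.0'
```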
I'm currently working on the push-jobs cookbook to clean up some of the old recipes and create new resources for managing things. I'll carve out some time to make sure the runit logic works. On Ubuntu 14.04 I would highly recommend using Upstart instead; it's far more reliable and simpler to set up.
I'm running into this same problem and am having a lot of trouble building an isolated test case. Even when I copy/paste my code directly from the place where I'm having the problem into a separate Chef environment and run it under Test Kitchen, it doesn't seem to be reproducible :(
I will say that, at least in my problem code, commenting out all of the notifies calls that trigger restart_service and restart_log_service stops the loop, but I'm not sure why leaving them in causes it. Looking at the output, it doesn't show any changes being made in the subsequent passes through the restart_service, create, and enable steps, yet it just keeps restarting over and over again.
Frustrating.
I hit a similar situation with v4.3.0 when creating the config file related to logs. In my case, the file gets created in a directory on an NFS mount with root squash enabled, so Chef is unable to change the owner/group of the file. From my understanding, this is what happens:
- Creation of the config file happens in the create action of the resource. During this, it notifies a restart of the service using a ruby block.
- This ruby block calls the enable action of runit_service.
- The first thing the enable action does is call the create action of the custom resource, thus forming a cycle.
- In normal scenarios, this create action will not call enable again, because the config file resource is already up to date and no restart of the service happens, which breaks the loop.
- In this scenario, however, the config file resource is never up to date, because its ownership never matches the intended one, so the two actions keep calling each other in an infinite loop.
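The cycle described in that list can be made concrete with a toy Ruby model. It is not the runit cookbook's code: RunitModel, converge_config_file and restart_via_ruby_block are hypothetical names, notifications are modeled as plain method calls, and a depth limit stands in for the infinite loop so the demo terminates. The only variable is whether the config file's owner can actually be set, which is what root squash prevents.

```ruby
# Hypothetical toy model of the create -> restart -> enable -> create cycle.
# Not the runit cookbook's implementation; notifications are plain method calls.
class RunitModel
  class LoopDetected < StandardError; end

  attr_reader :calls

  def initialize(can_set_owner:, max_depth: 20)
    @can_set_owner = can_set_owner # false models an NFS mount with root squash
    @owner_ok = false              # the config file starts with the wrong owner
    @max_depth = max_depth         # artificial limit standing in for "forever"
    @depth = 0
    @calls = []
  end

  # create: converge the log config file; if it changed, notify a restart.
  def create
    enter(:create)
    changed = converge_config_file
    restart_via_ruby_block if changed
  ensure
    @depth -= 1
  end

  # enable: the first thing it does is run create again.
  def enable
    enter(:enable)
    create
  ensure
    @depth -= 1
  end

  private

  def enter(action)
    @depth += 1
    @calls << action
    raise LoopDetected, "gave up after #{@calls.size} calls" if @depth > @max_depth
  end

  # Returns true when the file resource reports a change (was not up to date).
  def converge_config_file
    return false if @owner_ok       # up to date: no change, so no notification
    @owner_ok = true if @can_set_owner
    true                            # an ownership change was attempted
  end

  # The restart notification is routed back through enable, closing the cycle.
  def restart_via_ruby_block
    enable
  end
end

ok = RunitModel.new(can_set_owner: true)
ok.create
p ok.calls # => [:create, :enable, :create]

squashed = RunitModel.new(can_set_owner: false)
begin
  squashed.create
rescue RunitModel::LoopDetected => e
  puts e.message # => gave up after 21 calls
end
```

When the owner can be set, the second create finds the file up to date and the cycle ends after one extra pass; when it cannot, every create reports a change, so the calls only stop at the artificial limit.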
Is there any specific reason the functionality is split between create and enable, and why can't all of it live under enable? At least that would prevent enable and create from calling each other.
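One way to picture that suggestion is a toy model with a single enable action that restarts the service directly instead of routing the restart back through enable. The names are hypothetical and this is not how the cookbook is actually structured; it only illustrates why the mutual recursion disappears.

```ruby
# Hypothetical toy model: one enable action, restart issued directly.
# Not the runit cookbook's implementation.
class MergedRunitModel
  attr_reader :calls, :restarts

  def initialize(can_set_owner:)
    @can_set_owner = can_set_owner # false models an NFS mount with root squash
    @owner_ok = false
    @calls = []
    @restarts = 0
  end

  # enable does all the work itself: converge the file, restart if it changed.
  def enable
    @calls << :enable
    changed = converge_config_file
    restart_service if changed
  end

  private

  def converge_config_file
    return false if @owner_ok
    @owner_ok = true if @can_set_owner
    true
  end

  # Stand-in for a direct restart (with runit, something like `sv restart <service>`);
  # crucially it does not call enable again.
  def restart_service
    @restarts += 1
  end
end

m = MergedRunitModel.new(can_set_owner: false)
m.enable # first Chef run
m.enable # second Chef run
p m.calls    # => [:enable, :enable]
p m.restarts # => 2
```

Merging the actions removes the loop, but it does not make the file resource idempotent: on a root-squashed mount the ownership change is still reported on every run, so the service would simply be restarted on every Chef run instead.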