puppetlabs-havana
ceilometer-dbsync fails on first run of controller role
ceilometer-dbsync exits with a failure when applying the controller role for the first time on a clean system.
Notice: /Stage[main]/Ceilometer::Db/Exec[ceilometer-dbsync]/returns: 2014-01-30 12:12:31.182 26917 TRACE ceilometer ConnectionFailure: could not connect to 172.16.33.4:27017: [Errno 111] ECONNREFUSED
Notice: /Stage[main]/Ceilometer::Db/Exec[ceilometer-dbsync]/returns: 2014-01-30 12:12:31.182 26917 TRACE ceilometer
Error: /Stage[main]/Ceilometer::Db/Exec[ceilometer-dbsync]: Failed to call refresh: ceilometer-dbsync --config-file=/etc/ceilometer/ceilometer.conf returned 1 instead of one of [0]
Error: /Stage[main]/Ceilometer::Db/Exec[ceilometer-dbsync]: ceilometer-dbsync --config-file=/etc/ceilometer/ceilometer.conf returned 1 instead of one of [0]
The problem seems to be that Puppet runs ceilometer-dbsync immediately after starting the mongod service. This does not always work, because mongod takes some time to allocate its journal before it accepts incoming connections on port 27017. On my VM this takes about 15 seconds.
Yes. While I would consider this a bug in the MongoDB startup scripts (in my opinion they should not return until the database is initialized, precisely because of problems like this), it's something that needs to be reliably addressed. I'm thinking of a script that tries n times, waiting m seconds between tries.
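The "try n times with m seconds between each try" idea can be sketched in Python as follows. This is a minimal, hypothetical sketch: the helper name `run_with_retries` and its `tries` (n) and `delay` (m) parameters are assumptions for illustration, not part of puppetlabs-havana or the ceilometer module.

```python
import subprocess
import time


def run_with_retries(cmd, tries, delay):
    """Hypothetical helper: run `cmd` up to `tries` times (n), sleeping
    `delay` seconds (m) after each failed attempt except the last one.

    Returns the exit status of the final attempt, so 0 means success.
    """
    if tries < 1:
        raise ValueError("tries must be at least 1")
    status = None
    for attempt in range(1, tries + 1):
        status = subprocess.call(cmd)
        if status == 0:
            return 0
        if attempt < tries:
            # Give mongod more time to finish allocating its journal.
            time.sleep(delay)
    return status
```

For example, `run_with_retries(["ceilometer-dbsync", "--config-file=/etc/ceilometer/ceilometer.conf"], tries=10, delay=3)` would sleep for up to 27 seconds in total between attempts, which covers the roughly 15-second startup observed above; the values 10 and 3 are arbitrary.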
Mine actually runs ceilometer-dbsync before it installs mongo, so there is also a dependency ordering issue here.
(which is odd, considering that the explicit arrows in the controller role put mongo before ceilometer-api)
Oh, the role ordering will do almost nothing to ensure the dependency ordering. Classes declared inside other classes are not contained by them, so they float away and become unordered. There are workarounds that I find offensive. I'll take a look at enforcing stronger dependency ordering within the profile. It should be possible. Sorry for taking so long to close this.
(I'm actually pulling out that ordering in future versions since "the goggles do nothing").
Ordering is the main reason I can't use the stackforge modules for anything other than demo envs :-(
I'm hoping `contain` will make the situation better in the future.
I'm not sure this is the right way to fix it, but I submitted a patch at https://review.openstack.org/#/c/81950/ that makes ceilometer-dbsync retry on a failed connection.
It's really MongoDB's fault, but there's not much we can do about that.
A better way to solve this would be to make the mongodb::server::service class block until the service accepts connections, using a "validate connection" resource similar to the one in the puppetdb module.
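A connection validator of this kind boils down to polling the port until it accepts a TCP connection or a timeout expires. The Python sketch below illustrates only that idea; `wait_for_port` and its `timeout` and `interval` defaults are hypothetical and are not the API of the puppetdb module.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Hypothetical helper: poll host:port until a TCP connection succeeds
    or `timeout` seconds have passed.

    Returns True once a connection is accepted, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means the service is accepting clients.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # ECONNREFUSED, connect timeouts, etc.: not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For the failure in this report, the check would be `wait_for_port("172.16.33.4", 27017)` before running ceilometer-dbsync, so that the sync only starts once mongod has finished its journal allocation.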
As a note, I'm currently using this solution: https://github.com/Katello/puppet-service_wait
Workaround: http://openstack.redhat.com/Workarounds_2014_01#Failed_to_start_mongodb