
After successful initial run, subsequent runs blow up at server_extend

Open donovanmuller opened this issue 8 years ago • 16 comments

Initial run of gluster::server is successful. Volume created and started. When gluster::server runs again, the following gets vomited out:

NoMethodError
-------------
private method `select' called for nil:NilClass

...

Relevant File Content:
----------------------
/var/chef/cache/cookbooks/gluster/recipes/server_extend.rb:

   17:        next
   18:      end
   19:
   20:      unless node.default['gluster']['server']['volumes'][volume_name].attribute?('bricks_waiting_to_join')
   21:        node.default['gluster']['server']['volumes'][volume_name]['bricks_waiting_to_join'] = ''
   22:      end
   23:
   24>>     peer_bricks = chef_node['gluster']['server']['volumes'][volume_name]['bricks'].select { |brick| brick.include? volume_name }
   25:      brick_count += (peer_bricks.count || 0)
   26:      peer_bricks.each do |brick|
   27:        Chef::Log.info("Checking #{peer}:#{brick}")
   28:        unless brick_in_volume?(peer, brick, volume_name)
   29:          node.default['gluster']['server']['volumes'][volume_name]['bricks_waiting_to_join'] << " #{peer}:#{brick}"
   30:        end
   31:      end
   32:    end
   33:
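
A minimal defensive sketch of the failing line (my own suggestion, reusing the peer and volume_name variables from the surrounding loop) would skip peers whose bricks attribute was never populated, instead of calling select on nil:

# Skip peers that have no bricks recorded for this volume yet
peer_volume = chef_node['gluster'] &&
              chef_node['gluster']['server'] &&
              chef_node['gluster']['server']['volumes'] &&
              chef_node['gluster']['server']['volumes'][volume_name]
if peer_volume.nil? || peer_volume['bricks'].nil?
  Chef::Log.warn("No bricks recorded for #{volume_name} on #{peer}, skipping")
  next
end
peer_bricks = peer_volume['bricks'].select { |brick| brick.include? volume_name }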

donovanmuller avatar Feb 09 '16 12:02 donovanmuller

It is set here: https://github.com/shortdudey123/chef-gluster/blob/master/recipes/server_setup.rb#L52

Can you verify node['gluster']['server']['volumes']['ose3-vol']['peers'] contains the FQDN or hostname of the node?
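
For example, something like this should show it (assuming a Chef server workflow; substitute your node names and, if different, your actual volume name):

knife node show <node name> -a gluster.server.volumes.ose3-vol.peers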

shortdudey123 avatar Feb 09 '16 15:02 shortdudey123

It does, I left it unexpanded for the screenshot but it was definitely populated.

donovanmuller avatar Feb 09 '16 15:02 donovanmuller

Can you post the context of the failed run? (not just the exception)

shortdudey123 avatar Feb 09 '16 20:02 shortdudey123

Hi @donovanmuller!

Sorry to hear you are having issues. It appears that your node is trying to load attributes from another chef-client that doesn't have that attribute set. Could you please confirm that the same cookbook was run on all nodes in your peer list, and that the Chef node name matches the peer name that Gluster is using? (Sometimes when the Chef node name is an FQDN and the peer name is a plain hostname, or vice versa, it can cause a problem like this.)

What would really help is the output of your node['gluster']['server']['volumes'] entry in your cookbook attributes file, and the attribute node['gluster']['server']['volumes']['ose3-vol'] from each of your peers.
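
A quick way to compare the two is to log them from a recipe; a rough sketch (assuming the ose3-vol volume name from above, so adjust to your actual name):

peers = node['gluster']['server']['volumes']['ose3-vol']['peers'] || []
Chef::Log.info("Chef node name: #{node.name}, FQDN: #{node['fqdn']}, hostname: #{node['hostname']}")
Chef::Log.info("Node name present in peer list: #{peers.include?(node.name)}")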

Thanks in advance!

Andy

andyrepton avatar Feb 11 '16 14:02 andyrepton

@Seth-Karlo Below are my complete gluster attributes:

default['gluster']['version'] = '3.7'
default['gluster']['server']['brick_mount_path'] = '/data'
default['gluster']['server']['disks'] = []
default['gluster']['server']['volumes'] = {
  'ose3' => {
    'peers' => ['master01.bison.pi.b','node01.bison.pi.b'],
    'replica_count' => 2,
    'volume_type' => 'replicated',
    'disks' => ['/dev/sda4'],
    'size' => '10G'
  }
}

(screenshots: master01-attr, node02-attr)

Is there anything else you need?

donovanmuller avatar Feb 12 '16 05:02 donovanmuller

Thank you for your report; I apologise for taking so long to respond. I'll see if I can reproduce it at this end and get back to you.

andyrepton avatar Feb 22 '16 12:02 andyrepton

Any news about this? I'm experiencing the same problem on OpsWorks.

alez007 avatar Apr 12 '16 13:04 alez007

@alez007 Can you verify the cookbook version you are using, so we can make sure we are looking at the same thing?

shortdudey123 avatar Apr 12 '16 18:04 shortdudey123

I'm pretty confident this is caused by chef_node not being set. I've been a bit distracted lately, but I'll try and look into this.

andyrepton avatar May 07 '16 19:05 andyrepton

I am using OpsWorks and experiencing this problem. I am wondering if it could be caused by the way OpsWorks updates the cookbooks on each node, such that every time the "custom cookbooks" are updated, it wipes the node's attributes?

laurencepettitt avatar Jun 23 '16 13:06 laurencepettitt

@LorenzoPetite Possibly? I don't use OpsWorks and am not too familiar with it. @Seth-Karlo, do you use OpsWorks at all and might you be able to shed some light here?

shortdudey123 avatar Jun 23 '16 17:06 shortdudey123

@shortdudey123 @LorenzoPetite Sorry, no, I've never used OpsWorks before. We could possibly test this by adding some echo statements to the cookbook to print out those attributes during compile time. If they report as empty, we can then start looking into whether or not they are set properly.
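
For instance, a throwaway line near the top of server_extend.rb (a sketch of mine; remember to remove it again afterwards) would print the attribute during the compile phase:

# Plain Ruby in a recipe runs at compile time, before any resource converges
Chef::Log.info("gluster volumes at compile time: #{node['gluster']['server']['volumes'].inspect}")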

andyrepton avatar Jun 23 '16 18:06 andyrepton

Following @Seth-Karlo's suggestion, I tested with some echo statements. In the server_setup recipe, I found that node['gluster']['server']['volumes'][volume_name]['bricks'] produces: ["/gluster/servu/brick"]

However, in the server_extend recipe, the reason chef_node['gluster']['server']['volumes'][volume_name]['bricks'] causes the error undefined method '[]' for nil:NilClass is that chef_node['gluster'] is somehow nil. Strangely, echoing chef_node produces node[gluster1]

I realise now this is actually a slightly different error than @donovanmuller's, but in both cases there seems to be a problem with attribute persistence.

How could this be possible?
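
For what it's worth, one check that might narrow it down (assuming the peer is registered on the Chef server as gluster1, going by the node[gluster1] output above) is whether the attribute ever made it to the server at all:

knife node show gluster1 -a gluster.server.volumes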

laurencepettitt avatar Jul 18 '16 14:07 laurencepettitt

chef_node iterates over all nodes in the cluster, so the loop blows up when the bricks attribute is empty on any node. I have the same problem in one of my test environments. I'm not sure, but I think it can be connected to a Chef error during cluster setup, when the bricks attribute isn't propagated to the Chef server.

theundefined avatar Dec 10 '16 09:12 theundefined

I stumbled upon this today too. On the initial run of the chef-client, the cookbook failed due to an error in the configuration on my side, although the chef-client was able to create the volume on that first run. Executing knife node show <NODE NAME> -a gluster confirmed that ['gluster']['server']['volumes']['myvolume']['bricks'] was empty. Subsequent runs of chef-client failed with the error stated in the first comment of this issue. As far as I know, a chef-client persists its attributes to the Chef server only after a successful run; since no run of the chef-client completed successfully, the bricks attribute could never be saved.

My workaround was to set ['gluster']['server']['server_extend_enabled'] to false, trigger a run of the chef-client (which succeeded) and set ['gluster']['server']['server_extend_enabled'] back to true.
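
In attribute form, the temporary override might look like this (a sketch; where it lives depends on whether you use a wrapper cookbook, a role, or an environment):

# Step 1: temporarily disable extending so the run can complete and persist the bricks attribute
default['gluster']['server']['server_extend_enabled'] = false

# Step 2 (after one successful chef-client run): re-enable it
# default['gluster']['server']['server_extend_enabled'] = true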

wndhydrnt avatar Oct 13 '17 11:10 wndhydrnt