Calamari sees hosts but says no cluster

Open ChevallierP opened this issue 7 years ago • 8 comments

Hi there,

I'm having a lot of trouble setting up Calamari on a two-node Ceph cluster (this is for lab purposes). I have two physical servers running CentOS 7 with a Ceph cluster on them (node 1 is admin, mon + OSD, and node 2 is just OSD).

On top of this I'm trying to add Calamari, so I created a CentOS 7 virtual machine on my laptop. After installing Calamari, I connected the hosts manually. The keys are accepted by the master, but the graphical interface keeps showing me this:

image

Can anyone help me with this? I have already tried initializing, restarting cthulhu, restarting the server, etc., and I'm now struggling to go further in debugging.

Thanks for any help, P.Chevallier

ChevallierP avatar Jul 05 '17 15:07 ChevallierP

On your Calamari server, execute this command and check whether there are any errors: salt '*' ceph.get_heartbeats

Osa1989 avatar Aug 18 '17 16:08 Osa1989

I have the same problem. I executed the command and got this:

an_dev-cp1.aeronet.dev:
    The minion function caused an exception: Traceback (most recent call last):
      File "/usr/lib/python2.7/dist-packages/salt/minion.py", line 1020, in _thread_return
        return_data = func(*args, **kwargs)
      File "/var/cache/salt/minion/extmods/modules/ceph.py", line 467, in get_heartbeats
        service_data = service_status(filename)
      File "/var/cache/salt/minion/extmods/modules/ceph.py", line 526, in service_status
        fsid = json.loads(admin_socket(socket_path, ['status'], 'json'))['cluster_fsid']
    KeyError: 'cluster_fsid'

I installed Salt 2014.7, Calamari 1.3.1, Ubuntu 16.04.

Ken4scholars avatar Nov 15 '17 14:11 Ken4scholars

hello,

Any updates ?

Thanks, -Ali

davkar3n avatar Nov 16 '17 07:11 davkar3n

First, execute this command on your Calamari server, and then paste the result here.

xtbing avatar Jan 03 '18 01:01 xtbing

Replying to Ken4scholars' report above (KeyError: 'cluster_fsid' from ceph.get_heartbeats with Salt 2014.7, Calamari 1.3.1, Ubuntu 16.04):

I have the same problem on CentOS 7. How did you solve the error? I only found a brief reply in these links:

update:

imoyao avatar Nov 04 '19 08:11 imoyao

Please try the following change: in https://github.com/ceph/calamari/blob/1.3/salt/srv/salt/_modules/ceph.py, line 594, change AdminSocketError to Exception. Your local copy of the file may be at /opt/calamari/salt/salt/_modules/ceph.py (line 594).

Then run the command salt "*" saltutil.sync_all, or restart the salt-minion service on every Ceph node.

syf-zsxm avatar Nov 05 '19 02:11 syf-zsxm

I'm very grateful for your reply, and by modifying the code, my work has taken me one step further. I will record my own process below to help those who might encounter this problem:

  • I executed salt '*' ceph.get_heartbeats and got this:
node2:
    The minion function caused an exception: Traceback (most recent call last):
      File "/usr/lib/python2.7/site-packages/salt/minion.py", line 1200, in _thread_return
        return_data = func(*args, **kwargs)
      File "/var/cache/salt/minion/extmods/modules/ceph.py", line 534, in get_heartbeats
        service_data = service_status(filename)
      File "/var/cache/salt/minion/extmods/modules/ceph.py", line 593, in service_status
        fsid = json.loads(admin_socket(socket_path, ['status'], 'json'))['cluster_fsid']
    KeyError: 'cluster_fsid'
…… # the other nodes report the same error:
    The minion function caused an exception: Traceback (most recent call last):
      File "/usr/lib/python2.7/site-packages/salt/minion.py", line 1200, in _thread_return
        return_data = func(*args, **kwargs)
      File "/var/cache/salt/minion/extmods/modules/ceph.py", line 534, in get_heartbeats
        service_data = service_status(filename)
      File "/var/cache/salt/minion/extmods/modules/ceph.py", line 593, in service_status
        fsid = json.loads(admin_socket(socket_path, ['status'], 'json'))['cluster_fsid']
    KeyError: 'cluster_fsid'

  • Change AdminSocketError to (AdminSocketError, KeyError):
try:
    fsid = json.loads(admin_socket(socket_path, ['status'], 'json'))['cluster_fsid']
except (AdminSocketError, KeyError):  # catching Exception also works, but is broader
    # older osd/mds daemons don't support 'status'; try our best
    pass  # placeholder: the original handler body stays unchanged

Hint: the file may be /opt/calamari/salt/salt/_modules/ceph.py on the admin node and /var/cache/salt/minion/extmods/modules/ceph.py on the other nodes.

  • Execute salt "*" saltutil.sync_all on the admin node. I got this:
node3:
    ----------
    beacons:
    grains:
    modules:
    output:
    renderers:
    returners:
    sdb:
    states:
    utils:
……
node2:
    ----------
    beacons:
    grains:
    modules:
    output:
    renderers:
    returners:
    sdb:
    states:
    utils:
  • Then I executed salt '*' ceph.get_heartbeats again and got this:
node2:
    |_
      ----------
      boot_time:
          1573005001
      ceph_version:
          2:13.2.6-0.el7
      services:
          ----------
          ceph-mgr.node2:
              ----------
              cluster:
                  ceph
              fsid:
                  47071b01-394e-4a62-bb2d-cfe3c19637f7
              id:
                  node2
              status:
                  None
              type:
                  mgr
              version:
                  13.2.6
          ceph-osd.0:
              ----------
              cluster:
                  ceph
              fsid:
                  47071b01-394e-4a62-bb2d-cfe3c19637f7
              id:
                  0
              status:
                  None
              type:
                  osd
              version:
                  13.2.6
    |_
      ----------
……
  • Visiting the homepage, it barely worked (screenshot: snipaste_20191106_102957).

I will continue to track the issue, and if there is progress later, I will leave a record here.😊
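
For anyone who wants to try the exception-handling change from the second step outside of Salt, here is a minimal sketch. It is not the Calamari module itself: AdminSocketError is redefined as a stand-in, fake_admin_socket is a hypothetical stub that mimics a daemon whose 'status' reply has no 'cluster_fsid' key, the socket path is made up, and returning None from the handler is an assumption (the real handler body stays as it is in ceph.py).

```python
import json


class AdminSocketError(Exception):
    """Stand-in for the exception type used by the real module (assumption)."""


def fake_admin_socket(socket_path, command, fmt):
    # Hypothetical stub: the daemon answers 'status' but the JSON reply
    # has no 'cluster_fsid' key, which is what triggered the KeyError.
    return json.dumps({"state": "active"})


def read_fsid(socket_path, admin_socket=fake_admin_socket):
    """Return the cluster fsid from a daemon's status, or None if unavailable."""
    try:
        status = json.loads(admin_socket(socket_path, ["status"], "json"))
        return status["cluster_fsid"]
    except (AdminSocketError, KeyError):
        # Either the daemon rejected 'status' or the reply lacks the key.
        return None


if __name__ == "__main__":
    # The path below is only an example value for the stub.
    print(read_fsid("/tmp/example.asok"))
```

Catching the two named exceptions instead of a bare Exception keeps unrelated failures (for example malformed JSON) visible in the minion log.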

imoyao avatar Nov 06 '19 02:11 imoyao

I found some more issues in the log:

tailf /var/log/calamari/calamari.log 

I got this:

2019-11-05 21:02:19,605 - metric_access - django.request No graphite data for ceph.cluster.47071b01-394e-4a62-bb2d-cfe3c19637f7.df.total_used_bytes
2019-11-05 21:02:19,606 - metric_access - django.request No graphite data for ceph.cluster.47071b01-394e-4a62-bb2d-cfe3c19637f7.df.total_used
2019-11-05 21:02:19,606 - metric_access - django.request No graphite data for ceph.cluster.47071b01-394e-4a62-bb2d-cfe3c19637f7.df.total_space
2019-11-05 21:02:19,607 - metric_access - django.request No graphite data for ceph.cluster.47071b01-394e-4a62-bb2d-cfe3c19637f7.df.total_avail
2019-11-05 21:02:19,608 - ERROR - django.request Internal Server Error: /api/v1/cluster/47071b01-394e-4a62-bb2d-cfe3c19637f7/space
Traceback (most recent call last):
  File "/opt/calamari/venv/lib/python2.7/site-packages/django/core/handlers/base.py", line 115, in get_response
    response = callback(request, *callback_args, **callback_kwargs)
  File "/opt/calamari/venv/lib/python2.7/site-packages/rest_framework/viewsets.py", line 78, in view
    return self.dispatch(request, *args, **kwargs)
  File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_rest_api-0.1-py2.7.egg/calamari_rest/views/rpc_view.py", line 94, in dispatch
    self.client.close()
  File "/opt/calamari/venv/lib/python2.7/site-packages/zerorpc/core.py", line 293, in close
    SocketBase.close(self)
  File "/opt/calamari/venv/lib/python2.7/site-packages/zerorpc/socket.py", line 37, in close
    self._events.close()
  File "/opt/calamari/venv/lib/python2.7/site-packages/zerorpc/events.py", line 198, in close
    self._send.close()
  File "/opt/calamari/venv/lib/python2.7/site-packages/zerorpc/events.py", line 50, in close
    self._send_task.kill()
  File "/opt/calamari/venv/lib/python2.7/site-packages/gevent/greenlet.py", line 235, in kill
    waiter.get()
  File "/opt/calamari/venv/lib/python2.7/site-packages/gevent/hub.py", line 575, in get
    return self.hub.switch()
  File "/opt/calamari/venv/lib/python2.7/site-packages/gevent/hub.py", line 338, in switch
    return greenlet.switch(self)
LostRemote: Lost remote after 10s heartbeat

------

2019-11-05 23:59:43,586 - ERROR - django.request Internal Server Error: /api/v1/cluster/47071b01-394e-4a62-bb2d-cfe3c19637f7/health_counters
Traceback (most recent call last):
  File "/opt/calamari/venv/lib/python2.7/site-packages/django/core/handlers/base.py", line 115, in get_response
    response = callback(request, *callback_args, **callback_kwargs)
  File "/opt/calamari/venv/lib/python2.7/site-packages/rest_framework/viewsets.py", line 78, in view
    return self.dispatch(request, *args, **kwargs)
  File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_rest_api-0.1-py2.7.egg/calamari_rest/views/rpc_view.py", line 91, in dispatch
    return super(RPCViewSet, self).dispatch(request, *args, **kwargs)
  File "/opt/calamari/venv/lib/python2.7/site-packages/django/views/decorators/csrf.py", line 77, in wrapped_view
    return view_func(*args, **kwargs)
  File "/opt/calamari/venv/lib/python2.7/site-packages/rest_framework/views.py", line 399, in dispatch
    response = self.handle_exception(exc)
  File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_rest_api-0.1-py2.7.egg/calamari_rest/views/rpc_view.py", line 108, in handle_exception
    return super(RPCViewSet, self).handle_exception(exc)
  File "/opt/calamari/venv/lib/python2.7/site-packages/rest_framework/views.py", line 396, in dispatch
    response = handler(request, *args, **kwargs)
  File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_rest_api-0.1-py2.7.egg/calamari_rest/views/v1.py", line 315, in get
    counters = self.generate(osd_data, mds_data, mon_status, pg_summary)
  File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_rest_api-0.1-py2.7.egg/calamari_rest/views/v1.py", line 167, in generate
    'mds': cls._calculate_mds_counters(mds_map),
  File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_rest_api-0.1-py2.7.egg/calamari_rest/views/v1.py", line 295, in _calculate_mds_counters
    up = len(mds_map['up'])
TypeError: 'NoneType' object has no attribute '__getitem__'

--------

2019-11-06 00:00:53,567 - ERROR - django.request Internal Server Error: /api/v1/cluster/47071b01-394e-4a62-bb2d-cfe3c19637f7/osd
Traceback (most recent call last):
  File "/opt/calamari/venv/lib/python2.7/site-packages/django/core/handlers/base.py", line 115, in get_response
    response = callback(request, *callback_args, **callback_kwargs)
  File "/opt/calamari/venv/lib/python2.7/site-packages/rest_framework/viewsets.py", line 78, in view
    return self.dispatch(request, *args, **kwargs)
  File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_rest_api-0.1-py2.7.egg/calamari_rest/views/rpc_view.py", line 91, in dispatch
    return super(RPCViewSet, self).dispatch(request, *args, **kwargs)
  File "/opt/calamari/venv/lib/python2.7/site-packages/django/views/decorators/csrf.py", line 77, in wrapped_view
    return view_func(*args, **kwargs)
  File "/opt/calamari/venv/lib/python2.7/site-packages/rest_framework/views.py", line 399, in dispatch
    response = self.handle_exception(exc)
  File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_rest_api-0.1-py2.7.egg/calamari_rest/views/rpc_view.py", line 108, in handle_exception
    return super(RPCViewSet, self).handle_exception(exc)
  File "/opt/calamari/venv/lib/python2.7/site-packages/rest_framework/views.py", line 396, in dispatch
    response = handler(request, *args, **kwargs)
  File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_rest_api-0.1-py2.7.egg/calamari_rest/views/v1.py", line 417, in get
    osds, osds_by_pg_state = self.generate(pg_summary, osd_map, server_info, servers)
  File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_rest_api-0.1-py2.7.egg/calamari_rest/views/v1.py", line 365, in generate
    for pool_id, osds in osd_map.osds_by_pool.items():
  File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_common-0.1-py2.7.egg/calamari_common/util.py", line 8, in wrapper
    rv = function(*args)
  File "/opt/calamari/venv/lib/python2.7/site-packages/calamari_common-0.1-py2.7.egg/calamari_common/types.py", line 206, in osds_by_pool
    for rule in [r for r in self.data['crush']['rules'] if r['ruleset'] == pool['crush_ruleset']]:
KeyError: 'crush_ruleset'
2019-11-06 00:00:54,566 - metric_access - django.request No graphite data for ceph.cluster.47071b01-394e-4a62-bb2d-cfe3c19637f7.pool.1.num_objects
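
Tracebacks this long are easier to scan if each failing API path is paired with the final exception line. The following is a rough sketch of that idea, not a Calamari tool: summarize_errors is a hypothetical helper, and it assumes the log layout shown above (an "ERROR - django.request Internal Server Error: <path>" line, then a traceback whose last line is unindented).

```python
import re

ERROR_RE = re.compile(r"ERROR - django\.request Internal Server Error: (\S+)")


def summarize_errors(lines):
    """Pair each failing API path with the last line of its traceback."""
    results = []
    current_path = None
    in_traceback = False
    for raw in lines:
        line = raw.rstrip("\n")
        match = ERROR_RE.search(line)
        if match:
            # A new error record starts; remember which endpoint failed.
            current_path = match.group(1)
            in_traceback = False
            continue
        if current_path is None:
            continue
        if line.startswith("Traceback"):
            in_traceback = True
        elif in_traceback and line and not line.startswith(" "):
            # The first unindented line after the frames is the exception.
            results.append((current_path, line))
            current_path = None
            in_traceback = False
    return results


if __name__ == "__main__":
    with open("/var/log/calamari/calamari.log") as handle:
        for path, error in summarize_errors(handle):
            print(path, "->", error)
```

Applied to the excerpts above, this would reduce the three records to LostRemote, TypeError and KeyError, one per endpoint.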

Then I changed the code here: vi /opt/calamari/venv/lib/python2.7/site-packages/calamari_rest_api-0.1-py2.7.egg/calamari_rest/views/v1.py +295

@classmethod
def _calculate_mds_counters(cls, mds_map):
    log.debug("_calculate_mds_counters %s" % mds_map)
    if mds_map is not None:
        up = len(mds_map['up'])
        inn = len(mds_map['in'])
        total = len(mds_map['info'])
    else:  # informal workaround: no MDS map available, so report zeros
        total = 0
        inn = 0
        up = 0

    return {
        'total': total,
        'up_in': inn,
        'up_not_in': up - inn,
        'not_up_not_in': total - up,
    }
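
The same guard can be checked on its own, without the Calamari view class. This is only a sketch: calculate_mds_counters is a free function written for illustration, and the shape of the sample map (dict-like 'up' and 'info', list-like 'in') is an assumption rather than something taken from a real cluster.

```python
def calculate_mds_counters(mds_map):
    """Summarise MDS states; a missing map is treated as 'no MDS daemons'."""
    if mds_map is None:
        up = inn = total = 0
    else:
        up = len(mds_map["up"])
        inn = len(mds_map["in"])
        total = len(mds_map["info"])
    return {
        "total": total,
        "up_in": inn,
        "up_not_in": up - inn,
        "not_up_not_in": total - up,
    }


if __name__ == "__main__":
    # A cluster without CephFS has no MDS map, which is the case that
    # raised the TypeError in the log.
    print(calculate_mds_counters(None))
```

With None as input every counter is 0, so the health_counters endpoint has something to return instead of failing.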

And the code here: vi /opt/calamari/venv/lib/python2.7/site-packages/calamari_common-0.1-py2.7.egg/calamari_common/types.py +206

@property
@memoize
def osds_by_pool(self):
    """
    Get the OSDS which may be used in this pool

    :return dict of pool ID to OSD IDs in the pool
    """

    result = {}
    for pool_id, pool in self.pools_by_id.items():
        osds = None
        # Membership test rather than truthiness, so a crush_ruleset of 0 still matches.
        if pool and 'crush_ruleset' in pool:
            for rule in [r for r in self.data['crush']['rules'] if r['ruleset'] == pool['crush_ruleset']]:
                if rule['min_size'] <= pool['size'] <= rule['max_size']:
                    osds = self.osds_by_rule_id[rule['rule_id']]

After that, I restarted the salt-minion service (screenshot: snipaste_20191106_145209).

There may still be things that are not satisfactory; I will go on to resolve them.
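
The pool lookup can also be sketched as a standalone function to see how a pool without 'crush_ruleset' behaves. Everything here is illustrative: osds_for_pool and its parameters are hypothetical names, the sample data is invented, and falling back to all OSDs when no rule matches is an assumption, not necessarily what Calamari does.

```python
def osds_for_pool(pool, crush_rules, osds_by_rule_id, all_osds):
    """Return the OSD IDs a pool may use, tolerating a missing 'crush_ruleset'."""
    ruleset = pool.get("crush_ruleset")
    osds = None
    # Compare against None so that ruleset 0 is still looked up.
    if ruleset is not None:
        for rule in crush_rules:
            if rule["ruleset"] != ruleset:
                continue
            if rule["min_size"] <= pool["size"] <= rule["max_size"]:
                osds = osds_by_rule_id[rule["rule_id"]]
    if osds is None:
        # Assumed fallback: no usable rule, so report every OSD.
        osds = list(all_osds)
    return osds


if __name__ == "__main__":
    rules = [{"ruleset": 0, "rule_id": 0, "min_size": 1, "max_size": 10}]
    by_rule = {0: [0, 1]}
    # Pool dict without the 'crush_ruleset' key, as in the KeyError above.
    print(osds_for_pool({"size": 2}, rules, by_rule, [0, 1, 2]))
```

Using .get() and a None check avoids the KeyError while keeping ruleset 0 usable, which a plain truthiness test would silently skip.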

imoyao avatar Nov 06 '19 06:11 imoyao