Diamond Ceph Stats not received in calamari
Everything is working except the Ceph and pool graph stats in the Calamari GUI; the host stats are working fine.
root@calamari:~# dpkg -l | egrep -i "calamari|salt" | awk '{print $2 "\t\t" $3}'
calamari-clients 1.3.1.1-1trusty
calamari-server 1.3.0.1-11-g9fb65ae
salt-common 2014.7.5+ds-1ubuntu1
salt-master 2014.7.5+ds-1ubuntu1
salt-minion 2014.7.5+ds-1ubuntu1
root@calamari:~#
root@ceph1:~# dpkg -l | egrep -i "ceph|salt|diamond" | awk '{print $2 "\t\t" $3}'
ceph 9.2.0-1trusty
ceph-common 9.2.0-1trusty
ceph-mds 9.2.0-1trusty
diamond 3.4.67
libcephfs1 9.2.0-1trusty
python-cephfs 9.2.0-1trusty
python-rados 9.2.0-1trusty
python-rbd 9.2.0-1trusty
salt-common 0.17.5+ds-1
salt-minion 0.17.5+ds-1
root@ceph1:~#
Let me know what else I should check.
Regards, Daniel
Note: I've built the latest git stable deb packages via vagrant and still have the same issue.
root@calamari:~# dpkg -l | egrep -i "calamari|salt|romana" | awk '{print $2 "\t\t" $3}'
calamari-server 1.3.1.1-105-g79c8df2-1trusty
romana 1.2.2-36-gc62bb5b
salt-common 2014.7.5+ds-1ubuntu1
salt-master 2014.7.5+ds-1ubuntu1
salt-minion 2014.7.5+ds-1ubuntu1
root@calamari:~#
On the client I've also matched the salt versions, as recommended.
root@ceph1:~# dpkg -l | egrep -i "salt|diamond" | awk '{print $2 "\t\t" $3}'
diamond 3.4.67
salt-common 2014.7.5+ds-1ubuntu1
salt-minion 2014.7.5+ds-1ubuntu1
root@ceph1:~#
Restarting the diamond service on the server shows the below:
root@ceph1:~# tail -f /var/log/diamond/diamond.log
[2016-01-24 04:19:21,039] [MainThread] pysnmp.entity.rfc3413.oneliner.cmdgen failed to load
[2016-01-24 04:19:21,043] [MainThread] pysnmp.entity.rfc3413.oneliner.cmdgen failed to load
[2016-01-24 04:19:21,044] [MainThread] pysnmp.entity.rfc3413.oneliner.cmdgen failed to load
[2016-01-24 04:19:21,046] [MainThread] pysnmp.entity.rfc3413.oneliner.cmdgen failed to load
[2016-01-24 04:19:21,056] [MainThread] pysnmp.entity.rfc3413.oneliner.cmdgen failed to load
[2016-01-24 04:19:21,074] [MainThread] pysnmp.entity.rfc3413.oneliner.cmdgen failed to load
[2016-01-24 04:19:22,252] [Thread-1] Traceback (most recent call last):
File "/usr/lib/pymodules/python2.7/diamond/collector.py", line 412, in _run
self.collect()
File "/usr/share/diamond/collectors/ceph/ceph.py", line 464, in collect
self._collect_service_stats(path)
File "/usr/share/diamond/collectors/ceph/ceph.py", line 450, in _collect_service_stats
self._publish_stats(counter_prefix, stats, schema, GlobalName)
File "/usr/share/diamond/collectors/ceph/ceph.py", line 305, in _publish_stats
assert path[-1] == 'type'
AssertionError
^C
root@ceph1:~# md5sum /usr/lib/pymodules/python2.7/diamond/collector.py
08bb05a483fa3d1d64c0ebf690259a05 /usr/lib/pymodules/python2.7/diamond/collector.py
root@ceph1:~# md5sum /usr/share/diamond/collectors/ceph/ceph.py
aeb3915f8ac7fdea61495805d2c99f33 /usr/share/diamond/collectors/ceph/ceph.py
root@ceph1:~#
Looking at calamari.log, I can see it's requesting graphite metric data that is missing:
root@calamari:/var/log/calamari# tail -f calamari.log
2016-01-23 22:44:54,040 - metric_access - django.request No graphite data for ceph.cluster.85895b09-7e2d-4290-b053-e7a71f8b5e08.df.total_space
2016-01-23 22:44:54,041 - metric_access - django.request No graphite data for ceph.cluster.85895b09-7e2d-4290-b053-e7a71f8b5e08.df.total_avail
2016-01-23 22:44:58,560 - metric_access - django.request No graphite data for ceph.cluster.85895b09-7e2d-4290-b053-e7a71f8b5e08.pool.0.num_objects
2016-01-23 22:44:58,561 - metric_access - django.request No graphite data for ceph.cluster.85895b09-7e2d-4290-b053-e7a71f8b5e08.pool.0.num_bytes
2016-01-23 22:44:58,835 - metric_access - django.request No graphite data for ceph.cluster.85895b09-7e2d-4290-b053-e7a71f8b5e08.df.total_used_bytes
2016-01-23 22:44:58,835 - metric_access - django.request No graphite data for ceph.cluster.85895b09-7e2d-4290-b053-e7a71f8b5e08.df.total_used
2016-01-23 22:44:58,836 - metric_access - django.request No graphite data for ceph.cluster.85895b09-7e2d-4290-b053-e7a71f8b5e08.df.total_space
2016-01-23 22:44:58,836 - metric_access - django.request No graphite data for ceph.cluster.85895b09-7e2d-4290-b053-e7a71f8b5e08.df.total_avail
2016-01-23 22:44:58,893 - metric_access - django.request No graphite data for ceph.cluster.85895b09-7e2d-4290-b053-e7a71f8b5e08.pool.0.num_objects
2016-01-23 22:44:58,894 - metric_access - django.request No graphite data for ceph.cluster.85895b09-7e2d-4290-b053-e7a71f8b5e08.pool.0.num_bytes
2016-01-23 22:45:14,440 - metric_access - django.request No graphite data for ceph.cluster.85895b09-7e2d-4290-b053-e7a71f8b5e08.df.total_used_bytes
2016-01-23 22:45:14,441 - metric_access - django.request No graphite data for ceph.cluster.85895b09-7e2d-4290-b053-e7a71f8b5e08.df.total_used
2016-01-23 22:45:14,442 - metric_access - django.request No graphite data for ceph.cluster.85895b09-7e2d-4290-b053-e7a71f8b5e08.df.total_space
2016-01-23 22:45:14,442 - metric_access - django.request No graphite data for ceph.cluster.85895b09-7e2d-4290-b053-e7a71f8b5e08.df.total_avail
2016-01-23 22:45:18,373 - metric_access - django.request No graphite data for ceph.cluster.85895b09-7e2d-4290-b053-e7a71f8b5e08.pool.0.num_objects
2016-01-23 22:45:18,377 - metric_access - django.request No graphite data for ceph.cluster.85895b09-7e2d-4290-b053-e7a71f8b5e08.pool.0.num_bytes
2016-01-23 22:45:18,878 - metric_access - django.request No graphite data for ceph.cluster.85895b09-7e2d-4290-b053-e7a71f8b5e08.pool.0.num_objects
2016-01-23 22:45:18,879 - metric_access - django.request No graphite data for ceph.cluster.85895b09-7e2d-4290-b053-e7a71f8b5e08.pool.0.num_bytes
2016-01-23 22:45:19,269 - metric_access - django.request No graphite data for ceph.cluster.85895b09-7e2d-4290-b053-e7a71f8b5e08.df.total_used_bytes
2016-01-23 22:45:19,270 - metric_access - django.request No graphite data for ceph.cluster.85895b09-7e2d-4290-b053-e7a71f8b5e08.df.total_used
2016-01-23 22:45:19,275 - metric_access - django.request No graphite data for ceph.cluster.85895b09-7e2d-4290-b053-e7a71f8b5e08.df.total_space
2016-01-23 22:45:19,276 - metric_access - django.request No graphite data for ceph.cluster.85895b09-7e2d-4290-b053-e7a71f8b5e08.df.total_avail
^C
root@calamari:/var/log/calamari#
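To see at a glance which series Calamari is asking for, the repeated log lines can be boiled down to a list of distinct metric names. This is only a rough sketch: the regex is based purely on the "No graphite data for" lines pasted above, and the function name missing_metrics is made up for illustration.

```python
import re

# Matches the "No graphite data for <metric>" lines seen in calamari.log above.
MISSING = re.compile(r"No graphite data for (\S+)")


def missing_metrics(lines):
    """Return the distinct metric names reported as missing, in first-seen order."""
    seen = []
    for line in lines:
        match = MISSING.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


# Three lines copied from the log above; the third repeats the first metric.
SAMPLE = [
    "2016-01-23 22:44:54,040 - metric_access - django.request No graphite data for "
    "ceph.cluster.85895b09-7e2d-4290-b053-e7a71f8b5e08.df.total_space",
    "2016-01-23 22:44:54,041 - metric_access - django.request No graphite data for "
    "ceph.cluster.85895b09-7e2d-4290-b053-e7a71f8b5e08.df.total_avail",
    "2016-01-23 22:44:58,836 - metric_access - django.request No graphite data for "
    "ceph.cluster.85895b09-7e2d-4290-b053-e7a71f8b5e08.df.total_space",
]

for name in missing_metrics(SAMPLE):
    print(name)
```

Passing an open file handle for /var/log/calamari/calamari.log instead of SAMPLE should work the same way. In the excerpt above every missing series is a cluster-wide ceph.cluster.* metric (df.* and pool.0.*), which fits with the host stats being fine.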
I can see the asok (admin socket) files are there
root@ceph1:/var/run/ceph# ls -la
total 0
drwxrwx--- 2 ceph ceph 80 Feb 1 10:51 .
drwxr-xr-x 18 root root 640 Feb 1 10:52 ..
srwxr-xr-x 1 ceph ceph 0 Feb 1 10:51 ceph-mon.ceph1.asok
srwxr-xr-x 1 root root 0 Jan 27 15:08 ceph-osd.0.asok
root@ceph1:/var/run/ceph#
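Having the sockets present doesn't say what they return, so it may help to dump the perf schema from each one and look at which fields the counters carry. A minimal sketch, assuming the ceph CLI is on the PATH and the user running it can read the sockets; the helper names (admin_sockets, perf_schema, counter_fields) are hypothetical, while "ceph --admin-daemon <socket> perf schema" is the standard admin socket command.

```python
import glob
import json
import os
import subprocess


def admin_sockets(run_dir="/var/run/ceph"):
    """List the *.asok admin sockets in run_dir, sorted by name."""
    return sorted(glob.glob(os.path.join(run_dir, "*.asok")))


def perf_schema_command(socket_path):
    """Build the ceph CLI invocation that dumps a daemon's perf schema."""
    return ["ceph", "--admin-daemon", socket_path, "perf", "schema"]


def perf_schema(socket_path):
    """Run the command and parse its JSON output (needs a live daemon)."""
    output = subprocess.check_output(perf_schema_command(socket_path))
    return json.loads(output.decode("utf-8"))


def counter_fields(node):
    """Collect every non-dict key found anywhere in a nested schema dict."""
    fields = set()
    for key, value in node.items():
        if isinstance(value, dict):
            fields |= counter_fields(value)
        else:
            fields.add(key)
    return fields


if __name__ == "__main__":
    for sock in admin_sockets():
        try:
            print(sock, sorted(counter_fields(perf_schema(sock))))
        except (OSError, subprocess.CalledProcessError) as exc:
            print(sock, "query failed:", exc)
```

If the printed field list contains anything besides type, the schema has a different shape from the one a type-only parser would expect.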
Running diamond in debug mode shows the below
[2016-02-01 10:55:23,774] [Thread-1] Collecting data from: NetworkCollector
[2016-02-01 10:56:23,484] [Thread-1] Collecting data from: CPUCollector
[2016-02-01 10:56:23,487] [Thread-6] Collecting data from: MemoryCollector
[2016-02-01 10:56:23,489] [Thread-7] Collecting data from: SockstatCollector
[2016-02-01 10:56:23,768] [Thread-1] Collecting data from: CephCollector
[2016-02-01 10:56:23,768] [Thread-1] gathering service stats for /var/run/ceph/ceph-mon.ceph1.asok
[2016-02-01 10:56:24,094] [Thread-1] Traceback (most recent call last):
File "/usr/lib/pymodules/python2.7/diamond/collector.py", line 412, in _run
self.collect()
File "/usr/share/diamond/collectors/ceph/ceph.py", line 464, in collect
self._collect_service_stats(path)
File "/usr/share/diamond/collectors/ceph/ceph.py", line 450, in _collect_service_stats
self._publish_stats(counter_prefix, stats, schema, GlobalName)
File "/usr/share/diamond/collectors/ceph/ceph.py", line 305, in _publish_stats
assert path[-1] == 'type'
AssertionError
[2016-02-01 10:56:24,096] [Thread-8] Collecting data from: LoadAverageCollector
[2016-02-01 10:56:24,098] [Thread-1] Collecting data from: VMStatCollector
[2016-02-01 10:56:24,099] [Thread-1] Collecting data from: DiskUsageCollector
[2016-02-01 10:56:24,104] [Thread-9] Collecting data from: DiskSpaceCollector
Checking the md5 of the file returns the below:
root@ceph1:/var/run/ceph# md5sum /usr/share/diamond/collectors/ceph/ceph.py
aeb3915f8ac7fdea61495805d2c99f33 /usr/share/diamond/collectors/ceph/ceph.py
root@ceph1:/var/run/ceph#
I've found that replacing the ceph.py file with the one below stops the diamond error
Diamond version 3.4.67
https://raw.githubusercontent.com/BrightcoveOS/Diamond/master/src/collectors/ceph/ceph.py
root@ceph1:/usr/share/diamond/collectors/ceph# md5sum ceph.py
13ac74ce0df39a5def879cb5fc530015 ceph.py
[2016-02-01 11:14:33,116] [Thread-42] Collecting data from: MemoryCollector
[2016-02-01 11:14:33,117] [Thread-1] Collecting data from: CPUCollector
[2016-02-01 11:14:33,123] [Thread-43] Collecting data from: SockstatCollector
[2016-02-01 11:14:35,453] [Thread-1] Collecting data from: CephCollector
[2016-02-01 11:14:35,454] [Thread-1] checking /var/run/ceph/ceph-mon.ceph1.asok
[2016-02-01 11:14:35,552] [Thread-1] checking /var/run/ceph/ceph-osd.0.asok
[2016-02-01 11:14:35,685] [Thread-44] Collecting data from: LoadAverageCollector
[2016-02-01 11:14:35,686] [Thread-1] Collecting data from: VMStatCollector
[2016-02-01 11:14:35,687] [Thread-1] Collecting data from: DiskUsageCollector
[2016-02-01 11:14:35,692] [Thread-45] Collecting data from: DiskSpaceCollector
But after all that, it's still not working.
OK, thanks to the reply below on the mailing list:
John Spray Mon, 01 Feb 2016 04:23:24 -0800
The "assert path[-1] == 'type'" is the error you get when using the
calamari diamond branch with a >= infernalis version of Ceph (where
new fields were added to the perf schema output). No idea if anyone
has worked on updating Calamari+Diamond for latest ceph.
John
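To illustrate the failure mode described in that reply, here is a small stand-alone sketch. It is not the actual ceph.py code. It only mimics the idea of flattening the perf schema and asserting that every leaf key is 'type'. The extra field names (description, nick) and all function names are assumptions made for the example.

```python
def flatten(node, prefix=()):
    """Yield (path, value) for every leaf of a nested dict, keys in sorted order."""
    for key in sorted(node):
        path = prefix + (key,)
        if isinstance(node[key], dict):
            for item in flatten(node[key], path):
                yield item
        else:
            yield path, node[key]


def strict_metric_names(schema):
    """Assume every leaf is a 'type' field, like the failing assert."""
    names = []
    for path, _ in flatten(schema):
        assert path[-1] == 'type'
        names.append('.'.join(path[:-1]))
    return names


def tolerant_metric_names(schema):
    """Ignore any leaf that is not a 'type' field."""
    return ['.'.join(path[:-1])
            for path, _ in flatten(schema) if path[-1] == 'type']


# Older-style schema: each counter carries only a type.
OLD_SCHEMA = {"osd": {"op_r": {"type": 10}, "op_w": {"type": 10}}}

# Newer-style schema with extra per-counter fields (names assumed for illustration).
NEW_SCHEMA = {"osd": {"op_r": {"type": 10,
                               "description": "Client read operations",
                               "nick": "rd"}}}

print(strict_metric_names(OLD_SCHEMA))
print(tolerant_metric_names(NEW_SCHEMA))
try:
    strict_metric_names(NEW_SCHEMA)
except AssertionError:
    print("strict parser hit AssertionError on the newer schema")
```

The strict version works on the older shape and trips the assertion as soon as a counter has any field other than type, which matches the traceback above. Skipping the unknown fields is one possible way around it.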
I've downgraded to hammer, and now everything is working.
I've built the latest calamari server, diamond and the new calamari clients (now called romana).
Feel free to use them on your trusty deployments:
http://bladeservers.net.au/calamari-server_1.3.1.1-105-g79c8df2-1trusty_amd64.deb
http://bladeservers.net.au/romana_1.2.2-36-gc62bb5b_all.deb
http://bladeservers.net.au/diamond_3.4.725_all.deb
@drolfe Thanks a lot for your packages. To make IOPS / usage data appear in graphite / calamari when running Ceph Infernalis, a small change to the ceph.py collector script is required. See https://github.com/luinnar/Diamond/commit/a9fcc62097b82e1df5cc16fb78bcdebf0ab11d0d
I've been waiting to get this merged upstream: https://github.com/python-diamond/Diamond/pull/321. It will get you a newer diamond 4.X and fix infernalis.