collectd-ceph
CentOS 7 errors on collectd start using ceph_pool_plugin
After starting collectd on CentOS 7 (Ceph Giant, now upgraded to Hammer), I'm getting the following log errors with the ceph_pool_plugin.
-- Unit collectd.service has begun starting up.
Apr 15 15:04:18 ceph1.domain systemd[1]: Started Collectd statistics daemon.
-- Subject: Unit collectd.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit collectd.service has finished starting up.
--
-- The start-up result is done.
Apr 15 15:04:18 ceph1.domain collectd[22862]: Initialization complete, entering read-loop.
Apr 15 15:04:18 ceph1.domain python[22874]: detected unhandled Python exception in '/usr/bin/ceph'
Apr 15 15:04:18 ceph1.domain abrt-server[22881]: Package 'ceph-common' isn't signed with proper key
Apr 15 15:04:18 ceph1.domain abrt-server[22881]: 'post-create' on '/var/tmp/abrt/Python-2015-04-15-15:04:18-22874' exited with 1
Apr 15 15:04:18 ceph1.domain abrt-server[22881]: Deleting problem directory '/var/tmp/abrt/Python-2015-04-15-15:04:18-22874'
Apr 15 15:04:18 ceph1.domain collectd[22862]: Traceback (most recent call last):
Apr 15 15:04:18 ceph1.domain collectd[22862]: File "/usr/bin/ceph", line 896, in <module>
Apr 15 15:04:18 ceph1.domain collectd[22862]: retval = main()
Apr 15 15:04:18 ceph1.domain collectd[22862]: File "/usr/bin/ceph", line 647, in main
Apr 15 15:04:18 ceph1.domain collectd[22862]: conffile=conffile)
Apr 15 15:04:18 ceph1.domain collectd[22862]: File "/usr/lib/python2.7/site-packages/rados.py", line 212, in __init__
Apr 15 15:04:18 ceph1.domain collectd[22862]: library_path = find_library('rados')
Apr 15 15:04:18 ceph1.domain collectd[22862]: File "/usr/lib64/python2.7/ctypes/util.py", line 244, in find_library
Apr 15 15:04:18 ceph1.domain collectd[22862]: return _findSoname_ldconfig(name) or _get_soname(_findLib_gcc(name))
Apr 15 15:04:18 ceph1.domain collectd[22862]: File "/usr/lib64/python2.7/ctypes/util.py", line 237, in _findSoname_ldconfig
Apr 15 15:04:18 ceph1.domain collectd[22862]: f.close()
Apr 15 15:04:18 ceph1.domain collectd[22862]: IOError: [Errno 10] No child processes
Apr 15 15:04:18 ceph1.domain python[22884]: detected unhandled Python exception in '/usr/bin/ceph'
Apr 15 15:04:18 ceph1.domain abrt-server[22891]: Not saving repeating crash in '/usr/bin/ceph'
Apr 15 15:04:18 ceph1.domain collectd[22862]: Traceback (most recent call last):
Apr 15 15:04:18 ceph1.domain collectd[22862]: File "/usr/bin/ceph", line 896, in <module>
Apr 15 15:04:18 ceph1.domain collectd[22862]: retval = main()
Apr 15 15:04:18 ceph1.domain collectd[22862]: File "/usr/bin/ceph", line 647, in main
Apr 15 15:04:18 ceph1.domain collectd[22862]: conffile=conffile)
Apr 15 15:04:18 ceph1.domain collectd[22862]: File "/usr/lib/python2.7/site-packages/rados.py", line 212, in __init__
Apr 15 15:04:18 ceph1.domain collectd[22862]: library_path = find_library('rados')
Apr 15 15:04:18 ceph1.domain collectd[22862]: File "/usr/lib64/python2.7/ctypes/util.py", line 244, in find_library
Apr 15 15:04:18 ceph1.domain collectd[22862]: return _findSoname_ldconfig(name) or _get_soname(_findLib_gcc(name))
Apr 15 15:04:18 ceph1.domain collectd[22862]: File "/usr/lib64/python2.7/ctypes/util.py", line 237, in _findSoname_ldconfig
Apr 15 15:04:18 ceph1.domain collectd[22862]: f.close()
Apr 15 15:04:18 ceph1.domain collectd[22862]: IOError: [Errno 10] No child processes
Apr 15 15:04:18 ceph1.domain collectd[22862]: ceph: failed to get stats :: No JSON object could be decoded :: Traceback (most recent call last):
File "/usr/lib64/collectd/base.py", line 114, in read_callback
stats = self.get_stats()
File "/usr/lib64/collectd/ceph_pool_plugin.py", line 67, in get_stats
json_stats_data = json.loads(stats_output)
File "/usr/lib64/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "/usr/lib64/python2.7/json/decoder.py", line 365, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib64/python2.7/json/decoder.py", line 383, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
Apr 15 15:04:18 ceph1.domain collectd[22862]: Unhandled python exception in read callback: UnboundLocalError: local variable 'stats' referenced before assignment
Apr 15 15:04:18 ceph1.domain collectd[22862]: read-function of plugin `python.ceph_pool_plugin' failed. Will suspend it for 120.000 seconds.
collectd.conf:
<LoadPlugin python>
Globals true
</LoadPlugin>
<Plugin "python">
ModulePath "/usr/lib64/collectd"
Import "ceph_pool_plugin"
<Module "ceph_pool_plugin">
Verbose "True"
Cluster "ceph"
Interval "60"
TestPool "rbd"
</Module>
</Plugin>
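A side note on the last two log lines above: the UnboundLocalError looks like a secondary symptom rather than the real failure. Judging only from the traceback, read_callback in base.py seems to catch the exception from get_stats(), log it, and then still use `stats`. Below is a minimal sketch of that pattern and one possible guard. The function names and the dispatch argument are hypothetical, not the plugin's actual code.

```python
def read_callback_unguarded(get_stats, dispatch):
    """Mimics the suspected pattern: log the failure, then use stats anyway."""
    try:
        stats = get_stats()
    except Exception as exc:
        print("failed to get stats :: %s" % exc)
    # If get_stats() raised, 'stats' was never bound, so this line raises
    # UnboundLocalError and hides the original error.
    dispatch(stats)


def read_callback_guarded(get_stats, dispatch):
    """Same flow, but skips dispatch when no stats were collected."""
    try:
        stats = get_stats()
    except Exception as exc:
        print("failed to get stats :: %s" % exc)
        return False
    dispatch(stats)
    return True


def failing_get_stats():
    # Stand-in for a get_stats() that fails to decode the ceph output.
    raise ValueError("No JSON object could be decoded")
```

With the guard in place the log would show only the first, meaningful error, which makes the underlying ceph failure easier to spot.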
I also see this error on RHEL 7.1
[root@tapir2 python]# service collectd status
Redirecting to /bin/systemctl status collectd.service
collectd.service - Collectd statistics daemon
Loaded: loaded (/usr/lib/systemd/system/collectd.service; enabled)
Active: active (running) since Fri 2015-05-08 13:08:01 BST; 2min 42s ago
Docs: man:collectd(1)
man:collectd.conf(5)
Main PID: 18995 (collectd)
CGroup: /system.slice/collectd.service
└─18995 /usr/sbin/collectd -C /etc/collectd.conf -f
May 08 13:08:01 tapir2.eng.velocix.com systemd[1]: Started Collectd statistics daemon.
May 08 13:08:01 tapir2.eng.velocix.com collectd[18995]: Initialization complete, entering read-loop.
May 08 13:08:01 tapir2.eng.velocix.com collectd[18995]: Unhandled python exception in read callback: TypeError: Dataset mutex-JOS::ApplyManager::apply_lock not found
May 08 13:08:01 tapir2.eng.velocix.com collectd[18995]: read-function of plugin `python.ceph' failed. Will suspend it for 20.000 seconds.
May 08 13:08:21 tapir2.eng.velocix.com collectd[18995]: Unhandled python exception in read callback: TypeError: Dataset mutex-JOS::ApplyManager::apply_lock not found
May 08 13:08:21 tapir2.eng.velocix.com collectd[18995]: read-function of plugin `python.ceph' failed. Will suspend it for 40.000 seconds.
May 08 13:09:01 tapir2.eng.velocix.com collectd[18995]: Unhandled python exception in read callback: TypeError: Dataset mutex-JOS::ApplyManager::apply_lock not found
My collectd.conf:
<Module ceph>
AdminSocket "/var/run/ceph/ceph-*.asok"
</Module>
TypesDB "/usr/share/collectd/types.db" "/usr/lib64/collectd/python/ceph.types.db"
I have the same problem. Have you found a workaround?
Hi, I also have the same problem on RHEL 7.1 with the Ceph Hammer release. Does anyone have a fix or workaround for it?
I should be able to have a look next week.
I am facing exactly the same issue: [error] Unhandled python exception in read callback: UnboundLocalError: local variable 'stats' referenced before assignment
Collectd Logs
[2015-07-20 11:30:29] [info] ceph: collectd new data from service :: took 0 seconds
[2015-07-20 11:30:30] [error] ceph: failed to get stats :: Expecting object: line 2 column 124 (char 124) :: Traceback (most recent call last):
File "/etc/collectd/plugins/ceph/base.py", line 114, in read_callback
stats = self.get_stats()
File "/etc/collectd/plugins/ceph/ceph_pool_plugin.py", line 72, in get_stats
json_stats_data = json.loads(stats_output)
File "/usr/lib64/python2.6/json/__init__.py", line 307, in loads
return _default_decoder.decode(s)
File "/usr/lib64/python2.6/json/decoder.py", line 319, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib64/python2.6/json/decoder.py", line 336, in raw_decode
obj, end = self._scanner.iterscan(s, **kw).next()
File "/usr/lib64/python2.6/json/scanner.py", line 55, in iterscan
rval, next_pos = action(m, context)
File "/usr/lib64/python2.6/json/decoder.py", line 217, in JSONArray
value, end = iterscan(s, idx=end, context=context).next()
File "/usr/lib64/python2.6/json/scanner.py", line 55, in iterscan
rval, next_pos = ac
[2015-07-20 11:30:30] [error] Unhandled python exception in read callback: UnboundLocalError: local variable 'stats' referenced before assignment
[2015-07-20 11:30:30] [notice] read-function of plugin `python.ceph_pool_plugin' failed. Will suspend it for 240.000 seconds.
[2015-07-20 11:30:41] [info] ceph: collectd new data from service :: took 13 seconds
Did anyone manage to fix this?
@rochaporto Do you have time to check this? I'd appreciate your help.
I'm having the same issue here. The root cause seems to be this:
Traceback (most recent call last):
File "/usr/bin/ceph", line 896, in <module>
retval = main()
File "/usr/bin/ceph", line 647, in main
conffile=conffile)
File "/usr/lib/python2.7/site-packages/rados.py", line 212, in __init__
library_path = find_library('rados')
File "/usr/lib64/python2.7/ctypes/util.py", line 244, in find_library
return _findSoname_ldconfig(name) or _get_soname(_findLib_gcc(name))
File "/usr/lib64/python2.7/ctypes/util.py", line 237, in _findSoname_ldconfig
f.close()
IOError: [Errno 10] No child processes
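If that is the origin, the later "No JSON object could be decoded" would follow naturally: when /usr/bin/ceph dies with the traceback above, it presumably prints nothing on stdout, and decoding an empty string fails. A small sketch, assuming the plugin passes the command's stdout straight to json.loads (decode_stats is a hypothetical helper, not part of the plugin):

```python
import json


def decode_stats(stats_output):
    """Return the decoded JSON, or None when the command produced no output."""
    if not stats_output or not stats_output.strip():
        # Empty output means the ceph command itself failed; json.loads("")
        # would raise ValueError here.
        return None
    return json.loads(stats_output)
```

So the JSON error is only the messenger. The thing to fix is whatever makes the ceph command crash when collectd launches it.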
Any news? I'm having the same issue on Ubuntu 14.04 with the Ceph Hammer release:
Aug 21 00:07:54 collectd collectd[17115]: ceph: failed to get stats :: No JSON object could be decoded :: Traceback (most recent call last):#012 File "/usr/lib/collectd/plugins/ceph/base.py", line 108, in read_callback#012 stats = self.get_stats()#012 File "/usr/lib/collectd/plugins/ceph/ceph_pool_plugin.py", line 67, in get_stats#012 json_stats_data = json.loads(stats_output)#012 File "/usr/lib/python2.7/json/__init__.py", line 338, in loads#012 return _default_decoder.decode(s)#012 File "/usr/lib/python2.7/json/decoder.py", line 366, in decode#012 obj, end = self.raw_decode(s, idx=_w(s, 0).end())#012 File "/usr/lib/python2.7/json/decoder.py", line 384, in raw_decode#012 raise ValueError("No JSON object could be decoded")#012ValueError: No JSON object could be decoded
Aug 21 00:07:54 collectd collectd[17115]: Unhandled python exception in read callback: UnboundLocalError: local variable 'stats' referenced before assignment
Aug 21 00:07:54 collectd collectd[17115]: read-function of plugin `python.ceph_pool_plugin' failed. Will suspend it for 20.000 seconds.
Has anyone succeeded in getting it to work? Or is there another Ceph collectd plugin?
Hello!
This is described in a note in the man page:
- https://collectd.org/documentation/manpages/collectd-python.5.shtml
You can put getsigchld.py in the scripts folder and add the Import line to the configuration:
<Plugin "python">
ModulePath [..]
Import "getsigchld"
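For context, as I understand the man page note, collectd ignores SIGCHLD, so a Python plugin that spawns a process and then waits for it gets "No child processes" (errno ECHILD, the Errno 10 in the traceback). getsigchld.py restores the default SIGCHLD handling. The sketch below is not that script; it only reproduces the effect on Linux with a hypothetical helper, wait_for_child:

```python
import errno
import os
import signal


def wait_for_child(sigchld_handler):
    """Fork a child that exits at once, wait for it, and return the errno
    of a failed wait (or None if the wait succeeded)."""
    previous = signal.signal(signal.SIGCHLD, sigchld_handler)
    try:
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exit immediately without cleanup
        try:
            os.waitpid(pid, 0)
            return None
        except OSError as exc:
            return exc.errno
    finally:
        # Put the caller's SIGCHLD handling back.
        signal.signal(signal.SIGCHLD, previous)


# With SIGCHLD ignored the kernel reaps the child itself, so the wait fails.
ignored = wait_for_child(signal.SIG_IGN)
# With the default handling the wait succeeds.
default = wait_for_child(signal.SIG_DFL)
print(ignored == errno.ECHILD, default is None)
```

If I remember correctly, the script itself warns that importing it breaks the exec plugin, so check that before enabling it.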
It works better, yashumitsu!
But now there is a new error:
Nov 30 20:33:05 cephrr1n4 collectd[19331]: ceph: failed to get stats :: list index out of range :: Traceback (most recent call last):
File "/opt/collectd-ceph/git/collectd-ceph/plugins/base.py", line 114, in read_callback
stats = self.get_stats()
File "/opt/collectd-ceph/git/collectd-ceph/plugins/ceph_latency_plugin.py", line 67, in get_stats
data[ceph_cluster]['cluster']['stddev_latency'] = results[1]
IndexError: list index out of range
Nov 30 20:33:05 cephrr1n4 collectd[19331]: Unhandled python exception in read callback: UnboundLocalError: local variable 'stats' referenced before assignment
Nov 30 20:33:05 cephrr1n4 collectd[19331]: read-function of plugin `python.ceph_latency_plugin' failed. Will suspend it for 120.000 seconds.
No thanks necessary!
The easiest way to get it working is to change the default pool name (data) to a pool that exists:
- https://github.com/rochaporto/collectd-ceph/blob/master/plugins/ceph_latency_plugin.py#L54
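Why the pool name matters, as far as I can tell from the traceback: the plugin indexes into `results`, and if the benchmark against a nonexistent pool prints nothing, there is no second element. A sketch under that assumption (the one-value-per-line output format and the helper name are guesses, not taken from the plugin):

```python
def stddev_latency(bench_output):
    """Return the second non-empty line as a float, or None if it is missing."""
    results = [line for line in bench_output.split("\n") if line.strip()]
    if len(results) < 2:
        # Nothing usable came back, e.g. because the test pool does not exist.
        return None
    return float(results[1])
```

Note that splitting an empty string on newlines gives a one-element list, so an unguarded `results[1]` fails exactly as in the log.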
It works! Thanks
With strace we can see that getsigchld.py is not being found, so try copying it into site-packages:
cp collectd-5.5.0/contrib/python/getsigchld.py /usr/lib64/python2.7/site-packages/
Thanks for posting this fix.