collectd-ceph

CentOS 7 errors on collectd start using ceph_pool_plugin

Open steve--d opened this issue 9 years ago • 14 comments

After starting collectd on CentOS 7 (Ceph Giant, now upgraded to Hammer), I get the following errors in the log when using the ceph_pool_plugin.

-- Unit collectd.service has begun starting up.
Apr 15 15:04:18 ceph1.domain systemd[1]: Started Collectd statistics daemon.
-- Subject: Unit collectd.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit collectd.service has finished starting up.
-- 
-- The start-up result is done.
Apr 15 15:04:18 ceph1.domain collectd[22862]: Initialization complete, entering read-loop.
Apr 15 15:04:18 ceph1.domain python[22874]: detected unhandled Python exception in '/usr/bin/ceph'
Apr 15 15:04:18 ceph1.domain abrt-server[22881]: Package 'ceph-common' isn't signed with proper key
Apr 15 15:04:18 ceph1.domain abrt-server[22881]: 'post-create' on '/var/tmp/abrt/Python-2015-04-15-15:04:18-22874' exited with 1
Apr 15 15:04:18 ceph1.domain abrt-server[22881]: Deleting problem directory '/var/tmp/abrt/Python-2015-04-15-15:04:18-22874'
Apr 15 15:04:18 ceph1.domain collectd[22862]: Traceback (most recent call last):
Apr 15 15:04:18 ceph1.domain collectd[22862]: File "/usr/bin/ceph", line 896, in <module>
Apr 15 15:04:18 ceph1.domain collectd[22862]: retval = main()
Apr 15 15:04:18 ceph1.domain collectd[22862]: File "/usr/bin/ceph", line 647, in main
Apr 15 15:04:18 ceph1.domain collectd[22862]: conffile=conffile)
Apr 15 15:04:18 ceph1.domain collectd[22862]: File "/usr/lib/python2.7/site-packages/rados.py", line 212, in __init__
Apr 15 15:04:18 ceph1.domain collectd[22862]: library_path  = find_library('rados')
Apr 15 15:04:18 ceph1.domain collectd[22862]: File "/usr/lib64/python2.7/ctypes/util.py", line 244, in find_library
Apr 15 15:04:18 ceph1.domain collectd[22862]: return _findSoname_ldconfig(name) or _get_soname(_findLib_gcc(name))
Apr 15 15:04:18 ceph1.domain collectd[22862]: File "/usr/lib64/python2.7/ctypes/util.py", line 237, in _findSoname_ldconfig
Apr 15 15:04:18 ceph1.domain collectd[22862]: f.close()
Apr 15 15:04:18 ceph1.domain collectd[22862]: IOError: [Errno 10] No child processes
Apr 15 15:04:18 ceph1.domain python[22884]: detected unhandled Python exception in '/usr/bin/ceph'
Apr 15 15:04:18 ceph1.domain abrt-server[22891]: Not saving repeating crash in '/usr/bin/ceph'
Apr 15 15:04:18 ceph1.domain collectd[22862]: Traceback (most recent call last):
Apr 15 15:04:18 ceph1.domain collectd[22862]: File "/usr/bin/ceph", line 896, in <module>
Apr 15 15:04:18 ceph1.domain collectd[22862]: retval = main()
Apr 15 15:04:18 ceph1.domain collectd[22862]: File "/usr/bin/ceph", line 647, in main
Apr 15 15:04:18 ceph1.domain collectd[22862]: conffile=conffile)
Apr 15 15:04:18 ceph1.domain collectd[22862]: File "/usr/lib/python2.7/site-packages/rados.py", line 212, in __init__
Apr 15 15:04:18 ceph1.domain collectd[22862]: library_path  = find_library('rados')
Apr 15 15:04:18 ceph1.domain collectd[22862]: File "/usr/lib64/python2.7/ctypes/util.py", line 244, in find_library
Apr 15 15:04:18 ceph1.domain collectd[22862]: return _findSoname_ldconfig(name) or _get_soname(_findLib_gcc(name))
Apr 15 15:04:18 ceph1.domain collectd[22862]: File "/usr/lib64/python2.7/ctypes/util.py", line 237, in _findSoname_ldconfig
Apr 15 15:04:18 ceph1.domain collectd[22862]: f.close()
Apr 15 15:04:18 ceph1.domain collectd[22862]: IOError: [Errno 10] No child processes
Apr 15 15:04:18 ceph1.domain collectd[22862]: ceph: failed to get stats :: No JSON object could be decoded :: Traceback (most recent call last):
                                                      File "/usr/lib64/collectd/base.py", line 114, in read_callback
                                                        stats = self.get_stats()
                                                      File "/usr/lib64/collectd/ceph_pool_plugin.py", line 67, in get_stats
                                                        json_stats_data = json.loads(stats_output)
                                                      File "/usr/lib64/python2.7/json/__init__.py", line 338, in loads
                                                        return _default_decoder.decode(s)
                                                      File "/usr/lib64/python2.7/json/decoder.py", line 365, in decode
                                                        obj, end = self.raw_decode(s, idx=_w(s, 0).end())
                                                      File "/usr/lib64/python2.7/json/decoder.py", line 383, in raw_decode
                                                        raise ValueError("No JSON object could be decoded")
                                                    ValueError: No JSON object could be decoded
Apr 15 15:04:18 ceph1.domain collectd[22862]: Unhandled python exception in read callback: UnboundLocalError: local variable 'stats' referenced before assignment
Apr 15 15:04:18 ceph1.domain collectd[22862]: read-function of plugin `python.ceph_pool_plugin' failed. Will suspend it for 120.000 seconds.

collectd.conf:

<LoadPlugin python>
  Globals true
</LoadPlugin>

<Plugin "python">
    ModulePath "/usr/lib64/collectd"

    Import "ceph_pool_plugin"

    <Module "ceph_pool_plugin">
        Verbose "True"
        Cluster "ceph"
        Interval "60"
        TestPool "rbd"
    </Module>
</Plugin>

steve--d Apr 15 '15 22:04

I also see this error on RHEL 7.1:

[root@tapir2 python]# service collectd status
Redirecting to /bin/systemctl status collectd.service
collectd.service - Collectd statistics daemon
   Loaded: loaded (/usr/lib/systemd/system/collectd.service; enabled)
   Active: active (running) since Fri 2015-05-08 13:08:01 BST; 2min 42s ago
     Docs: man:collectd(1)
           man:collectd.conf(5)
 Main PID: 18995 (collectd)
   CGroup: /system.slice/collectd.service
           └─18995 /usr/sbin/collectd -C /etc/collectd.conf -f

May 08 13:08:01 tapir2.eng.velocix.com systemd[1]: Started Collectd statistics daemon.
May 08 13:08:01 tapir2.eng.velocix.com collectd[18995]: Initialization complete, entering read-loop.
May 08 13:08:01 tapir2.eng.velocix.com collectd[18995]: Unhandled python exception in read callback: TypeError: Dataset mutex-JOS::ApplyManager::apply_lock not found
May 08 13:08:01 tapir2.eng.velocix.com collectd[18995]: read-function of plugin `python.ceph' failed. Will suspend it for 20.000 seconds.
May 08 13:08:21 tapir2.eng.velocix.com collectd[18995]: Unhandled python exception in read callback: TypeError: Dataset mutex-JOS::ApplyManager::apply_lock not found
May 08 13:08:21 tapir2.eng.velocix.com collectd[18995]: read-function of plugin `python.ceph' failed. Will suspend it for 40.000 seconds.
May 08 13:09:01 tapir2.eng.velocix.com collectd[18995]: Unhandled python exception in read callback: TypeError: Dataset mutex-JOS::ApplyManager::apply_lock not found

My collectd.conf:

Globals true
ModulePath "/usr/lib64/collectd/python"
Import "ceph"
<Module ceph>
    AdminSocket "/var/run/ceph/ceph-*.asok"
</Module>

TypesDB "/usr/share/collectd/types.db" "/usr/lib64/collectd/python/ceph.types.db"

brynmathias May 08 '15 12:05

I have the same problem. Have you found a workaround?

solune Jul 03 '15 09:07

Hi, I also have the same problem on RHEL 7.1 with the Ceph Hammer release. Does anyone have a fix or workaround?

ozhanka Jul 07 '15 08:07

I should be able to have a look next week.

rochaporto Jul 09 '15 08:07

I am facing exactly the same issue:

[error] Unhandled python exception in read callback: UnboundLocalError: local variable 'stats' referenced before assignment

Collectd Logs

[2015-07-20 11:30:29] [info] ceph: collectd new data from service :: took 0 seconds
[2015-07-20 11:30:30] [error] ceph: failed to get stats :: Expecting object: line 2 column 124 (char 124) :: Traceback (most recent call last):
  File "/etc/collectd/plugins/ceph/base.py", line 114, in read_callback
    stats = self.get_stats()
  File "/etc/collectd/plugins/ceph/ceph_pool_plugin.py", line 72, in get_stats
    json_stats_data = json.loads(stats_output)
  File "/usr/lib64/python2.6/json/__init__.py", line 307, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python2.6/json/decoder.py", line 319, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python2.6/json/decoder.py", line 336, in raw_decode
    obj, end = self._scanner.iterscan(s, **kw).next()
  File "/usr/lib64/python2.6/json/scanner.py", line 55, in iterscan
    rval, next_pos = action(m, context)
  File "/usr/lib64/python2.6/json/decoder.py", line 217, in JSONArray
    value, end = iterscan(s, idx=end, context=context).next()
  File "/usr/lib64/python2.6/json/scanner.py", line 55, in iterscan
    rval, next_pos = ac
[2015-07-20 11:30:30] [error] Unhandled python exception in read callback: UnboundLocalError: local variable 'stats' referenced before assignment
[2015-07-20 11:30:30] [notice] read-function of plugin `python.ceph_pool_plugin' failed. Will suspend it for 240.000 seconds.
[2015-07-20 11:30:41] [info] ceph: collectd new data from service :: took 13 seconds

Did anyone manage to fix this?

@rochaporto Do you have time to check this? I'd appreciate your help.

ksingh7 Jul 20 '15 08:07

I'm having the same issue here. The origin seems to be this traceback from the ceph CLI:

Traceback (most recent call last):
  File "/usr/bin/ceph", line 896, in <module>
    retval = main()
  File "/usr/bin/ceph", line 647, in main
    conffile=conffile)
  File "/usr/lib/python2.7/site-packages/rados.py", line 212, in __init__
    library_path  = find_library('rados')
  File "/usr/lib64/python2.7/ctypes/util.py", line 244, in find_library
    return _findSoname_ldconfig(name) or _get_soname(_findLib_gcc(name))
  File "/usr/lib64/python2.7/ctypes/util.py", line 237, in _findSoname_ldconfig
    f.close()
IOError: [Errno 10] No child processes
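
For what it's worth, here is a minimal sketch of how that one CLI failure could produce both of the later errors ("No JSON object could be decoded" and the UnboundLocalError). It assumes that read_callback logs the exception from get_stats() and then carries on using `stats`. I have not checked the exact code in base.py, and the function names below are hypothetical stand-ins, not the plugin's real code.

```python
import json


def get_stats(cli_output):
    """Stand-in for the plugin's get_stats(): parse the JSON printed by the ceph CLI."""
    return json.loads(cli_output)


def read_callback_buggy(cli_output):
    """Assumed shape of the failing callback: log the error, then use `stats` anyway."""
    try:
        stats = get_stats(cli_output)
    except Exception as exc:
        print("ceph: failed to get stats :: %s" % exc)
    # If get_stats() raised, `stats` was never assigned, so this line raises
    # UnboundLocalError and hides the real cause.
    return stats


def read_callback_safe(cli_output):
    """Same callback, but it stops early when there is nothing to dispatch."""
    try:
        stats = get_stats(cli_output)
    except ValueError as exc:  # json raises ValueError (or a subclass) on bad input
        print("ceph: failed to get stats :: %s" % exc)
        return None
    return stats
```

When the ceph command crashes, it prints nothing useful on stdout, so the JSON parser gets an empty string. The UnboundLocalError is then only a follow-up symptom; the thing to fix is the crash in /usr/bin/ceph.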

gcmalloc Aug 10 '15 09:08

Any news? I'm having the same issue on Ubuntu 14.04 with the Ceph Hammer release:

Aug 21 00:07:54 collectd collectd[17115]: ceph: failed to get stats :: No JSON object could be decoded :: Traceback (most recent call last):
  File "/usr/lib/collectd/plugins/ceph/base.py", line 108, in read_callback
    stats = self.get_stats()
  File "/usr/lib/collectd/plugins/ceph/ceph_pool_plugin.py", line 67, in get_stats
    json_stats_data = json.loads(stats_output)
  File "/usr/lib/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/json/decoder.py", line 384, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
Aug 21 00:07:54 collectd collectd[17115]: Unhandled python exception in read callback: UnboundLocalError: local variable 'stats' referenced before assignment
Aug 21 00:07:54 collectd collectd[17115]: read-function of plugin `python.ceph_pool_plugin' failed. Will suspend it for 20.000 seconds.

roadracer Aug 20 '15 21:08

Has any of you managed to make it work? Or is there another Ceph plugin for collectd?

solune Oct 13 '15 08:10

Hello!

This is described in a note in the collectd-python man page:

  • https://collectd.org/documentation/manpages/collectd-python.5.shtml

You can put getsigchld.py in your scripts folder and add an Import line for it to the configuration:

<Plugin "python"> 
  ModulePath [..]
  Import "getsigchld"
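
In case the contrib script is not at hand: as far as I understand the man page note, collectd ignores SIGCHLD, so child processes started from a Python plugin are reaped automatically and waiting on them fails with "No child processes". The fix is to restore the default handler from an init callback. Below is a minimal sketch of such a module, not a copy of the shipped script. The helper `wait_for_child` is hypothetical and only there to show the effect outside collectd (it needs a POSIX system); the `collectd` module can be imported only inside the daemon, hence the guarded import.

```python
import errno
import os
import signal

try:
    import collectd  # only available when loaded by collectd's python plugin
except ImportError:
    collectd = None


def restore_sigchld():
    """Init callback: put SIGCHLD back to its default disposition."""
    signal.signal(signal.SIGCHLD, signal.SIG_DFL)


def wait_for_child(ignore_sigchld):
    """Hypothetical demo helper: fork a child and report whether it can be waited on."""
    disposition = signal.SIG_IGN if ignore_sigchld else signal.SIG_DFL
    previous = signal.signal(signal.SIGCHLD, disposition)
    try:
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exit immediately
        try:
            os.waitpid(pid, 0)
            return "reaped"
        except OSError as exc:
            if exc.errno != errno.ECHILD:
                raise
            return "no child processes"  # same errno as in the traceback
    finally:
        # Put back whatever was installed before (None means "not set from Python").
        signal.signal(signal.SIGCHLD, previous if previous is not None else signal.SIG_DFL)


if collectd is not None:
    collectd.register_init(restore_sigchld)
```

Run outside collectd, `wait_for_child(True)` should hit the ECHILD branch and `wait_for_child(False)` should reap the child normally, which is the difference the Import line makes for the ceph plugins.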

yashumitsu Nov 30 '15 08:11

It works better, yashumitsu!

But now there is a new error:

Nov 30 20:33:05 cephrr1n4 collectd[19331]: ceph: failed to get stats :: list index out of range :: Traceback (most recent call last):
  File "/opt/collectd-ceph/git/collectd-ceph/plugins/base.py", line 114, in read_callback
    stats = self.get_stats()
  File "/opt/collectd-ceph/git/collectd-ceph/plugins/ceph_latency_plugin.py", line 67, in get_stats
    data[ceph_cluster]['cluster']['stddev_latency'] = results[1]
IndexError: list index out of range
Nov 30 20:33:05 cephrr1n4 collectd[19331]: Unhandled python exception in read callback: UnboundLocalError: local variable 'stats' referenced before assignment
Nov 30 20:33:05 cephrr1n4 collectd[19331]: read-function of plugin `python.ceph_latency_plugin' failed. Will suspend it for 120.000 seconds.

solune Nov 30 '15 19:11

No thanks necessary!

The easiest way to get it working is to change the default pool name (data) to another pool that exists:

  • https://github.com/rochaporto/collectd-ceph/blob/master/plugins/ceph_latency_plugin.py#L54
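
To illustrate why a missing pool ends in an IndexError rather than a clearer message: the traceback shows the plugin indexing into `results`, and if the benchmark against the pool prints no latency values, that list is too short. Here is a defensive sketch, assuming `results` is simply the benchmark output split into values. The function name, the `avg_latency` key, and the layout other than `stddev_latency` at index 1 are my assumptions, not the plugin's code.

```python
def parse_latency(output):
    """Hypothetical parser: expects whitespace-separated latency values, stddev second."""
    results = output.split()
    if len(results) < 2:
        # Nothing (or too little) came back, e.g. because the pool does not exist.
        raise ValueError("expected at least 2 latency values, got %d" % len(results))
    return {"avg_latency": results[0], "stddev_latency": results[1]}
```

With a check like this the log would at least say that the benchmark returned nothing, which points more directly at the pool name.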

yashumitsu Dec 01 '15 14:12

It works! Thanks

solune Dec 01 '15 14:12

With strace we can see that getsigchld.py is not being found, so try copying it into place:

cp collectd-5.5.0/contrib/python/getsigchld.py /usr/lib64/python2.7/site-packages/

mourgaya Dec 07 '15 11:12

Thanks for posting this fix.

benh57 Dec 23 '15 06:12