got exception with query [SELECT * FROM "memory.buffered.memory" WHERE fqdn = 'mds' ORDER BY time DESC LIMIT 1;]

Open GodGirlwsy opened this issue 5 years ago • 4 comments

[2020/10/14-14:58:38] [INFO] [filelock.py:247] Lock 140164392906576 acquired on /etc/esmon_install.conf.lock
[2020/10/14-14:58:39] [WARNING] [ssh_host.py:232] lsb_release is needed on host [mds] for accurate distro identification
[2020/10/14-14:58:40] [INFO] [esmon_install_nodeps.py:1514] can't deterimine Lustre version according to RPM names on host [mds], possible versions are [es2 es3 es4 2.7 2.12], using default [es3]
[2020/10/14-14:58:40] [INFO] [esmon_install_nodeps.py:2470] ESMON server won't be reinstalled according to the config
[2020/10/14-14:58:40] [INFO] [esmon_install_nodeps.py:2484] support for metrics of [memory, CPU, df(/), load, sensors, uptime, users, Lustre MDS] will be enabled on ESMON client [mds] according to the config
[2020/10/14-14:58:51] [INFO] [connectionpool.py:203] Starting new HTTP connection (1): cloudos112
[2020/10/14-14:58:51] [ERROR] [esmon_influxdb.py:60] got exception with query [SELECT * FROM "memory.buffered.memory" WHERE fqdn = 'mds' ORDER BY time DESC LIMIT 1;]:
Traceback (most recent call last):
  File "pyesmon/esmon_influxdb.py", line 57, in ic_query
  File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 464, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 576, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python2.7/site-packages/requests/adapters.py", line 415, in send
    raise ConnectionError(err, request=request)
ConnectionError: ('Connection aborted.', error(111, 'Connection refused'))

GodGirlwsy avatar Oct 14 '20 06:10 GodGirlwsy

It seems that http://${INFLUX_SERVER}:8086 cannot be reached. I'd suggest checking the connection manually; firewalld might be blocking the requests.
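
One way to check the connection manually is a plain TCP probe against the InfluxDB port. The following is a minimal sketch, not part of ESMON: the function name `can_connect` is made up for illustration, and it assumes InfluxDB listens on port 8086 as in the URL above. It only tests whether the port accepts connections, not whether InfluxDB answers queries.

```python
import socket


def can_connect(host, port=8086, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; 8086 is the port from the URL above.
    """
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts, and name-resolution failures.
        return False
```

Calling `can_connect("cloudos112")` from the host that runs esmon_install (cloudos112 is the server name in the install log) would return False in the "Connection refused" situation shown in the traceback. A False result does not tell a blocked port apart from a service that is not running.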

LiXi-storage avatar Oct 14 '20 11:10 LiXi-storage

Thank you for your reply. The problem above has been solved. The cause was that server:reinstall in esmon_config was set to false; after changing it to true, the installation succeeded.

After the installation succeeded, I logged in to the Grafana interface, looked at the Lustre statistics, and found that data collection had failed for some metrics. This is what I see on the MDS node:

[root@mds1 ~]# systemctl status collectd.service
● collectd.service - Collectd statistics daemon
   Loaded: loaded (/usr/lib/systemd/system/collectd.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2020-10-14 23:50:57 EDT; 19s ago
     Docs: man:collectd(1)
           man:collectd.conf(5)
 Main PID: 6809 (collectd)
   CGroup: /system.slice/collectd.service
           └─6809 /usr/sbin/collectd

Oct 14 23:50:57 mds1 systemd[1]: Starting Collectd statistics daemon...
Oct 14 23:50:57 mds1 collectd[6809]: plugin_load: plugin "aggregation" successfully loaded.
Oct 14 23:50:57 mds1 collectd[6809]: plugin_load: plugin "match_regex" successfully loaded.
Oct 14 23:50:57 mds1 collectd[6809]: plugin_load: plugin "filedata" successfully loaded.
Oct 14 23:50:57 mds1 collectd[6809]: plugin_load: plugin "syslog" successfully loaded.
Oct 14 23:50:57 mds1 systemd[1]: Started Collectd statistics daemon.
Oct 14 23:50:57 mds1 collectd[6809]: failed to stat /proc/fs/lustre/lod/h3lustre-MDT0000-mdtlov/filestotal
Oct 14 23:50:57 mds1 collectd[6809]: unable to read file /proc/fs/lustre/lod/h3lustre-MDT0000-mdtlov/filestotal
Oct 14 23:50:57 mds1 collectd[6809]: failed to stat /proc/fs/lustre/lod/h3lustre-MDT0000-mdtlov/filesfree
Oct 14 23:50:57 mds1 collectd[6809]: unable to read file /proc/fs/lustre/lod/h3lustre-MDT0000-mdtlov/filesfree

I don't know which part of the configuration is wrong. Please advise. I look forward to your reply, thank you.
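
To narrow this down, it may help to confirm on the MDS whether the files collectd complains about exist at all. The sketch below is only an illustration: `missing_files` is a hypothetical helper, and the two paths are copied from the collectd messages above. Run it as root on the MDS node.

```python
import os


def missing_files(paths):
    """Return the paths that do not exist or are not readable by this user."""
    return [p for p in paths if not os.access(p, os.R_OK)]


# Paths copied verbatim from the collectd log messages.
LOD_STAT_FILES = [
    "/proc/fs/lustre/lod/h3lustre-MDT0000-mdtlov/filestotal",
    "/proc/fs/lustre/lod/h3lustre-MDT0000-mdtlov/filesfree",
]

for path in missing_files(LOD_STAT_FILES):
    print("not readable:", path)
```

If both paths are reported, collectd is looking for entries that this Lustre installation does not expose at those locations, which would point at the collectd definitions rather than at collectd itself. That reading is an assumption, not something the log confirms.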

GodGirlwsy avatar Oct 15 '20 03:10 GodGirlwsy

I noticed the following errors:

[2020/10/14-14:58:40] [INFO] [esmon_install_nodeps.py:1514] can't deterimine Lustre version according to RPM names on host [mds], possible versions are [es2 es3 es4 2.7 2.12], using default [es3]

Please check that your Lustre version is compatible with es3 (2.10). Otherwise, the data won't be collected properly. If 2.10 is not the closest match to your Lustre version, please change the Lustre version in /etc/esmon_install.conf.

LiXi-storage avatar Oct 15 '20 04:10 LiXi-storage

The Lustre version I installed is 2.12, and ESMON is esmon-1.3.ge627284.x86_64.iso.

Attempt 1: after changing the lustre version in /etc/esmon_install.conf to es4, the data collection failure messages are still there:

[root@mds1 ~]# systemctl status collectd.service
● collectd.service - Collectd statistics daemon
   Loaded: loaded (/usr/lib/systemd/system/collectd.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2020-10-14 23:50:57 EDT; 19s ago
     Docs: man:collectd(1)
           man:collectd.conf(5)
 Main PID: 6809 (collectd)
   CGroup: /system.slice/collectd.service
           └─6809 /usr/sbin/collectd

Oct 14 23:50:57 mds1 systemd[1]: Starting Collectd statistics daemon...
Oct 14 23:50:57 mds1 collectd[6809]: plugin_load: plugin "aggregation" successfully loaded.
Oct 14 23:50:57 mds1 collectd[6809]: plugin_load: plugin "match_regex" successfully loaded.
Oct 14 23:50:57 mds1 collectd[6809]: plugin_load: plugin "filedata" successfully loaded.
Oct 14 23:50:57 mds1 collectd[6809]: plugin_load: plugin "syslog" successfully loaded.
Oct 14 23:50:57 mds1 systemd[1]: Started Collectd statistics daemon.
Oct 14 23:50:57 mds1 collectd[6809]: failed to stat /proc/fs/lustre/lod/h3lustre-MDT0000-mdtlov/filestotal
Oct 14 23:50:57 mds1 collectd[6809]: unable to read file /proc/fs/lustre/lod/h3lustre-MDT0000-mdtlov/filestotal
Oct 14 23:50:57 mds1 collectd[6809]: failed to stat /proc/fs/lustre/lod/h3lustre-MDT0000-mdtlov/filesfree
Oct 14 23:50:57 mds1 collectd[6809]: unable to read file /proc/fs/lustre/lod/h3lustre-MDT0000-mdtlov/filesfree

Attempt 2: after changing the lustre version in /etc/esmon_install.conf to 2.12, the installation fails:

[root@cloudos111 etc]# esmon_install
Started installing Exascaler monitoring system using config [/etc/esmon_install.conf], please check [/var/log/esmon_install/2020-10-15-13_59_26] for more log
[2020/10/15-13:59:26] [INFO] [filelock.py:247] Lock 139624011587344 acquired on /etc/esmon_install.conf.lock
[2020/10/15-13:59:26] [ERROR] [esmon_install_nodeps.py:2252] unsupported Lustre version [2.12], please correct file [/etc/esmon_install.conf]
[2020/10/15-13:59:26] [ERROR] [esmon_install_nodeps.py:2428] failed to parse config [/etc/esmon_install.conf]
[2020/10/15-13:59:26] [INFO] [filelock.py:310] Lock 139624011587344 released on /etc/esmon_install.conf.lock
[2020/10/15-13:59:26] [ERROR] [esmon_install_nodeps.py:2674] installation failed, please check [/var/log/esmon_install/2020-10-15-13_59_26] for more log

GodGirlwsy avatar Oct 15 '20 06:10 GodGirlwsy