
LBNL Node Health Check

Results 83 nhc issues

This new feature in 1.4.3: ``` check_nvsmi_healthmon(): New check from CSC for GPU health monitoring via nvidia-smi ``` doesn't seem to be present in the release RPM file lbnl_nv.nhc. How...

bug
help wanted

I apologize for not having much experience with Pull Requests. Here are my new functions: check_all_fs_used, check_all_fs_inodes, check_all_fs_ifree, check_all_fs_iused which are used to check all filesystems of a particular "fstype"....

enhancement

I added a helper script to mark nodes for reboot. It's based on `node-mark-offline`, but executes `scontrol reboot ASAP ` instead. This helper script can be used by setting `OFFLINE_NODE`...

bug
enhancement

We're running the NHC 1.4.3 RC1 RPM lbnl-nhc-1.4.3-1.el8.noarch on ~100 AlmaLinux 8.5 systems. These servers have Cornelis (Intel) Omni-Path 100 Gbit adapters, and I check them with this rule in...

Hello! It would be nice if you could make a new release with the fixes from the last few years. /Sven

Consider node names with the domain "pi.sjtu.edu.cn", such as "node838.example.edu.cn". The current version of nhc always uses the long hostname: ``` function nhcmain_init_env() { ... if [[ -r /proc/sys/kernel/hostname ]]; then read HOSTNAME...
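For readers unfamiliar with the long versus short hostname distinction raised in this issue, here is a minimal bash sketch of deriving a short name from a fully qualified one. The function name `short_hostname` is hypothetical and is not part of NHC; this only illustrates the general idea, not how NHC handles or should handle it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not NHC code): strip the domain part from a hostname.
short_hostname() {
    local name="$1"
    # ${name%%.*} removes the longest suffix that starts at the first dot,
    # leaving only the leading label. A name without dots is unchanged.
    printf '%s\n' "${name%%.*}"
}

# Example using the hostname quoted in the issue.
short_hostname "node838.example.edu.cn"
```

A name that is already short passes through unchanged, so the helper could be applied unconditionally.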

Hi, there seems to be an error in the script nhc/helpers/node-mark-offline. There is a missing ";;" between lines 69 and 70 to properly pass from one "case" statement to the...
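As a generic illustration of why the ";;" terminator matters in a bash `case` statement, here is a small sketch. The function `classify_state` and the state names it matches are hypothetical and are not taken from node-mark-offline.

```shell
#!/usr/bin/env bash
# Hypothetical example (not the node-mark-offline code): each branch of a
# case statement must end with ";;" before the next pattern begins.
classify_state() {
    case "$1" in
        down*|drain*)
            echo "offline"
            ;;  # ends this branch; omitting it makes the next pattern a syntax error
        idle|alloc*)
            echo "online"
            ;;
        *)
            echo "unknown"
            ;;
    esac
}
```

If the first ";;" were removed, bash would try to read the next pattern line as part of the previous branch's commands and would typically fail to parse the script.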

If nhc was configured with options like `--prefix=/opt/nhc`, then the default CONFDIR, INCDIR and so on would still point to /etc/nhc. This is acceptable in some cases, but often `/etc` is...

Please add this check. Currently I check this via dmidecode, but I have to specify the id, and it could change on different nodes. It would be good to change this...

While the code is perfectly functional in its current state, the function **_nhc_hw_gather_data_** can take upwards of 40-60 seconds on multithreaded KNL nodes. It would be nice to optimize this...