Optimize nhc_hw_gather_data for gathering CPU information on KNL nodes
While the code is perfectly functional in its current state, the function nhc_hw_gather_data can take upwards of 40-60 seconds on multithreaded KNL nodes. It would be nice to optimize this portion of the check so that we aren't forced to wait tens of seconds to grab CPU information.
If it matters, we are using lbnl-nhc-1.4.2-1 and have tried the -f flag, which doesn't appear to help.
[root@sknl0705 ~]# /usr/sbin/nhc -f
ERROR: nhc: Health check failed: Script timed out while executing "check_hw_cpuinfo 1 68 272".
After thinking about this a bit more, maybe a whole new set of checks could be created instead, specifically targeting KNL, that would also check for cluster mode, memory mode, etc.
With respect to the original issue, I'm definitely open to suggestions on how to optimize nhc_hw_gather_data(), either on Knights Landing or just in general. I'm not sure what's taking so long, or why, though...so we'd need to figure that out first. Running nhc -x and watching the progress of the bash trace mode is a good way to see where it's spending all its time, if you want to give that a try. I realize /proc/cpuinfo is quite lengthy on the KNLs, but I still have a hard time understanding why it would take that long!
And for what it's worth, the -f option doesn't mean the checks run in parallel (though that's a feature that's on my NHC roadmap!) but rather that they are initially backgrounded and then waited on by NHC; this alters the way signal delivery is handled within bash but is not an optimization or parallelization technique. :-)
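To make that concrete, here's a minimal sketch (not actual NHC code; run_check and the sleep values are made up) of the background-and-wait pattern and why it matters for signal delivery:

#!/bin/bash
# Illustration only -- run_check is a stand-in for a real health check.
run_check() { sleep 2; }

# 142 = 128 + SIGALRM(14), the same exit code NHC's timeout produces.
trap 'echo "caught SIGALRM (timeout)"; exit 142' ALRM

run_check &      # the check is backgrounded...
wait $!          # ...and the shell waits on it. A trapped signal interrupts
                 # wait immediately, whereas with a plain foreground command
                 # bash would defer the trap until the command returned.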
As far as the KNL platform goes, we at LANL are investigating the possibility of using NHC on our Cray systems, including Trinity, more than half of which is KNLs. This will likely afford me the opportunity to add KNL-specific functionality. In the meantime, however, contributions are certainly welcome as always!
I'm fairly certain it's the piece of code that grabs information about all the processors, as I mentioned in the issue title. We broke that portion of the code out, and it was routinely taking 40 seconds on a KNL node with 272 virtual cores.
Here is the output of nhc -x on one of the nodes:
[root@sknl0701 ~]# nhc -x
+ dbg 'BASH tracing active.'
+ local PREFIX=
+ [[ '' == \1 ]]
+ getopts :D:ac:de:fhl:n:qt:vx OPTION
+ shift 1
+ [[ ! -z '' ]]
+ return 0
+ nhcmain_load_sysconfig
+ [[ -f /etc/sysconfig/nhc ]]
+ nhcmain_finalize_env
+ CONFDIR=/etc/nhc
+ CONFFILE=/etc/nhc/nhc.conf
+ INCDIR=/etc/nhc/scripts
+ HELPERDIR=/usr/libexec/nhc
+ ONLINE_NODE=/usr/libexec/nhc/node-mark-online
+ OFFLINE_NODE=/usr/libexec/nhc/node-mark-offline
+ LOGFILE='>>/var/log/nhc.log 2>&1'
+ RESULTFILE=/var/run/nhc/nhc.status
+ DEBUG=0
+ TS=0
+ SILENT=0
+ VERBOSE=0
+ MARK_OFFLINE=1
+ DETACHED_MODE=0
+ DETACHED_MODE_FAIL_NODATA=0
+ TIMEOUT=30
+ NHC_CHECK_ALL=0
+ NHC_CHECK_FORKED=0
+ export NHC_SID=0
+ NHC_SID=0
+ kill -s 0 -- -269885
+ [[ 0 -eq 0 ]]
+ dbg 'NHC process 269885 is session leader.'
+ local PREFIX=
+ [[ 0 == \1 ]]
+ NHC_SID=-269885
+ [[ >>/var/log/nhc.log 2>&1 != \>\>\/\v\a\r\/\l\o\g\/\n\h\c\.\l\o\g\ \2\>\&\1 ]]
+ [[ >>/var/log/nhc.log 2>&1 == \- ]]
+ [[ -z '' ]]
+ nhcmain_find_rm
+ local DIR
+ local -a DIRLIST
+ [[ -d /var/spool/torque ]]
+ [[ -n '' ]]
+ type -a -p -f -P pbsnodes
+ type -a -p -f -P scontrol
+ NHC_RM=slurm
+ return 0
+ [[ slurm == \s\g\e ]]
+ [[ 0 -ne 0 ]]
+ [[ -n '' ]]
+ [[ 0 -eq 1 ]]
+ export NAME CONFDIR CONFFILE INCDIR HELPERDIR ONLINE_NODE OFFLINE_NODE LOGFILE DEBUG TS SILENT TIMEOUT NHC_RM
+ [[ -n '' ]]
+ nhcmain_redirect_output
+ [[ -n >>/var/log/nhc.log 2>&1 ]]
+ exec
ERROR: nhc: Health check failed: Script timed out while executing "check_hw_cpuinfo 1 68 272".
+++ [[ 0 == \1 ]]
+++ exit 142
"I still have a hard time understanding why it would take that long!"
After reading through the code more carefully, I started to wonder whether this behavior was consistent with the time it takes to collect cpuinfo on a normal CPU, and it turns out that there the check is quite quick!
2x Intel(R) Xeon(R) CPU E7-4830 v3 @ 2.10GHz
[root@smem0101 ~]# ./cpu_count.sh
01:42:34 Finished Proc 0
...
01:42:34 Finished Proc 27
01:42:35 Finished Proc 28
...
01:42:35 Finished Proc 47
real 0m0.467s
user 0m0.118s
sys 0m0.353s
Intel(R) Xeon Phi(TM) CPU 7250 @ 1.40GHz
[root@sknl0701 ~]# ./cpu_count.sh
01:44:42 Finished Proc 0
...
01:44:42 Finished Proc 21
01:44:43 Finished Proc 22
...
01:44:43 Finished Proc 40
01:44:44 Finished Proc 41
...
01:44:44 Finished Proc 55
01:44:45 Finished Proc 56
...
01:44:45 Finished Proc 68
01:44:46 Finished Proc 69
...
01:44:46 Finished Proc 78
01:44:47 Finished Proc 79
...
01:44:47 Finished Proc 88
01:44:48 Finished Proc 89
...
01:44:48 Finished Proc 97
01:44:49 Finished Proc 98
...
01:44:49 Finished Proc 105
01:44:50 Finished Proc 106
...
01:44:50 Finished Proc 113
01:44:51 Finished Proc 114
...
01:44:51 Finished Proc 120
01:44:52 Finished Proc 121
...
01:44:52 Finished Proc 127
01:44:53 Finished Proc 128
...
01:44:53 Finished Proc 134
01:44:54 Finished Proc 135
...
01:44:54 Finished Proc 140
01:44:55 Finished Proc 141
...
01:44:55 Finished Proc 147
01:44:56 Finished Proc 148
...
01:44:56 Finished Proc 152
01:44:57 Finished Proc 153
...
01:44:57 Finished Proc 158
01:44:58 Finished Proc 159
...
01:44:58 Finished Proc 164
01:44:59 Finished Proc 165
01:44:59 Finished Proc 166
01:44:59 Finished Proc 167
01:44:59 Finished Proc 168
01:44:59 Finished Proc 169
01:45:00 Finished Proc 170
01:45:00 Finished Proc 171
01:45:00 Finished Proc 172
01:45:00 Finished Proc 173
01:45:00 Finished Proc 174
01:45:01 Finished Proc 175
01:45:01 Finished Proc 176
01:45:01 Finished Proc 177
01:45:01 Finished Proc 178
01:45:01 Finished Proc 179
01:45:02 Finished Proc 180
01:45:02 Finished Proc 181
01:45:02 Finished Proc 182
01:45:02 Finished Proc 183
01:45:02 Finished Proc 184
01:45:03 Finished Proc 185
01:45:03 Finished Proc 186
01:45:03 Finished Proc 187
01:45:03 Finished Proc 188
01:45:03 Finished Proc 189
01:45:04 Finished Proc 190
01:45:04 Finished Proc 191
01:45:04 Finished Proc 192
01:45:04 Finished Proc 193
01:45:04 Finished Proc 194
01:45:05 Finished Proc 195
01:45:05 Finished Proc 196
01:45:05 Finished Proc 197
01:45:05 Finished Proc 198
01:45:06 Finished Proc 199
01:45:06 Finished Proc 200
01:45:06 Finished Proc 201
01:45:06 Finished Proc 202
01:45:06 Finished Proc 203
01:45:07 Finished Proc 204
01:45:07 Finished Proc 205
01:45:07 Finished Proc 206
01:45:07 Finished Proc 207
01:45:08 Finished Proc 208
01:45:08 Finished Proc 209
01:45:08 Finished Proc 210
01:45:08 Finished Proc 211
01:45:08 Finished Proc 212
01:45:09 Finished Proc 213
01:45:09 Finished Proc 214
01:45:09 Finished Proc 215
01:45:09 Finished Proc 216
01:45:10 Finished Proc 217
01:45:10 Finished Proc 218
01:45:10 Finished Proc 219
01:45:10 Finished Proc 220
01:45:10 Finished Proc 221
01:45:11 Finished Proc 222
01:45:11 Finished Proc 223
01:45:11 Finished Proc 224
01:45:11 Finished Proc 225
01:45:12 Finished Proc 226
01:45:12 Finished Proc 227
01:45:12 Finished Proc 228
01:45:12 Finished Proc 229
01:45:13 Finished Proc 230
01:45:13 Finished Proc 231
01:45:13 Finished Proc 232
01:45:13 Finished Proc 233
01:45:14 Finished Proc 234
01:45:14 Finished Proc 235
01:45:14 Finished Proc 236
01:45:14 Finished Proc 237
01:45:15 Finished Proc 238
01:45:15 Finished Proc 239
01:45:15 Finished Proc 240
01:45:15 Finished Proc 241
01:45:16 Finished Proc 242
01:45:16 Finished Proc 243
01:45:16 Finished Proc 244
01:45:16 Finished Proc 245
01:45:17 Finished Proc 246
01:45:17 Finished Proc 247
01:45:17 Finished Proc 248
01:45:17 Finished Proc 249
01:45:18 Finished Proc 250
01:45:18 Finished Proc 251
01:45:18 Finished Proc 252
01:45:18 Finished Proc 253
01:45:19 Finished Proc 254
01:45:19 Finished Proc 255
01:45:19 Finished Proc 256
01:45:19 Finished Proc 257
01:45:20 Finished Proc 258
01:45:20 Finished Proc 259
01:45:20 Finished Proc 260
01:45:21 Finished Proc 261
01:45:21 Finished Proc 262
01:45:21 Finished Proc 263
01:45:21 Finished Proc 264
01:45:22 Finished Proc 265
01:45:22 Finished Proc 266
01:45:22 Finished Proc 267
01:45:23 Finished Proc 268
01:45:23 Finished Proc 269
01:45:23 Finished Proc 270
01:45:23 Finished Proc 271
real 0m41.821s
user 0m3.350s
sys 0m37.407s
So it appears to work normally for the first ~60 processors before performance drops off.
Here is the code I pulled out and sprinkled with timestamps.
#!/bin/bash
# nhc_hw_gather_data() extracted from lbnl_hw.nhc, with a timestamp echoed as
# each processor's cpuinfo stanza is finished (bash is required for the
# arrays, [[ ]], and read -a used below).
function nhc_hw_gather_data() {
    local LINE CORES SIBLINGS MHZ PROCESSOR PHYS_ID PORT INDEX DEV
    local -a FIELD PHYS_IDS

    # Gather CPU info
    PROCESSOR=-1
    if [[ -e /proc/cpuinfo ]]; then
        while read -a FIELD ; do
            if [[ "${FIELD[0]} ${FIELD[1]}" = "processor :" ]]; then
                : $((PROCESSOR++))
            elif [[ "${FIELD[0]} ${FIELD[1]} ${FIELD[2]}" = "physical id :" ]]; then
                PHYS_ID=${FIELD[3]}
                if [[ -z "${PHYS_IDS[$PHYS_ID]}" ]]; then
                    PHYS_IDS[$PHYS_ID]=1
                    : $((HW_SOCKETS++))
                fi
            elif [[ "${FIELD[0]} ${FIELD[1]}" = "siblings :" ]]; then
                SIBLINGS=${FIELD[2]}
            elif [[ "${FIELD[0]} ${FIELD[1]} ${FIELD[2]}" = "cpu cores :" ]]; then
                CORES=${FIELD[3]}
            elif [[ "${FIELD[0]} ${FIELD[1]} ${FIELD[2]}" = "cpu MHz :" ]]; then
                MHZ="${FIELD[3]}"
                MHZ="${MHZ/%.*}"
                echo "$(/bin/date "+%T") Finished Proc $PROCESSOR"
            fi
        done < /proc/cpuinfo
    fi
}
time nhc_hw_gather_data
The solution we are currently going with at our site is to dump the first and last CPU entries from /proc/cpuinfo into a file and then read that in, so the if/while block only has to go through 52 lines of CPU information.
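Roughly, the workaround looks like the sketch below; the cache path and the awk one-liner are illustrative, not exactly what we run in production:

# Keep only the first and last processor stanzas of /proc/cpuinfo in a small
# cache file. Each stanza is a blank-line separated paragraph, so awk's
# paragraph mode (RS="") can pick them out.
awk -v RS= -v ORS='\n\n' 'NR == 1 { print } { last = $0 } END { print last }' \
    /proc/cpuinfo > /tmp/cpuinfo.cache

# ...then point the gather loop at the cache instead:
#     done < /tmp/cpuinfo.cache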
A cleaner solution might involve just querying lscpu (though this adds an external dependency 👎 )
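For what it's worth, a rough sketch of what the lscpu route could look like (field names are taken from util-linux's lscpu output; the HW_* variable names are assumed to be the ones check_hw_cpuinfo compares against):

# Sockets x cores-per-socket x threads-per-core gives the same three numbers
# that check_hw_cpuinfo expects, without reading /proc/cpuinfo line by line.
HW_SOCKETS=$(lscpu | awk -F': +' '/^Socket\(s\):/ { print $2 }')
CORES_PER_SOCKET=$(lscpu | awk -F': +' '/^Core\(s\) per socket:/ { print $2 }')
THREADS_PER_CORE=$(lscpu | awk -F': +' '/^Thread\(s\) per core:/ { print $2 }')
HW_CORES=$((HW_SOCKETS * CORES_PER_SOCKET))
HW_THREADS=$((HW_CORES * THREADS_PER_CORE))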
On the 64-core KNL, the /proc/cpuinfo file is about 257K, which should only take about 4 seconds to go through if it were a regular file [I tried this with a copy of cpuinfo in /tmp]. I think reading the /proc filesystem directly with that access pattern is a corner case that isn't optimized.
A couple [bash] replacements for the "done < /proc/cpuinfo" line:
done < <(cat /proc/cpuinfo)
takes about 4 seconds, and
done < <(/usr/bin/grep "processor\|physical id\|siblings\|cpu cores\|cpu MHz" /proc/cpuinfo)
takes about 1.5 seconds.
I suspect the speedup is from only reading from /proc once.
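If anyone wants to reproduce these numbers on their own hardware, timing the three variants side by side is enough; the loop body is a no-op, so only the cost of reading the input is measured:

# Compare the read strategies discussed above; only the input source differs.
time while read -a FIELD; do : ; done < /proc/cpuinfo
time while read -a FIELD; do : ; done < <(cat /proc/cpuinfo)
time while read -a FIELD; do : ; done \
    < <(grep "processor\|physical id\|siblings\|cpu cores\|cpu MHz" /proc/cpuinfo)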
Edit: added pull request #47
Hi,
I just ran into this on an SGI UV300 with 720 threads; the patch I am using is below (sorry, not a git person). It drops an nhc run from over 2 minutes to around 17 seconds for us.
--- lbnl_hw.nhc.orig	2017-10-29 10:06:07.753153308 -0700
+++ lbnl_hw.nhc	2017-10-29 10:08:56.430120437 -0700
@@ -29,6 +29,7 @@
     # Gather CPU info
     PROCESSOR=-1
+    CPUINFO=$(</proc/cpuinfo)
     if [[ -e /proc/cpuinfo ]]; then
         while read -a FIELD ; do
             if [[ "${FIELD[0]} ${FIELD[1]}" = "processor :" ]]; then
@@ -47,7 +48,7 @@
                 MHZ="${FIELD[3]}"
                 MHZ="${MHZ/%.*}"
             fi
-        done < /proc/cpuinfo
+        done < <(echo "$CPUINFO")
     fi
     if [[ $PROCESSOR -ge 0 && $HW_SOCKETS -eq 0 ]]; then
         HW_SOCKETS=$((PROCESSOR+1))
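In case it helps anyone else applying this by hand: the patched script normally lives in NHC's include directory (/etc/nhc/scripts, per the trace above), so something like the following should do it (the patch filename here is just a placeholder):

# Apply the diff above to the installed copy of lbnl_hw.nhc, keeping a backup.
cd /etc/nhc/scripts
patch -b lbnl_hw.nhc < /path/to/cpuinfo-read.patch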
I see this on moderately busy dual-socket EPYC nodes with 48 cores/socket and no hyperthreading. I can demonstrate it with:

time while read -a LINE; do : ; done </proc/cpuinfo
real    0m44.246s
user    0m0.429s
sys     0m41.745s

compared to:

time while read -a LINE; do : ; done < <(/usr/bin/cat /proc/cpuinfo)
real    0m1.374s
user    0m0.474s
sys     0m0.949s
There was a thread long ago about /proc/cpuinfo being slow to read line by line and eating a lot of kernel time, which the "sys" figures in the two cases above seem to confirm. This is on RHEL 7/CentOS 7.
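One way to confirm that the time really is going into syscalls against /proc/cpuinfo rather than into the loop body, assuming strace is available:

# Summarize the read/lseek syscalls bash issues while running the slow loop;
# a very large syscall count with most of the wall time in sys supports the
# "per-line reads of /proc are expensive" explanation above.
strace -c -e trace=read,lseek \
    bash -c 'while read -a FIELD; do : ; done < /proc/cpuinfo'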
Based on testing and feedback, #121 has addressed this issue sufficiently to warrant its closure; however, if your own testing or deployment experience(s) differ, please do reopen this one, or a new one, at your discretion! 😃