Hardcoded EDAC paths in ldms-static-test.sh
It appears that max_mc and max_csrow are both hardcoded to 2. It would be useful to have a programmatic way to find a reasonable upper bound for these parameters on the local machine.
https://github.com/ovis-hpc/ovis/blob/b22708c2dbdd2a7672ef773c8efdd09d14bcdab2/ldms/scripts/examples/edac.1#L2
It would also be useful to have a more informative error message show up in the sampler log. For instance, when max_mc or max_csrow is set too high, the only output is:
Tue Aug 13 11:43:02 2019: ERROR : edac: failed to open file during config.
Tue Aug 13 11:43:02 2019: ERROR : edac: failed to create a metric set.
Tue Aug 13 11:43:02 2019: ERROR : msg_no 16: error 22: Plugin 'edac' configuration error.
Tue Aug 13 11:43:02 2019: ERROR : Configuration error at line 22 (/projects/opt/centos7/ovis/4.2.3/configs/sampler1.conf)
Tue Aug 13 11:43:02 2019: CRITICAL : LDMSD_ LDMS Daemon exiting...status 22, Error 22 processing configuration file '/projects/opt/centos7/ovis/4.2.3/configs/sampler1.conf'
Tue Aug 13 11:43:02 2019: CRITICAL : LDMSD_ cleanup end.
Please contribute a bash script function that probes the file system on Linux kernel versions 3 and 4 to discover a reasonable local upper bound. 2x2 seemed a reasonable default because a machine with less than that is unlikely to be an HPC server where the test would be relevant. The script should also handle the case where the kernel has not loaded the needed module, and report it with a useful error message.
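As a starting point for discussion, here is a minimal sketch of such a function. It is not taken from the ovis tree: the function name `edac_discover_bounds` is hypothetical, and it assumes the EDAC sysfs tree lives at /sys/devices/system/edac/mc with entries named mcN/csrowM numbered contiguously from 0. It also assumes the sampler opens every mc/csrow combination up to the configured limits, so it reports the smallest csrow count seen across controllers rather than the largest.

```shell
#!/bin/bash
# Hypothetical helper, not part of ovis: print discovered EDAC bounds as
# "max_mc=N max_csrow=M". The sysfs root can be overridden (first argument)
# so the function can be tried against a fake directory tree.
edac_discover_bounds() {
    local root="${1:-/sys/devices/system/edac/mc}"
    local mc_count=0 min_csrow=-1 mc d n

    if [ ! -d "$root" ]; then
        echo "edac: $root not found; is an EDAC kernel module loaded?" >&2
        return 1
    fi

    for mc in "$root"/mc[0-9]*; do
        # An unmatched glob stays literal, so skip anything that is not a directory.
        if [ ! -d "$mc" ]; then continue; fi
        mc_count=$((mc_count + 1))
        n=0
        for d in "$mc"/csrow[0-9]*; do
            if [ -d "$d" ]; then n=$((n + 1)); fi
        done
        # Assumption: every mc must have all csrows up to the limit, so keep the minimum.
        if [ "$min_csrow" -lt 0 ] || [ "$n" -lt "$min_csrow" ]; then
            min_csrow=$n
        fi
    done

    if [ "$mc_count" -eq 0 ]; then
        echo "edac: no mcN directories under $root; is an EDAC kernel module loaded?" >&2
        return 1
    fi
    if [ "$min_csrow" -le 0 ]; then
        echo "edac: at least one memory controller under $root has no csrowN directories" >&2
        return 1
    fi

    echo "max_mc=$mc_count max_csrow=$min_csrow"
}
```

The output is deliberately in name=value form so a test script could append it to the edac config line or `eval` it into shell variables. Whether counting directories is the right bound for every driver would need checking against real hardware.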
"See 'man Plugin_edac' for details of configuring the sampler." would be a more informative message, as would a size hint about what was discovered. Please submit a patch for the output message.