ovis
cray_dvs_sampler: Memory allocation failure in mount_dvs()
On NERSC's TDS, ldms reports "cray_dvs_sampler: Memory allocation failure in mount_dvs()"
NERSC nodes mount 34 DVS filesystems. The error appears if no conffile is used, or if the conffile contains more than 20-ish DVS metrics.
Poking around briefly, it looks like the error is an ENOMEM returned from dvs_sampler.c:create_metric_set
Hi Eric,
The dvs sampler has ~512 metrics per mount point unless you scope it with a configuration file. You have probably exceeded the default memory allocation and need to use the -m flag when running ldmsd to allocate a larger amount of memory. Allocate more memory than you could possibly use (e.g., -m 10MB), run with your samplers, and use ldms_ls -v to see how much you actually used (sum the data and metadata sizes over all sets). Then modify your config files to allocate above that threshold. We will be adding reporting of how much memory the sets used, so that you can get that number without having to sum the output in this way.
Thanks,
Jim
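For anyone following the sizing procedure above, here is a minimal sketch of the "sum data and metadata sizes over all sets" step. It assumes each set in the `ldms_ls -v` output reports its sizes on lines shaped like `Size : <bytes>`; that pattern, the helper names, and the sample listing are illustrative assumptions rather than something confirmed in this thread, so adjust the regular expression to whatever your LDMS version prints.

```python
import re

# Assumption: metadata and data sizes appear on lines like "Size : 4096".
# Change this pattern if your ldms_ls version formats them differently.
SIZE_LINE = re.compile(r"^\s*Size\s*:\s*(\d+)\s*$", re.MULTILINE)


def total_set_bytes(ldms_ls_output: str) -> int:
    """Sum every size line (metadata and data) found in the listing."""
    return sum(int(size) for size in SIZE_LINE.findall(ldms_ls_output))


def estimated_metric_count(mounts: int, metrics_per_mount: int = 512) -> int:
    """Rough metric count when the sampler is not scoped by a conffile."""
    return mounts * metrics_per_mount


# Hypothetical listing, for illustration only (not real ldms_ls output).
SAMPLE = """\
node1/cray_dvs: consistent
  METADATA --------
             Size : 4096
  DATA ------------
             Size : 1024
node1/meminfo: consistent
  METADATA --------
             Size : 2048
  DATA ------------
             Size : 512
"""

if __name__ == "__main__":
    print(total_set_bytes(SAMPLE))     # 7680
    print(estimated_metric_count(34))  # 17408
```

In practice you would feed the function the captured output of `ldms_ls -v` from a run with a generous -m value, then pick an -m setting above the reported total. The second helper just multiplies out the numbers quoted in this thread: 34 mounts at ~512 metrics each is roughly 17,000 metrics without a conffile.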
From: eric-roman [email protected] Sent: Tuesday, March 12, 2019 1:05 PM To: ovis-hpc/ovis Cc: Subscribed Subject: [EXTERNAL] Re: [ovis-hpc/ovis] cray_dvs_sampler: Memory allocation failure in mount_dvs() (#26)
BTW, the reason that memory doesn't automatically scale with need is to minimize the number of memory registration resources used by the Aries transport.