cray_dvs_sampler: Memory allocation failure in mount_dvs()

Open eric-roman opened this issue 5 years ago • 3 comments

On NERSC's TDS, ldms reports "cray_dvs_sampler: Memory allocation failure in mount_dvs()"

NERSC nodes mount 34 DVS filesystems. The error appears if no conffile is used, or if the conffile contains more than 20-ish DVS metrics.

eric-roman avatar Mar 12 '19 18:03 eric-roman

Poking around briefly, it looks like the error is an ENOMEM returned from dvs_sampler.c:create_metric_set

eric-roman avatar Mar 12 '19 19:03 eric-roman

Hi Eric,

The DVS sampler has ~512 metrics per mount point unless you scope it with a configuration file. You have probably exceeded the default memory allocation and need to pass the -m flag to ldmsd to allocate a larger amount of memory. Start by allocating more memory than you could possibly use (e.g., -m 10MB), run with your samplers, and use ldms_ls -v to see how much you actually used (sum the data and metadata sizes over all sets). Then set -m in your configuration to a value above that total. We will be adding reporting that shows how much memory the sets used, so that you can get that number without having to sum the output by hand.

Thanks,

Jim
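
To make the sizing procedure above concrete, here is a rough sketch of the arithmetic. It is not taken from the LDMS source: the per-metric byte counts are placeholder assumptions, the function names are hypothetical, and the (metadata, data) pairs are meant to be copied by hand from ldms_ls -v rather than parsed. Only the 34 mounts, the ~512 metrics per mount, and the "10MB"-style -m value come from this thread.

```python
import math

# Figures quoted in this thread.
MOUNTS = 34               # DVS filesystems mounted on the NERSC nodes
METRICS_PER_MOUNT = 512   # approximate, when no conffile limits the metrics


def estimate_set_bytes(mounts, metrics_per_mount,
                       bytes_per_value=8, bytes_per_meta=64):
    """Rough upper-level estimate of set memory.

    bytes_per_value and bytes_per_meta are ASSUMED placeholder sizes,
    not values taken from LDMS; replace them with measured numbers.
    """
    metrics = mounts * metrics_per_mount
    return metrics * (bytes_per_value + bytes_per_meta)


def total_used(sets):
    """Sum metadata and data sizes over all sets.

    `sets` is an iterable of (meta_bytes, data_bytes) pairs that you
    read off `ldms_ls -v` yourself; no output format is assumed here.
    """
    return sum(meta + data for meta, data in sets)


def suggest_m_value(used_bytes, headroom=1.5):
    """Round used_bytes * headroom up to whole megabytes, e.g. '2MB'.

    The 1.5x headroom factor is an arbitrary illustrative choice.
    """
    mib = math.ceil(used_bytes * headroom / (1024 * 1024))
    return f"{mib}MB"


if __name__ == "__main__":
    estimate = estimate_set_bytes(MOUNTS, METRICS_PER_MOUNT)
    print("estimated bytes:", estimate)
    print("suggested -m value:", suggest_m_value(estimate))
```

The estimate is only a starting point for the first oversized run; the measured total from ldms_ls -v should replace it when choosing the final -m value.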


valleydlr avatar Mar 12 '19 21:03 valleydlr

BTW, the reason that memory doesn't automatically scale with need is to minimize the number of memory registration resources used by the Aries transport.

tom95858 avatar Mar 17 '19 15:03 tom95858