Memory usage detection
Hi, in our case, Singularity 0.18.2 can't detect memory usage for slaves:
[screenshot]
but it can for the cluster overall:
[screenshot]
Also, the request view shows 0 for memory usage:
[screenshot]
but at the task level, that's not the case:
[screenshot]
It may depend on the isolators you have configured. Do you have either the cgroups/mem or posix/mem isolators configured for your mesos slaves?
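(For reference, a mesos agent reports its configured isolators in the "isolation" flag of its /flags endpoint. A minimal sketch for checking it; the agent address here is an assumption, substitute one of your own slaves:

import json
import urllib.request

# Hypothetical agent address; substitute one of your mesos slaves.
AGENT = "http://agent-host:5051"

# The agent exposes its startup flags at /flags; the "isolation" flag
# lists the configured isolators, e.g. "cgroups/cpu,cgroups/mem" or
# "posix/cpu,posix/mem".
with urllib.request.urlopen(f"{AGENT}/flags") as resp:
    flags = json.load(resp)["flags"]

print(flags.get("isolation"))
)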
It's posix. But how does it detect CPU usage then? It's the posix isolator for CPU too.
That slave memory view is based on adding up task usages, so if the tasks aren't reporting memory, the slaves won't show it either.
Mesos is the entity collecting the actual metric values in this case, not Singularity, and it collects them differently depending on how each isolator is implemented.
If you hit an endpoint like {hostname}:5051/monitor/statistics on one of your mesos slaves/agents, do you see memory statistics reported? For example, with our slaves we get back a list of objects like:
{
  "executor_id": "{id}",
  "executor_name": "",
  "framework_id": "{id}",
  "source": "{task id}",
  "statistics": {
    "cpus_limit": 1.1,
    "cpus_system_time_secs": 17.9,
    "cpus_user_time_secs": 140.66,
    "mem_anon_bytes": 714723328,
    "mem_cache_bytes": 2695168,
    "mem_critical_pressure_counter": 0,
    "mem_file_bytes": 2695168,
    "mem_limit_bytes": 1314914304,
    "mem_low_pressure_counter": 0,
    "mem_mapped_file_bytes": 106496,
    "mem_medium_pressure_counter": 0,
    "mem_rss_bytes": 714723328,
    "mem_swap_bytes": 0,
    "mem_total_bytes": 741773312,
    "mem_unevictable_bytes": 0,
    "timestamp": 1521811482.55977
  }
}
That endpoint on the mesos slave is what Singularity polls to get usage statistics. If memory is not being reported there, either you are on an older mesos slave version or your isolator does not collect those metrics, in which case the feature will not function.
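(As a rough illustration of that polling and aggregation, not Singularity's actual code; the agent list and summing logic here are assumptions:

import json
import urllib.request

# Hypothetical agent list; Singularity discovers agents on its own.
AGENTS = ["agent1:5051", "agent2:5051"]

for agent in AGENTS:
    with urllib.request.urlopen(f"http://{agent}/monitor/statistics") as resp:
        executors = json.load(resp)

    # Add up per-task usage to get a per-slave total. With the posix
    # isolator, mem_total_bytes is absent, so this sums to 0, which is
    # exactly the symptom described above.
    used = sum(e["statistics"].get("mem_total_bytes", 0) for e in executors)
    print(f"{agent}: {used} bytes used across {len(executors)} executors")
)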
This is what our mesos slaves return:
{
  "executor_id": "kg45",
  "executor_name": "",
  "framework_id": "Singularity",
  "source": "test_template_test_job_2-test_job_2_deploy_19-1519399684082-1-db07-DEFAULT",
  "statistics": {
    "cpus_limit": 0.2,
    "cpus_system_time_secs": 1324.95,
    "cpus_user_time_secs": 1780.07,
    "mem_limit_bytes": 201326592,
    "mem_rss_bytes": 596295680,
    "timestamp": 1521822348.65732
  }
}
Much less than yours, so I guess it's because of the posix isolator. Mesos itself is not too old (1.1).
OK, I'll leave this open so we can implement a version that works with the smaller subset of metrics.
We ran into this issue too, and worked around it by using mem_limit_bytes instead of mem_total_bytes.
Not particularly proud of the hack, but it still gives us useful information. You can see the change at:
https://github.com/HubSpot/Singularity/compare/master...Nitro:fix-memory-cgroup?expand=1 (open to sending a PR)
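(The idea behind that workaround, sketched in Python purely for illustration; the actual change in the branch above lives in Singularity's Java code:

def memory_bytes(statistics: dict) -> int:
    """Prefer mem_total_bytes, which the cgroups/mem isolator reports;
    fall back to mem_limit_bytes when, as with posix/mem, it is missing."""
    if "mem_total_bytes" in statistics:
        return statistics["mem_total_bytes"]
    # posix isolator: no mem_total_bytes, use the configured limit instead.
    return statistics.get("mem_limit_bytes", 0)

For the posix-style statistics above, this returns 201326592 rather than 0, so the slave and request views at least show something useful.)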