Memory usage detection

Open petrovicboban opened this issue 7 years ago • 6 comments

Hi, in our case, Singularity 0.18.2 can't detect memory usage for slaves:

image

but it can for the cluster overall:

image

Also, in request view, it shows 0 for memory usage:

image

but on task level, that's not the case:

image

petrovicboban avatar Feb 22 '18 12:02 petrovicboban

It may depend on the isolators you have configured. Do you have either the cgroups/mem or posix/mem isolators configured for your mesos slaves?

ssalinas avatar Mar 06 '18 14:03 ssalinas

It's posix. But how does it detect CPU usage? It's the posix isolator for CPU too.

petrovicboban avatar Mar 06 '18 21:03 petrovicboban

That slave memory view is based off of adding up task usages, so if tasks aren't showing it, the slaves will not show it.

Mesos is the entity collecting the actual metric values in this case, not singularity. It will collect them differently based on how each isolator is implemented.
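
To illustrate why a missing per-task value leads to an empty slave view, here is a minimal Python sketch of that kind of aggregation. It is not Singularity's actual code: the function name `slave_memory_bytes`, the sample values, and the choice of `mem_total_bytes` as the default field are assumptions made for illustration.

```python
def slave_memory_bytes(task_stats, field="mem_total_bytes"):
    """Sum one memory field across all task statistics on a slave.

    Tasks that do not report the field contribute 0, so a slave whose
    tasks all lack it shows 0 even though the tasks are using memory.
    """
    return sum(stats.get(field, 0) for stats in task_stats)


# Hypothetical per-task statistics: neither task reports mem_total_bytes.
tasks = [
    {"mem_limit_bytes": 201326592, "mem_rss_bytes": 596295680},
    {"mem_limit_bytes": 134217728, "mem_rss_bytes": 100663296},
]

total = slave_memory_bytes(tasks)                      # field is absent on every task
rss = slave_memory_bytes(tasks, field="mem_rss_bytes")  # field is present on every task
```

With these inputs the default field sums to zero, while a field the tasks do report gives a non-zero total.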

If you hit an endpoint like {hostname}:5051/monitor/statistics on one of your mesos slaves/agents, do you see memory statistics reported? For example, with our slaves we get back a list of objects like:

{
    "executor_id": "{id}",
    "executor_name": "",
    "framework_id": "{id}",
    "source": "{task id}",
    "statistics": {
        "cpus_limit": 1.1,
        "cpus_system_time_secs": 17.9,
        "cpus_user_time_secs": 140.66,
        "mem_anon_bytes": 714723328,
        "mem_cache_bytes": 2695168,
        "mem_critical_pressure_counter": 0,
        "mem_file_bytes": 2695168,
        "mem_limit_bytes": 1314914304,
        "mem_low_pressure_counter": 0,
        "mem_mapped_file_bytes": 106496,
        "mem_medium_pressure_counter": 0,
        "mem_rss_bytes": 714723328,
        "mem_swap_bytes": 0,
        "mem_total_bytes": 741773312,
        "mem_unevictable_bytes": 0,
        "timestamp": 1521811482.55977
    }
}

That endpoint on the mesos slave is what Singularity polls to get usage statistics. If memory usage is not being reported there, either you are on an older mesos slave version or your isolator does not collect those metrics, in which case the feature will not function.

ssalinas avatar Mar 23 '18 13:03 ssalinas
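
A quick way to run the check described above is to fetch the endpoint and list which memory fields each task is missing. The following is a minimal sketch using only the Python standard library. The function names, the `MEMORY_FIELDS` selection, and the sample entries are hypothetical; only the `/monitor/statistics` path, port 5051, and the field names come from the thread.

```python
import json
import urllib.request

# Memory fields picked from the sample response above (an illustrative subset).
MEMORY_FIELDS = ("mem_total_bytes", "mem_rss_bytes", "mem_limit_bytes")


def fetch_statistics(hostname, port=5051, timeout=5):
    """Fetch and decode /monitor/statistics from one mesos slave/agent."""
    url = f"http://{hostname}:{port}/monitor/statistics"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def missing_memory_fields(entries, fields=MEMORY_FIELDS):
    """Map each task id ("source") to the memory fields it does not report."""
    report = {}
    for entry in entries:
        stats = entry.get("statistics", {})
        missing = [name for name in fields if name not in stats]
        if missing:
            report[entry.get("source", "<unknown>")] = missing
    return report


# Hypothetical entries, shaped like the response above.
sample = [
    {"source": "task-a",
     "statistics": {"mem_limit_bytes": 201326592, "mem_rss_bytes": 596295680}},
    {"source": "task-b",
     "statistics": {"mem_limit_bytes": 1314914304, "mem_rss_bytes": 714723328,
                    "mem_total_bytes": 741773312}},
]
result = missing_memory_fields(sample)
```

Against a live agent, `missing_memory_fields(fetch_statistics("agent-host"))` would produce the same kind of report, where `agent-host` stands in for one of your slave hostnames.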

This is what our mesos slaves return:

    {
        "executor_id": "kg45",
        "executor_name": "",
        "framework_id": "Singularity",
        "source": "test_template_test_job_2-test_job_2_deploy_19-1519399684082-1-db07-DEFAULT",
        "statistics": {
            "cpus_limit": 0.2,
            "cpus_system_time_secs": 1324.95,
            "cpus_user_time_secs": 1780.07,
            "mem_limit_bytes": 201326592,
            "mem_rss_bytes": 596295680,
            "timestamp": 1521822348.65732
        }
    }

Much less than yours, so I guess it's because of the posix isolator. Mesos itself is not too old (1.1).

petrovicboban avatar Mar 23 '18 16:03 petrovicboban

Ok, I'll leave this open so we can implement a version that works with the smaller subset of metrics

ssalinas avatar Mar 23 '18 17:03 ssalinas
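
One way a version for the smaller subset could work is to fall back to a field that both responses include. The sketch below is an assumption about a possible approach, not the change that was made in Singularity: it prefers `mem_total_bytes` and falls back to `mem_rss_bytes`, and the function name `task_memory_bytes` is hypothetical. The two fields are not the same measurement (in the larger sample above they differ), so the fallback is only an approximation.

```python
def task_memory_bytes(stats):
    """Return the best available memory usage figure for a task, or None.

    Prefers mem_total_bytes and falls back to mem_rss_bytes when the
    isolator reports only the smaller set of statistics.
    """
    for field in ("mem_total_bytes", "mem_rss_bytes"):
        value = stats.get(field)
        if value is not None:
            return value
    return None


# Memory values copied from the two responses quoted in this thread.
full_stats = {"mem_limit_bytes": 1314914304, "mem_rss_bytes": 714723328,
              "mem_total_bytes": 741773312}
posix_stats = {"mem_limit_bytes": 201326592, "mem_rss_bytes": 596295680}

full_usage = task_memory_bytes(full_stats)    # uses mem_total_bytes
posix_usage = task_memory_bytes(posix_stats)  # falls back to mem_rss_bytes
```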

We ran into this issue too, and worked around it by using mem_limit_bytes instead of mem_total_bytes.

I'm not particularly proud of the hack, but it still gives us useful information. You can see the change at:
https://github.com/HubSpot/Singularity/compare/master...Nitro:fix-memory-cgroup?expand=1 (open to send a PR)

felixgborrego avatar Mar 29 '18 15:03 felixgborrego