logstash icon indicating copy to clipboard operation
logstash copied to clipboard

PoC to add cgroup v2 support

Open rpasche opened this issue 3 years ago • 3 comments

Release notes

[rn:skip]

What does this PR do?

Very rudimentary implementation to parse cgroup v2 information. This PR code should just be a PoC. It shows, that - at least - the CPU information of a process running in cgroup v2 can be successfully retrieved and returned by logstash (and fix issue https://github.com/elastic/logstash/issues/14534)

Why is it important/What is the impact to the user?

When running logstash on a VM or container, that is already using the cgroup v2, no metrics will be exposed by logstash and are therefore missing within Stack Monitoring

Checklist

Nothing got checked on my side

  • [ ] My code follows the style guidelines of this project
  • [ ] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files (and/or docker env variables)
  • [ ] I have added tests that prove my fix is effective or that my feature works

Author's Checklist

How to test this PR locally

  • create a VM that uses cgroup2fs. You can verify this by running stat -fc %T /sys/fs/cgroup. When running cgroup v2, the result should be cgroup2fs. See example
logstash@in-beats-1:/opt$ stat -fc %T /sys/fs/cgroup/
cgroup2fs
logstash@in-beats-1:/opt$
  • run logstash and check the API via curl localhost:9600/_node/stats?pretty . Results should contain an os object looking somehow like this
{
  "host" : "in-beats-1",
  "version" : "8.5.0",
  "http_address" : "0.0.0.0:9600",
  "id" : "02e938c4-9a6a-4e38-8023-ca54b86d685c",
  "name" : "in-beats-1",
  "ephemeral_id" : "5343f7f9-093c-4031-aaa3-e24a6312588a",
  "status" : "green",
  "snapshot" : true,
  "pipeline" : {
    "workers" : 4,
    "batch_size" : 125,
    "batch_delay" : 50
  },
  "monitoring" : {
    "cluster_uuid" : "some_cluster_id"
  },
  "jvm" : {
    "threads" : {
      "count" : 206,
      "peak_count" : 208
    },
...
...
  "os" : {
    "cgroup" : {
      "cpuacct" : {
        "control_group" : "/",
        "usage_nanos" : 895076842
      },
      "cpu" : {
        "control_group" : "/",
        "cfs_period_micros" : 100000,
        "cfs_quota_micros" : 400000,
        "stat" : {
          "number_of_times_throttled" : 142,
          "number_of_elapsed_periods" : 58122,
          "time_throttled_nanos" : 65119181
        }
      }
    }
  },
  "queue" : {
    "events_count" : 0
  }
}

Related issues

  • Relates #14534

Use cases

Screenshots

image

And here from the running pod

"os" : {
    "cgroup" : {
      "cpu" : {
        "stat" : {
          "time_throttled_nanos" : 70506861,
          "number_of_elapsed_periods" : 63733,
          "number_of_times_throttled" : 145
        },
        "cfs_quota_micros" : 400000,
        "control_group" : "/",
        "cfs_period_micros" : 100000
      },
      "cpuacct" : {
        "control_group" : "/",
        "usage_nanos" : 908588206
      }
    }
  }

Note: Still something is missing

Still, the Stack Monitoring does not fully work. It looks like the cfs values are exported by logstash, but they are not picked up by metricbeat (the node and node_stats code of logstash module do not contain a JSON structure to use - at least - the cfs_quota_micros values returned (see also https://github.com/elastic/beats/blob/main/metricbeat/module/logstash/node_stats/data.go#L79)

image image

Logs

rpasche avatar Sep 19 '22 10:09 rpasche

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

logstashmachine avatar Sep 19 '22 10:09 logstashmachine

Sorry....some missing infos:

  • Stack Monitoring (elasticsearch, kibana, metricbeat): 8.4.1 version
  • logstash: latest code of main branch + PoC code

rpasche avatar Sep 19 '22 10:09 rpasche

This screenshot was taken from the .monitoring-logstash-8-mb data-stream image

which has dynmatic set to false and no mapping for cfs fields exist image

image

rpasche avatar Sep 19 '22 14:09 rpasche