Configurable cgroup ID regex for process discovery component

Open mahendrapaipuri opened this issue 6 months ago • 2 comments

Request

Currently, the process discovery component attempts to get the container ID by inspecting the cgroup of the process. This can be generalized by adding a new argument that takes a regex and extracts a cgroup ID by matching the regex against the process's cgroup. The matched ID could then be exposed as a new label, say, __meta_cgroup_id__, on the discovered targets.

Use case

Most resource managers use cgroups to manage the resources allocated to compute workloads. In our case, it is SLURM (an HPC batch scheduler). With a configurable cgroup ID regex in the process discovery component, we can find the job ID of each process. Using relabeling, we can then drop the processes that do not belong to any user job and pass the job ID as service_name to the Pyroscope eBPF component. This would allow continuous profiling of user jobs and aggregation of each job's profiles in Grafana by service_name (which is effectively the job ID).
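The relabeling step can be sketched as a keep rule followed by a replace rule. The Go below only simulates that logic on plain label maps; it is not Alloy's relabel implementation, and the Target type, the relabelForJobs function and the "comm" label are hypothetical. It assumes __meta_cgroup_id__ is set only for processes inside a job cgroup.

```go
package main

import "fmt"

// Target is a set of labels, as produced by a discovery component.
type Target map[string]string

// relabelForJobs keeps only targets that carry a cgroup (job) ID and copies
// that ID into service_name, mirroring a "keep" rule followed by a
// "replace" rule. Hypothetical helper, for illustration only.
func relabelForJobs(targets []Target) []Target {
	var out []Target
	for _, t := range targets {
		id, ok := t["__meta_cgroup_id__"]
		if !ok || id == "" {
			continue // not part of any user job: drop the target
		}
		nt := Target{}
		for k, v := range t {
			nt[k] = v // copy labels so the input is not mutated
		}
		nt["service_name"] = id
		out = append(out, nt)
	}
	return out
}

func main() {
	targets := []Target{
		{"comm": "sim.x", "__meta_cgroup_id__": "4242"}, // user job process
		{"comm": "sshd"},                                // system process
	}
	jobs := relabelForJobs(targets)
	fmt.Println(len(jobs), jobs[0]["service_name"]) // 1 4242
}
```

Profiles sent with service_name set this way could then be grouped per job in Grafana.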

This should work for any resource manager that manages cgroups in a deterministic way. Another use case is OpenStack, where libvirt manages the cgroups: Grafana Alloy could be deployed directly on the hypervisor to continuously profile the VMs.

If the maintainers see value in this feature, I would be happy to submit a PR.

mahendrapaipuri · Aug 13 '24 15:08