Thread-level monitoring (similar to top -H)

Open mmadhusudan opened this issue 6 years ago • 5 comments

Feature Request

The Telegraf procstat plugin helps monitor processes on the system, but it does not account for multi-threaded processes (threads that share the same main PID but have different thread IDs). On Linux, the "top -H" command gives thread-level metrics.

Proposal:

Add a plugin or extension to procstat/processor plugin to enable thread-level monitoring

Current behavior:

Process-level aggregated output is displayed/stored

Desired behavior:

Thread-level output is desired

Use case: [Why is this important (helps with prioritizing requests)]

Many processes are multi-threaded, with threads running on different CPUs (or vCPUs), and the process-level output is an aggregate of all of them.

mmadhusudan avatar Dec 31 '18 10:12 mmadhusudan

Would this be pulled into development anytime soon?

mmadhusudan avatar Jan 23 '19 06:01 mmadhusudan

I don't believe the library we are using, gopsutil, can get this information yet. Someone needs to investigate how this data can be acquired and propose a change to the gopsutil project. Is this something you could do?

danielnelson avatar Jan 23 '19 19:01 danielnelson

I can do the investigation. But I am no expert in Go :)

mmadhusudan avatar Jan 24 '19 07:01 mmadhusudan

It looks like threads can be read in the same way as processes, from /proc/$LWP. I'll bet this means we would only need to expose an option to search threads, like pgrep -w influxdb, and wouldn't require any changes to gopsutil.

danielnelson avatar Jan 25 '19 02:01 danielnelson

At least on Linux, the thread info you are looking for is in /proc/$PID/task/$TID. All the handling of this data is already present in gopsutil.

The question I am struggling with is, how does one present the data? Are threads a property of the process? Or is it a separate accumulator metric?

The change can be as easy as:

```diff
telegraf (master) $ git diff
diff --git a/plugins/inputs/procstat/process.go b/plugins/inputs/procstat/process.go
index 042929f..a77cedf 100644
--- a/plugins/inputs/procstat/process.go
+++ b/plugins/inputs/procstat/process.go
@@ -26,6 +26,7 @@ type Process interface {
        RlimitUsage(bool) ([]process.RlimitStat, error)
        Username() (string, error)
        CreateTime() (int64, error)
+       Threads() (map[int32]*cpu.TimesStat, error)
 }
 type PIDFinder interface {
diff --git a/plugins/inputs/procstat/procstat.go b/plugins/inputs/procstat/procstat.go
index 61e5753..0496450 100644
--- a/plugins/inputs/procstat/procstat.go
+++ b/plugins/inputs/procstat/procstat.go
@@ -297,6 +297,22 @@ func (p *Procstat) addMetric(proc Process, acc telegraf.Accumulator) {
                }
        }
+       tpids, err := proc.Threads()
+       if err == nil {
+               for tpid, cpuTime := range tpids {
+                       fields[prefix+fmt.Sprint(tpid)+"_cpu_time_user"] = cpuTime.User
+                       fields[prefix+fmt.Sprint(tpid)+"_cpu_time_system"] = cpuTime.System
+                       fields[prefix+fmt.Sprint(tpid)+"_cpu_time_idle"] = cpuTime.Idle
+                       fields[prefix+fmt.Sprint(tpid)+"_cpu_time_nice"] = cpuTime.Nice
+                       fields[prefix+fmt.Sprint(tpid)+"_cpu_time_iowait"] = cpuTime.Iowait
+                       fields[prefix+fmt.Sprint(tpid)+"_cpu_time_irq"] = cpuTime.Irq
+                       fields[prefix+fmt.Sprint(tpid)+"_cpu_time_soft_irq"] = cpuTime.Softirq
+                       fields[prefix+fmt.Sprint(tpid)+"_cpu_time_steal"] = cpuTime.Steal
+                       fields[prefix+fmt.Sprint(tpid)+"_cpu_time_guest"] = cpuTime.Guest
+                       fields[prefix+fmt.Sprint(tpid)+"_cpu_time_guest_nice"] = cpuTime.GuestNice
+               }
+       }
+
        acc.AddFields("procstat", fields, proc.Tags())
 }
```

mkysel avatar Sep 15 '20 19:09 mkysel