Thread-level monitoring (similar to top -H)
Feature Request
Telegraf's procstat plugin helps monitor processes on the system, but it does not account for multi-threaded processes (all threads run under the same main PID but have different thread IDs). On Linux, the "top -H" command gives thread-level metrics.
Proposal:
Add a plugin, or an extension to the procstat/processor plugin, to enable thread-level monitoring.
Current behavior:
Process-level aggregated output is displayed/stored
Desired behavior:
Thread-level output is desired
Use case:
Many processes are multi-threaded and run on different CPUs (or vCPUs), so the process-level output is an aggregate of all threads.
Would this be pulled into development anytime soon?
I don't believe the library we are using, gopsutil, can get this information yet. Someone needs to investigate how this data can be acquired and propose a change to the gopsutil project. Is this something you could do?
I can do the investigation, but I am no expert in Go :)
It looks like threads can be read in the same way as processes, from /proc/$LWP. I'll bet this means we would only need to expose an option to search threads, like pgrep -w influxdb, and wouldn't require any changes to gopsutil.
At least on Linux, the thread info you are looking for is in /proc/$PID/task/$TID. All the handling of this data is already present in gopsutil.
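As a rough illustration of the /proc/$PID/task/$TID layout mentioned above, here is a minimal sketch that lists the thread IDs of a process using only the Go standard library. The helper names `parseTIDs` and `threadIDs` are made up for this example and are not part of gopsutil or Telegraf; it assumes a Linux-style procfs.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
	"strconv"
)

// parseTIDs (hypothetical helper) keeps the entry names that parse as
// decimal thread IDs and returns them in ascending order.
func parseTIDs(names []string) []int {
	tids := make([]int, 0, len(names))
	for _, name := range names {
		tid, err := strconv.Atoi(name)
		if err != nil {
			continue // not a thread directory
		}
		tids = append(tids, tid)
	}
	sort.Ints(tids)
	return tids
}

// threadIDs (hypothetical helper) lists the threads of pid by reading
// the directory <procRoot>/<pid>/task, one entry per thread.
func threadIDs(procRoot string, pid int) ([]int, error) {
	entries, err := os.ReadDir(filepath.Join(procRoot, strconv.Itoa(pid), "task"))
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(entries))
	for _, e := range entries {
		names = append(names, e.Name())
	}
	return parseTIDs(names), nil
}

func main() {
	// Assumes procfs is mounted at /proc; on other systems this reports an error.
	tids, err := threadIDs("/proc", os.Getpid())
	if err != nil {
		fmt.Println("could not list threads:", err)
		return
	}
	fmt.Println("thread IDs of this process:", tids)
}
```

Each of those task directories contains the same kind of stat files as the process directory itself, which is why the existing parsing code can be reused per thread.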
The question I am struggling with is: how does one present the data? Are threads a property of the process, or should they be a separate accumulator metric?
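To make the two options concrete, here is a small sketch of both data shapes. Everything in it is hypothetical: `threadCPU` and `point` are simplified stand-ins rather than Telegraf or gopsutil types, and the measurement name `procstat_thread` and the `tid` tag are assumptions chosen only for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// threadCPU holds hypothetical per-thread CPU times in seconds.
type threadCPU struct {
	User, System float64
}

// point is a simplified stand-in for a metric: a name, tags, and fields.
type point struct {
	Name   string
	Tags   map[string]string
	Fields map[string]interface{}
}

// asProcessFields treats threads as a property of the process: every
// thread contributes extra fields, keyed by thread ID, to a single point.
func asProcessFields(pid int32, threads map[int32]threadCPU) point {
	p := point{
		Name:   "procstat",
		Tags:   map[string]string{"pid": strconv.Itoa(int(pid))},
		Fields: map[string]interface{}{},
	}
	for tid, t := range threads {
		id := strconv.Itoa(int(tid))
		p.Fields[id+"_cpu_time_user"] = t.User
		p.Fields[id+"_cpu_time_system"] = t.System
	}
	return p
}

// asThreadPoints treats each thread as its own metric: one point per
// thread, with the thread ID as a tag and a fixed set of field names.
func asThreadPoints(pid int32, threads map[int32]threadCPU) []point {
	tids := make([]int32, 0, len(threads))
	for tid := range threads {
		tids = append(tids, tid)
	}
	sort.Slice(tids, func(i, j int) bool { return tids[i] < tids[j] })

	points := make([]point, 0, len(tids))
	for _, tid := range tids {
		points = append(points, point{
			Name: "procstat_thread", // hypothetical measurement name
			Tags: map[string]string{
				"pid": strconv.Itoa(int(pid)),
				"tid": strconv.Itoa(int(tid)),
			},
			Fields: map[string]interface{}{
				"cpu_time_user":   threads[tid].User,
				"cpu_time_system": threads[tid].System,
			},
		})
	}
	return points
}

func main() {
	threads := map[int32]threadCPU{
		101: {User: 1.5, System: 0.5},
		102: {User: 2.0, System: 1.0},
	}
	fmt.Printf("%+v\n", asProcessFields(100, threads))
	for _, p := range asThreadPoints(100, threads) {
		fmt.Printf("%+v\n", p)
	}
}
```

The trade-off, roughly: the first shape keeps a single point per process but its field names change as threads come and go, while the second keeps field names stable at the cost of one more tag value per thread.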
The change can be as easy as:
```
telegraf (master) $ git diff
diff --git a/plugins/inputs/procstat/process.go b/plugins/inputs/procstat/process.go
index 042929f..a77cedf 100644
--- a/plugins/inputs/procstat/process.go
+++ b/plugins/inputs/procstat/process.go
@@ -26,6 +26,7 @@ type Process interface {
 	RlimitUsage(bool) ([]process.RlimitStat, error)
 	Username() (string, error)
 	CreateTime() (int64, error)
+	Threads() (map[int32]*cpu.TimesStat, error)
 }
 type PIDFinder interface {
diff --git a/plugins/inputs/procstat/procstat.go b/plugins/inputs/procstat/procstat.go
index 61e5753..0496450 100644
--- a/plugins/inputs/procstat/procstat.go
+++ b/plugins/inputs/procstat/procstat.go
@@ -297,6 +297,22 @@ func (p *Procstat) addMetric(proc Process, acc telegraf.Accumulator) {
 		}
 	}
+	tpids, err := proc.Threads()
+	if err == nil {
+		for tpid, cpu_time := range tpids {
+			fields[prefix+strconv.Itoa(int(tpid))+"_cpu_time_user"] = cpu_time.User
+			fields[prefix+strconv.Itoa(int(tpid))+"_cpu_time_system"] = cpu_time.System
+			fields[prefix+strconv.Itoa(int(tpid))+"_cpu_time_idle"] = cpu_time.Idle
+			fields[prefix+strconv.Itoa(int(tpid))+"_cpu_time_nice"] = cpu_time.Nice
+			fields[prefix+strconv.Itoa(int(tpid))+"_cpu_time_iowait"] = cpu_time.Iowait
+			fields[prefix+strconv.Itoa(int(tpid))+"_cpu_time_irq"] = cpu_time.Irq
+			fields[prefix+strconv.Itoa(int(tpid))+"_cpu_time_soft_irq"] = cpu_time.Softirq
+			fields[prefix+strconv.Itoa(int(tpid))+"_cpu_time_steal"] = cpu_time.Steal
+			fields[prefix+strconv.Itoa(int(tpid))+"_cpu_time_guest"] = cpu_time.Guest
+			fields[prefix+strconv.Itoa(int(tpid))+"_cpu_time_guest_nice"] = cpu_time.GuestNice
+		}
+	}
+
 	acc.AddFields("procstat", fields, proc.Tags())
 }
```