telegraf
n_users field in [[inputs.system]] plugin stuck at 0 on ppc64le
Relevant telegraf.conf
[[inputs.system]]
System info
Telegraf 1.20.2, RHEL 8.4
Docker
Not Applicable
Steps to reproduce
- Install telegraf-1.20.2-1.ppc64le
- Configure with the stock inputs.system plugin
- View the output with:
telegraf --config test.conf --test
...
Expected behavior
The number of user sessions on the node is currently 35 (the first 2 lines of the output are headers):
# w | wc -l
37
I would expect Telegraf to report that value in the n_users field, for example:
# telegraf --config test.conf --test
2021-10-26T18:17:20Z I! Starting Telegraf 1.20.2
> system,host=XXXX.redacted.com load1=624.76,load15=625.04,load5=624.6,n_cpus=128i,n_users=35i 1635272240000000000
> system,host=XXXX.redacted.com uptime=5432092i 1635272240000000000
> system,host=XXXX.redacted.com uptime_format="62 days, 20:54" 1635272240000000000
Actual behavior
Telegraf records the number of user sessions as 0:
# telegraf --config test.conf --test
2021-10-26T18:17:20Z I! Starting Telegraf 1.20.2
> system,host=XXXX.redacted.com load1=624.76,load15=625.04,load5=624.6,n_cpus=128i,n_users=0i 1635272240000000000
> system,host=XXXX.redacted.com uptime=5432092i 1635272240000000000
> system,host=XXXX.redacted.com uptime_format="62 days, 20:54" 1635272240000000000
Additional info
No errors are recorded in the telegraf log complaining about not being able to retrieve that field.
Thanks for opening this over here.
It looks like the system input plugin pulls the number of users from the gopsutil library, and the value itself comes from its host.Users() function.
Would you be willing to run the following Go snippet to help narrow down whether the problem is in the gopsutil library itself or in Telegraf?
package main

import (
	"fmt"
	"os"

	"github.com/shirou/gopsutil/host"
)

func main() {
	users, err := host.Users()
	if err == nil {
		fmt.Println(len(users))
	} else if os.IsNotExist(err) {
		fmt.Println("Reading users: ", err.Error())
	} else if os.IsPermission(err) {
		fmt.Println(err.Error())
	}
}
Thanks!
I'm not the best with Go, so let me know if I'm doing something wrong here, but this is what I get:
# cat jd_test.go
package main

import (
	"fmt"
	"os"

	"github.com/shirou/gopsutil/host"
)

func main() {
	users, err := host.Users()
	if err == nil {
		fmt.Println(len(users))
	} else if os.IsNotExist(err) {
		fmt.Println("Reading users: ", err.Error())
	} else if os.IsPermission(err) {
		fmt.Println(err.Error())
	}
}
Running code snippet:
# chmod +x jd_test.go
# go run jd_test.go
go run: cannot run *_test.go files (jd_test.go)
To make this a little easier, I added some debugging messages to Telegraf and put up a fake PR to get it to build. Can you try downloading the ppc64el.tar.gz, running it with the --debug option, and sharing the output here?
Thanks!
I got the following:
# ./telegraf --config ../../etc/telegraf/test.conf --debug --test
2021-10-26T20:53:23Z I! Starting Telegraf
2021-10-26T20:53:23Z D! [agent] Initializing plugins
2021-10-26T20:53:23Z D! [agent] Starting service inputs
2021-10-26T20:53:23Z D! [inputs.system] Found %!i(int=0) number of users
2021-10-26T20:53:23Z D! [agent] Stopping service inputs
2021-10-26T20:53:23Z D! [agent] Input channel closed
2021-10-26T20:53:23Z D! [agent] Stopped Successfully
> system,host=XXXX.redacted.com load1=624.1,load15=624.25,load5=624.15,n_cpus=128i,n_users=0i 1635281603000000000
> system,host=XXXX.redacted.com uptime=5441455i 1635281603000000000
> system,host=XXXX.redacted.com uptime_format="62 days, 23:30" 1635281603000000000
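A side note on the debug line in that output: the text "%!i(int=0)" is how Go's fmt package reports an unknown formatting verb. Go has no %i verb; integers are formatted with %d. The count itself (0) is still readable from the message. The short sketch below reproduces the effect; render is a hypothetical helper and the format strings are only illustrations, not the actual code in the debug build.

```go
package main

import "fmt"

// render (hypothetical helper) formats a single integer with the given
// format string, so the two verbs can be compared side by side.
func render(format string, n int) string {
	return fmt.Sprintf(format, n)
}

func main() {
	// %i is not a Go verb, so fmt embeds an error marker in the output.
	fmt.Println(render("Found %i number of users", 0))
	// %d is the decimal integer verb.
	fmt.Println(render("Found %d users", 0))
}
```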
Looks like the library itself is reporting 0 users. Can you open a bug in the upstream gopsutil project and see what they say? You can reference this bug as well.
Thanks!
I got the same problem. Is there a fix for it?
I have only seen this issue on Raspberry Pis. I modified the script from @jdmaloney and saved it as main.go:
package main

import (
	"fmt"
	"os"
	"strings"

	"github.com/shirou/gopsutil/host"
)

func main() {
	// Get host information
	users, err := host.Users()
	if err == nil {
		nUsers := len(users)
		nUniqueUsers := findUniqueUsers(users)
		fmt.Printf("Number of users: %d\n", nUsers)
		fmt.Printf("Number of unique users: %d\n", nUniqueUsers)
	} else if os.IsNotExist(err) {
		fmt.Println("Reading users: ", err.Error())
	} else if os.IsPermission(err) {
		fmt.Println("Permission error: ", err.Error())
	} else {
		fmt.Println("Other error: ", err.Error())
	}
}

func findUniqueUsers(users []host.UserStat) int {
	uniqueUsernames := make(map[string]struct{})
	for _, user := range users {
		// Normalize username (case-insensitive) for uniqueness
		normalizedUsername := strings.ToLower(user.User)
		uniqueUsernames[normalizedUsername] = struct{}{}
	}
	// Print detailed information about each user
	fmt.Println("User details:")
	for i, user := range users {
		fmt.Printf("User %d: %+v\n", i+1, user)
	}
	return len(uniqueUsernames)
}
->go run main.go
User details:
Number of users: 0
Number of unique users: 0
More info here: https://github.com/shirou/gopsutil/issues/1129
The bug has just been fixed (https://github.com/shirou/gopsutil/issues/1129). I guess we need to wait for the next gopsutil release, which could come at the beginning of April, and then for the dependency to be updated here in Telegraf.
@JosefRypacek,
Looks like gopsutil did a release recently, I've put up https://github.com/influxdata/telegraf/pull/15082 with the updated dependency. Could you give the artifacts in that PR a try and let me know if it resolves the issue?
Thanks!