n_users field in [[inputs.system]] plugin stuck at 0 on ppc64le

Status: Open · jdmaloney opened this issue 3 years ago · 8 comments

Relevant telegraf.conf

[[inputs.system]]

System info

Telegraf 1.20.2, RHEL 8.4

Docker

Not Applicable

Steps to reproduce

  1. Install telegraf-1.20.2-1.ppc64le
  2. Configure with the stock inputs.system plugin
  3. Check the output with telegraf --config test.conf --test ...

Expected behavior

The node currently has 35 user sessions (the first 2 lines of the w output are headers):

# w | wc -l
37

I would expect telegraf to capture that count in the n_users field, for example:

# telegraf --config test.conf --test
2021-10-26T18:17:20Z I! Starting Telegraf 1.20.2
> system,host=XXXX.redacted.com load1=624.76,load15=625.04,load5=624.6,n_cpus=128i,n_users=35i 1635272240000000000
> system,host=XXXX.redacted.com uptime=5432092i 1635272240000000000
> system,host=XXXX.redacted.com uptime_format="62 days, 20:54" 1635272240000000000

Actual behavior

Telegraf records the number of user sessions as 0:

# telegraf --config test.conf --test
2021-10-26T18:17:20Z I! Starting Telegraf 1.20.2
> system,host=XXXX.redacted.com load1=624.76,load15=625.04,load5=624.6,n_cpus=128i,n_users=0i 1635272240000000000
> system,host=XXXX.redacted.com uptime=5432092i 1635272240000000000
> system,host=XXXX.redacted.com uptime_format="62 days, 20:54" 1635272240000000000

Additional info

No errors are recorded in the telegraf log complaining about not being able to retrieve that field.

jdmaloney, Oct 26 '21 18:10
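To script the expected-versus-actual comparison above instead of eyeballing it, the n_users value can be pulled out of a telegraf --test line and compared with the session count from w. The following is a minimal sketch; fieldInt is a hypothetical helper written for this illustration, and it is a naive parser: it assumes the field set contains no escaped characters and no quoted strings with spaces or commas, which holds for the load/n_users line but not for every line-protocol record.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// fieldInt is a hypothetical helper that extracts an integer field
// (written as name=<n>i) from one line of "telegraf --test" output.
// Naive: it splits on spaces and commas and ignores line-protocol escaping.
func fieldInt(line, name string) (int64, bool) {
	line = strings.TrimPrefix(line, "> ")
	// parts[0] is measurement+tags, parts[1] the field set, parts[2] the timestamp.
	parts := strings.Split(line, " ")
	if len(parts) < 2 {
		return 0, false
	}
	for _, kv := range strings.Split(parts[1], ",") {
		k, v, ok := strings.Cut(kv, "=")
		if !ok || k != name || !strings.HasSuffix(v, "i") {
			continue
		}
		n, err := strconv.ParseInt(strings.TrimSuffix(v, "i"), 10, 64)
		if err != nil {
			return 0, false
		}
		return n, true
	}
	return 0, false
}

func main() {
	// Lines copied from the expected and actual output in this issue.
	expected := "> system,host=XXXX.redacted.com load1=624.76,load15=625.04,load5=624.6,n_cpus=128i,n_users=35i 1635272240000000000"
	actual := "> system,host=XXXX.redacted.com load1=624.76,load15=625.04,load5=624.6,n_cpus=128i,n_users=0i 1635272240000000000"
	for _, line := range []string{expected, actual} {
		n, ok := fieldInt(line, "n_users")
		fmt.Println("n_users:", n, ok)
	}
}
```

On the node itself, w -h prints the same session list without the header lines, so its line count can be compared with the parsed value directly.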

Thanks for opening this over here.

It looks like the system input plugin pulls the number of users from the gopsutil library, and the value itself appears to come from gopsutil's host.Users() function.

Would you be willing to run the following Go snippet to help narrow down whether the problem is in the gopsutil library itself or in telegraf:

package main

import (
	"fmt"
	"os"

	"github.com/shirou/gopsutil/host"
)

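func main() {
	// The system input plugin reports len(users) as n_users.
	users, err := host.Users()
	if err == nil {
		fmt.Println(len(users))
	} else if os.IsNotExist(err) {
		fmt.Println("Reading users: ", err.Error())
	} else if os.IsPermission(err) {
		fmt.Println(err.Error())
	} else {
		// Print any other error too, so a failure is never silent.
		fmt.Println("Other error: ", err.Error())
	}
}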

Thanks!

powersj, Oct 26 '21 18:10

I'm not very experienced with Go, so let me know if I'm doing something wrong here, but this is what I get:

# cat jd_test.go
package main

import (
	"fmt"
	"os"

	"github.com/shirou/gopsutil/host"
)

func main() {
	users, err := host.Users()
	if err == nil {
		fmt.Println(len(users))
	} else if os.IsNotExist(err) {
		fmt.Println("Reading users: ", err.Error())
	} else if os.IsPermission(err) {
		fmt.Println(err.Error())
	}
}

Running the code snippet:

# chmod +x jd_test.go
# go run jd_test.go
go run: cannot run *_test.go files (jd_test.go)

jdmaloney, Oct 26 '21 18:10

To make this a little easier, I added some debug messages to Telegraf and put up a placeholder PR to get it built. Could you download the ppc64el.tar.gz, run it with the --debug option, and share the output here?

Thanks!

powersj, Oct 26 '21 19:10

I got the following:

# ./telegraf --config ../../etc/telegraf/test.conf --debug --test
2021-10-26T20:53:23Z I! Starting Telegraf
2021-10-26T20:53:23Z D! [agent] Initializing plugins
2021-10-26T20:53:23Z D! [agent] Starting service inputs
2021-10-26T20:53:23Z D! [inputs.system] Found %!i(int=0) number of users
2021-10-26T20:53:23Z D! [agent] Stopping service inputs
2021-10-26T20:53:23Z D! [agent] Input channel closed
2021-10-26T20:53:23Z D! [agent] Stopped Successfully
> system,host=XXXX.redacted.com load1=624.1,load15=624.25,load5=624.15,n_cpus=128i,n_users=0i 1635281603000000000
> system,host=XXXX.redacted.com uptime=5441455i 1635281603000000000
> system,host=XXXX.redacted.com uptime_format="62 days, 23:30" 1635281603000000000

jdmaloney, Oct 26 '21 20:10
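The %!i(int=0) in the debug line is Go's fmt notation for an unknown verb: the temporary debug message evidently used %i, which C's printf accepts but Go's fmt does not, so fmt printed the verb together with the argument's type and value. The value itself, int=0, still confirms that gopsutil returned zero users. A small demonstration follows; usersMessage is a hypothetical helper, and it assembles the format string at run time only so that go vet's printf check does not flag the deliberately invalid verb.

```go
package main

import "fmt"

// usersMessage is a hypothetical helper that formats the debug message with
// the given verb. The format string is built at run time so that go vet's
// printf check does not flag the intentionally invalid %i.
func usersMessage(verb string, n int) string {
	return fmt.Sprintf("Found %"+verb+" number of users", n)
}

func main() {
	fmt.Println(usersMessage("i", 0)) // Found %!i(int=0) number of users
	fmt.Println(usersMessage("d", 0)) // Found 0 number of users
}
```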

It looks like the library itself is reporting 0 users. Can you open a bug in the upstream gopsutil project and see what they say? Feel free to reference this issue there.

Thanks!

powersj, Oct 26 '21 21:10

I have the same problem. Is there a fix for it?

radu-boboc, May 21 '22 13:05

I have only seen this issue on Raspberry Pis. I modified the script from @jdmaloney into main.go:

package main

import (
        "fmt"
        "os"
        "strings"

        "github.com/shirou/gopsutil/host"
)

func main() {
        // Get the list of logged-in user sessions
        users, err := host.Users()
        if err == nil {
                nUsers := len(users)
                nUniqueUsers := findUniqueUsers(users)

                fmt.Printf("Number of users: %d\n", nUsers)
                fmt.Printf("Number of unique users: %d\n", nUniqueUsers)
        } else if os.IsNotExist(err) {
                fmt.Println("Reading users: ", err.Error())
        } else if os.IsPermission(err) {
                fmt.Println("Permission error: ", err.Error())
        } else {
                fmt.Println("Other error: ", err.Error())
        }
}

func findUniqueUsers(users []host.UserStat) int {
        uniqueUsernames := make(map[string]struct{})

        for _, user := range users {
                // Normalize username (case-insensitive) for uniqueness
                normalizedUsername := strings.ToLower(user.User)
                uniqueUsernames[normalizedUsername] = struct{}{}
        }

        // Print detailed information about each user
        fmt.Println("User details:")
        for i, user := range users {
                fmt.Printf("User %d: %+v\n", i+1, user)
        }

        return len(uniqueUsernames)
}

->go run main.go
User details:
Number of users: 0
Number of unique users: 0

misterf13, Dec 08 '23 15:12

More information in the upstream gopsutil issue: https://github.com/shirou/gopsutil/issues/1129

misterf13, Dec 08 '23 16:12

The bug has just been fixed upstream (https://github.com/shirou/gopsutil/issues/1129). I guess we need to wait for the next gopsutil release, which could be at the beginning of April, and then for the dependency to be updated here in telegraf.

JosefRypacek, Mar 04 '24 18:03

@JosefRypacek,

It looks like gopsutil did a release recently, so I've put up https://github.com/influxdata/telegraf/pull/15082 with the updated dependency. Could you try the artifacts from that PR and let me know whether they resolve the issue?

Thanks!

powersj, Apr 01 '24 20:04