gopsutil icon indicating copy to clipboard operation
gopsutil copied to clipboard

Crash in createTimeWithContext when run on FreeBSD arm64

Open sdalu opened this issue 1 year ago • 18 comments

Describe the bug When running telegraph 1.31.0~553d972c, with gopsutil v3.24.4, it crash in the createTimeWithContext.

2024-05-02T13:09:40Z E! FATAL: [inputs.procstat] panicked: runtime error: invalid memory address or nil pointer dereference, Stack:
goroutine 147 [running]:
github.com/influxdata/telegraf/agent.panicRecover(0x4d410370)
	/go/src/github.com/influxdata/telegraf/agent/agent.go:1202 +0x74
panic({0x67aa400, 0xc587b20})
	/usr/local/go/src/runtime/panic.go:770 +0xfc
github.com/shirou/gopsutil/v3/process.(*Process).createTimeWithContext(0x4d0a0368, {0x8232a44, 0xc9983c0})
	/go/pkg/mod/github.com/shirou/gopsutil/[email protected]/process/process_freebsd.go:121 +0x4c
github.com/shirou/gopsutil/v3/process.(*Process).CreateTimeWithContext(0x4d0a0368, {0x8232a44, 0xc9983c0})
	/go/pkg/mod/github.com/shirou/gopsutil/[email protected]/process/process.go:310 +0x74
github.com/shirou/gopsutil/v3/process.NewProcessWithContext({0x8232a44, 0xc9983c0}, 0x3744)
	/go/pkg/mod/github.com/shirou/gopsutil/[email protected]/process/process.go:218 +0x78
github.com/shirou/gopsutil/v3/process.NewProcess(...)
	/go/pkg/mod/github.com/shirou/gopsutil/[email protected]/process/process.go:203
github.com/influxdata/telegraf/plugins/inputs/procstat.newProc(0x3744)
	/go/src/github.com/influxdata/telegraf/plugins/inputs/procstat/process.go:38 +0x30
github.com/influxdata/telegraf/plugins/inputs/procstat.(*Procstat).gatherOld(0x4ccc6e48, {0x824a858, 0x4d40cae0})
	/go/src/github.com/influxdata/telegraf/plugins/inputs/procstat/procstat.go:209 +0x848
github.com/influxdata/telegraf/plugins/inputs/procstat.(*Procstat).Gather(0x4ccc6e48, {0x824a858, 0x4d40cae0})
	/go/src/github.com/influxdata/telegraf/plugins/inputs/procstat/procstat.go:166 +0x38
github.com/influxdata/telegraf/models.(*RunningInput).Gather(0x4d410370, {0x824a858, 0x4d40cae0})
	/go/src/github.com/influxdata/telegraf/models/running_input.go:227 +0x2c4
github.com/influxdata/telegraf/agent.(*Agent).gatherOnce.func1()
	/go/src/github.com/influxdata/telegraf/agent/agent.go:583 +0x70
created by github.com/influxdata/telegraf/agent.(*Agent).gatherOnce in goroutine 120
	/go/src/github.com/influxdata/telegraf/agent/agent.go:581 +0xc0

goroutine 1 [semacquire]:
sync.runtime_Semacquire(0x4d310b68)
	/usr/local/go/src/runtime/sema.go:62 +0x3c
sync.(*WaitGroup).Wait(0
2024-05-02T13:09:40Z E! PLEASE REPORT THIS PANIC ON GITHUB with stack trace, configuration, and OS information: https://github.com/influxdata/telegraf/issues/new/choose

Issue initially reported: https://github.com/influxdata/telegraf/issues/13933

Environment (please complete the following information): FreBSD:

14.0-RELEASE-p4 14.0-RELEASE-p4 14.0-RELEASE-p4

FreeBSD brain.home.sdalu.com 14.0-RELEASE-p4 FreeBSD 14.0-RELEASE-p4 #0 releng/14.0-n265400-4edf3b80733e: Wed Jan 17 19:32:19 CET 2024 [email protected]:/usr/obj/usr/src/arm64.aarch64/sys/BROADCOM arm64

sdalu avatar May 02 '24 13:05 sdalu

I can pass go122 test -run Test_Process_CreateTime on my QEMU environment. Any other information or could you provide a minimal reproducible code?

# go122 version
go version go1.22.1 freebsd/arm64

# uname -a
FreeBSD generic 14.0-RELEASE FreeBSD 14.0-RELEASE #0 releng/14.0-n265380-f9716eee8ab4: Fri Nov 10 08:59:18 UTC 2023     [email protected]:/usr/obj/usr/src/arm64.aarch64/sys/GENERIC arm64

shirou avatar May 03 '24 13:05 shirou

I'm just testing with the artifact freebsd_armv7.tar.gz provided https://github.com/influxdata/telegraf/pull/15272 . Hoping that @powersj could tell you more about the build. The computer on which I tested is a Raspberry PI 4.

I'm not a go user, how do I run go test -run Test_Process_CreateTime ?

I tried:

git clone  https://github.com/shirou/gopsutil
cd gopsutil
go test -run Test_Process_CreateTime

#  result I got:
?   	github.com/shirou/gopsutil/v3	[no test files]

sdalu avatar May 03 '24 14:05 sdalu

@sdalu,

Can you run go test -v -run Test_Process_CreateTime ./process/...

@shirou,

In the original issue, I had the reporter use the following code. You can see his response for the output.

powersj avatar May 13 '24 17:05 powersj

root@brain:~/gopsutil # go test -v -run Test_Process_CreateTime ./process/...
go: downloading golang.org/x/sys v0.20.0
go: downloading github.com/stretchr/testify v1.9.0
=== RUN   Test_Process_CreateTime
--- PASS: Test_Process_CreateTime (0.00s)
PASS
ok  	github.com/shirou/gopsutil/v3/process	0.016s

sdalu avatar May 13 '24 17:05 sdalu

Thank you for the information. This means that, at least in the test environment, the CreateTime() is being retrieved without any issues. In other words, it is not a simple implementation mistake in the code or a problem with the struct definition. Do you have any other potential issues in mind?

shirou avatar May 14 '24 14:05 shirou

@shirou,

Well gopsutil should not panic :)

In the original issue, I had sdalu, first get a process using the scripts own PID:

myself, err := process.NewProcess(int32(currentPid))

This returned nothing, yet the test does.

@sdalu,

Can you run the following go code:

go.mod:

module test-process

go 1.22

require github.com/shirou/gopsutil/v3 v3.24.4

require (
	github.com/go-ole/go-ole v1.2.6 // indirect
	github.com/lufia/plan9stats v0.0.0-20211012122336-39d0f177ccd0 // indirect
	github.com/power-devops/perfstat v0.0.0-20210106213030-5aafc221ea8c // indirect
	github.com/shoenig/go-m1cpu v0.1.6 // indirect
	github.com/tklauser/go-sysconf v0.3.14 // indirect
	github.com/tklauser/numcpus v0.8.0 // indirect
	github.com/yusufpapurcu/wmi v1.2.4 // indirect
	golang.org/x/sys v0.20.0 // indirect
)

main.go:

package main

import (
	"fmt"
	"os"

	"github.com/shirou/gopsutil/v3/process"
)

func main() {
	currentPid := os.Getpid()
	myself, err := process.NewProcess(int32(currentPid))
	if err != nil {
		panic(err)
	}
	fmt.Println(myself.Name())
	fmt.Println(myself.String())
	fmt.Println(myself.NumThreads())
	fmt.Println(myself.RlimitUsage(true))
	fmt.Println(myself.Status())

	time, err := myself.CreateTime()
	if err != nil {
		panic(err)
	}
	fmt.Println(time)
}

And either run this directly via go run . or build it go build . and run the test-process binary.

powersj avatar May 14 '24 14:05 powersj

I'm using go 1.21

root@brain:~/t # go run .
test-process <nil>
{"pid":38018}
6 <nil>
[] not implemented yet
[running] <nil>
1715697644691

sdalu avatar May 14 '24 14:05 sdalu

Well that is not what I was expecting. Just to confirm, this was run on your FreeBSD system that was showing the issue with Telegraf?

powersj avatar May 14 '24 14:05 powersj

Yes. I re-run the telegraf artifact, just in case. It's crashing with the same panic. Should I try to build it instead of runing the artifact ?

sdalu avatar May 14 '24 14:05 sdalu

@sdalu I have put up https://github.com/influxdata/telegraf/pull/15355 which will hopefully print out what the result of getKProc call and the pid we are looking at. In 20-30mins can you download another Telegraf artifact and see what you get?

Thanks!

powersj avatar May 14 '24 15:05 powersj

@powersj Here is the output:

root@brain:~/t/telegraf-1.31.0 # ./usr/bin/telegraf --config /usr/local/etc/telegraf.conf --debug
2024-05-14T20:07:16Z I! Loading config: /usr/local/etc/telegraf.conf
2024-05-14T20:07:16Z W! DeprecationWarning: Option "fielddrop" of plugin "inputs.system" deprecated since version 1.29.0 and will be removed in 2.0.0: use 'fieldexclude' instead
2024-05-14T20:07:16Z W! DeprecationWarning: Option "dns_lookup" of plugin "inputs.ntpq" deprecated since version 1.24.0 and will be removed in 2.0.0: add '-n' to 'options' instead to skip DNS lookup
2024-05-14T20:07:16Z I! Starting Telegraf 1.31.0-74404b62 brought to you by InfluxData the makers of InfluxDB
2024-05-14T20:07:16Z I! Available plugins: 233 inputs, 9 aggregators, 32 processors, 25 parsers, 60 outputs, 4 secret-stores
2024-05-14T20:07:16Z I! Loaded inputs: apache cpu disk diskio exec (2x) internal mem mysql net netstat ntpq processes procstat (10x) redis smart swap system zfs
2024-05-14T20:07:16Z I! Loaded aggregators: 
2024-05-14T20:07:16Z I! Loaded processors: converter override (4x) regex
2024-05-14T20:07:16Z I! Loaded secretstores: 
2024-05-14T20:07:16Z I! Loaded outputs: influxdb_v2
2024-05-14T20:07:16Z I! Tags enabled: host=brain.home.sdalu.com
2024-05-14T20:07:16Z W! Deprecated inputs: 0 and 1 options
2024-05-14T20:07:16Z I! [agent] Config: Interval:20s, Quiet:false, Hostname:"brain.home.sdalu.com", Flush Interval:30s
2024-05-14T20:07:16Z D! [agent] Initializing plugins
2024-05-14T20:07:16Z W! [inputs.smart] nvme not found: verify that nvme is installed and it is in your PATH (or specified in config) to gather vendor specific attributes: provided path does not exist: []
2024-05-14T20:07:16Z W! DeprecationWarning: Value "false" for option "ignore_protocol_stats" of plugin "inputs.net" deprecated since version 1.27.3 and will be removed in 1.36.0: use the 'inputs.nstat' plugin instead
2024-05-14T20:07:16Z D! [agent] Connecting outputs
2024-05-14T20:07:16Z D! [agent] Attempting connection to [outputs.influxdb_v2]
2024-05-14T20:07:16Z D! [agent] Successfully connected to outputs.influxdb_v2
2024-05-14T20:07:16Z D! [agent] Starting service inputs
getting process with pid:  40544
k: (*process.KinfoProc)(nil)
2024-05-14T20:07:20Z E! FATAL: [inputs.procstat] panicked: runtime error: invalid memory address or nil pointer dereference, Stack:
goroutine 149 [running]:
github.com/influxdata/telegraf/agent.panicRecover(0x4d638140)
	/go/src/github.com/influxdata/telegraf/agent/agent.go:1202 +0x74
panic({0x696a390, 0xc88a150})
	/usr/local/go/src/runtime/panic.go:770 +0xfc
github.com/shirou/gopsutil/v3/process.(*Process).createTimeWithContext(0x4ce30098, {0x843a4c4, 0xcc9ccd8})
	/go/pkg/mod/github.com/powersj/gopsutil/[email protected]/process/process_freebsd.go:123 +0xac
github.com/shirou/gopsutil/v3/process.(*Process).CreateTimeWithContext(0x4ce30098, {0x843a4c4, 0xcc9ccd8})
	/go/pkg/mod/github.com/powersj/gopsutil/[email protected]/process/process.go:312 +0x74
github.com/shirou/gopsutil/v3/process.NewProcessWithContext({0x843a4c4, 0xcc9ccd8}, 0x9e60)
	/go/pkg/mod/github.com/powersj/gopsutil/[email protected]/process/process.go:220 +0x78
github.com/shirou/gopsutil/v3/process.NewProcess(0x9e60)
	/go/pkg/mod/github.com/powersj/gopsutil/[email protected]/process/process.go:205 +0x90
github.com/influxdata/telegraf/plugins/inputs/procstat.newProc(0x9e60)
	/go/src/github.com/influxdata/telegraf/plugins/inputs/procstat/process.go:38 +0x1c
github.com/influxdata/telegraf/plugins/inputs/procstat.(*Procstat).gatherOld(0x4d36cf78, {0x8452d48, 0x4d367308})
	/go/src/github.com/influxdata/telegraf/plugins/inputs/procstat/procstat.go:223 +0x810
github.com/influxdata/telegraf/plugins/inputs/procstat.(*Procstat).Gather(0x4d36cf78, {0x8452d48, 0x4d367308})
	/go/src/github.com/influxdata/telegraf/plugins/inputs/procstat/procstat.go:185 +0x38
github.com/influxdata/telegraf/models.(*RunningInput).Gather(0x4d638140, {0x8452d48, 0x4d367308})
	/go/src/github.com/influxdata/telegraf/models/running_input.go:228 +0x2c4
github.com/influxdata/telegraf/agent.(*Agent).gatherOnce.func1()
	/go/src/github.com/influxdata/telegraf/agent/agent.go:583 +0x70
created by github.com/influxdata/telegraf/agent.(*Agent).gatherOnce in goroutine 146
	/go/src/github.com/influxdata/telegraf/agent/agent.go:581 +0xc0

goroutine
2024-05-14T20:07:20Z E! PLEASE REPORT THIS PANIC ON GITHUB with stack trace, configuration, and OS information: https://github.com/influxdata/telegraf/issues/new/choose

sdalu avatar May 14 '24 20:05 sdalu

Thanks for trying that out!

getting process with pid: 40544 k: (*process.KinfoProc)(nil)

@shirou any reason why the kinfoproc would be nil? Does that mean gopsutil failed to parse the information but didn't generate an error?

That print statement is from my own fork, printing out the k value before we attempt to create time with context:

https://github.com/powersj/gopsutil/blob/919283ff5f489d3697e01c5267ccdd0a9273cd41/process/process_freebsd.go#L122

powersj avatar May 14 '24 20:05 powersj

I read the code and it doesn't seem like err is nil and kInfoProc is nil at the same time. However, this issue is occurring in practice. Therefore, I have created a PR #1653 to add a nil check. Could you please take a look?

shirou avatar May 23 '24 14:05 shirou

@sdalu,

As before, can you grab the telegraf artifacts in few minutes from this post at: https://github.com/influxdata/telegraf/pull/15397

@shirou - thanks for looking into this!

powersj avatar May 23 '24 16:05 powersj

@powersj here is the result (looks the same):

root@brain:~/telegraf-1.31.0 # ./usr/bin/telegraf --config /usr/local/etc/telegraf.conf --debug
2024-05-23T19:41:13Z I! Loading config: /usr/local/etc/telegraf.conf
2024-05-23T19:41:13Z W! DeprecationWarning: Option "fielddrop" of plugin "inputs.system" deprecated since version 1.29.0 and will be removed in 2.0.0: use 'fieldexclude' instead
2024-05-23T19:41:13Z W! DeprecationWarning: Option "dns_lookup" of plugin "inputs.ntpq" deprecated since version 1.24.0 and will be removed in 2.0.0: add '-n' to 'options' instead to skip DNS lookup
2024-05-23T19:41:13Z I! Starting Telegraf 1.31.0-ef79d2b9 brought to you by InfluxData the makers of InfluxDB
2024-05-23T19:41:13Z I! Available plugins: 233 inputs, 9 aggregators, 32 processors, 26 parsers, 60 outputs, 4 secret-stores
2024-05-23T19:41:13Z I! Loaded inputs: apache cpu disk diskio exec (2x) internal mem mysql net netstat ntpq processes procstat (10x) redis smart swap system zfs
2024-05-23T19:41:13Z I! Loaded aggregators: 
2024-05-23T19:41:13Z I! Loaded processors: converter override (4x) regex
2024-05-23T19:41:13Z I! Loaded secretstores: 
2024-05-23T19:41:13Z I! Loaded outputs: influxdb_v2
2024-05-23T19:41:13Z I! Tags enabled: host=brain.home.sdalu.com
2024-05-23T19:41:13Z W! Deprecated inputs: 0 and 1 options
2024-05-23T19:41:13Z I! [agent] Config: Interval:20s, Quiet:false, Hostname:"brain.home.sdalu.com", Flush Interval:30s
2024-05-23T19:41:13Z D! [agent] Initializing plugins
2024-05-23T19:41:13Z W! [inputs.smart] nvme not found: verify that nvme is installed and it is in your PATH (or specified in config) to gather vendor specific attributes: provided path does not exist: []
2024-05-23T19:41:13Z W! DeprecationWarning: Value "false" for option "ignore_protocol_stats" of plugin "inputs.net" deprecated since version 1.27.3 and will be removed in 1.36.0: use the 'inputs.nstat' plugin instead for protocol stats
2024-05-23T19:41:13Z D! [agent] Connecting outputs
2024-05-23T19:41:13Z D! [agent] Attempting connection to [outputs.influxdb_v2]
2024-05-23T19:41:13Z D! [agent] Successfully connected to outputs.influxdb_v2
2024-05-23T19:41:13Z D! [agent] Starting service inputs
2024-05-23T19:41:20Z E! FATAL: [inputs.procstat] panicked: runtime error: invalid memory address or nil pointer dereference, Stack:
goroutine 61 [running]:
github.com/influxdata/telegraf/agent.panicRecover(0x4d36c5a0)
	/go/src/github.com/influxdata/telegraf/agent/agent.go:1202 +0x74
panic({0x68b6d10, 0xc708b80})
	/usr/local/go/src/runtime/panic.go:770 +0xfc
github.com/shirou/gopsutil/v3/process.(*Process).createTimeWithContext(0x4d0cc128, {0x8361124, 0xcb1af28})
	/go/pkg/mod/github.com/shirou/gopsutil/[email protected]/process/process_freebsd.go:122 +0x4c
github.com/shirou/gopsutil/v3/process.(*Process).CreateTimeWithContext(0x4d0cc128, {0x8361124, 0xcb1af28})
	/go/pkg/mod/github.com/shirou/gopsutil/[email protected]/process/process.go:310 +0x74
github.com/shirou/gopsutil/v3/process.NewProcessWithContext({0x8361124, 0xcb1af28}, 0x624)
	/go/pkg/mod/github.com/shirou/gopsutil/[email protected]/process/process.go:218 +0x78
github.com/shirou/gopsutil/v3/process.NewProcess(...)
	/go/pkg/mod/github.com/shirou/gopsutil/[email protected]/process/process.go:203
github.com/influxdata/telegraf/plugins/inputs/procstat.newProc(0x624)
	/go/src/github.com/influxdata/telegraf/plugins/inputs/procstat/process.go:38 +0x30
github.com/influxdata/telegraf/plugins/inputs/procstat.(*Procstat).gatherOld(0x4d29e8f8, {0x8379320, 0x4d3951e8})
	/go/src/github.com/influxdata/telegraf/plugins/inputs/procstat/procstat.go:223 +0x810
github.com/influxdata/telegraf/plugins/inputs/procstat.(*Procstat).Gather(0x4d29e8f8, {0x8379320, 0x4d3951e8})
	/go/src/github.com/influxdata/telegraf/plugins/inputs/procstat/procstat.go:185 +0x38
github.com/influxdata/telegraf/models.(*RunningInput).Gather(0x4d36c5a0, {0x8379320, 0x4d3951e8})
	/go/src/github.com/influxdata/telegraf/models/running_input.go:228 +0x2c4
github.com/influxdata/telegraf/agent.(*Agent).gatherOnce.func1()
	/go/src/github.com/influxdata/telegraf/agent/agent.go:583 +0x70
created by github.com/influxdata/telegraf/agent.(*Agent).gatherOnce in goroutine 112
	/go/src/github.com/influxdata/telegraf/agent/agent.go:581 +0xc0

goroutine 1 [
2024-05-23T19:41:20Z E! PLEASE REPORT THIS PANIC ON GITHUB with stack trace, configuration, and OS information: https://github.com/influxdata/telegraf/issues/new/choose

I downloaded: telegraf-1.31.0~ef79d2b9_freebsd_armv7.tar.gz

sdalu avatar May 23 '24 19:05 sdalu

That's the same line, you originally saw:

return int64(k.Start.Sec)*1000 + int64(k.Start.Usec)/1000, nil

Ah, @shirou, the nil check is checking if the memory address is nil, so that will never be nil.

- &x == nil
+ x == nil

powersj avatar May 23 '24 19:05 powersj

The parseKinfoProc function returns a struct instance, not a pointer. Therefore, I agree that it should be impossible for a pointer to this value to be nil. However, it is actually happening, so I am very confused.

shirou avatar May 27 '24 14:05 shirou

@shirou - can that be changed? Otherwise you would need to check if any of the pointer fields are nil?

powersj avatar May 29 '24 15:05 powersj