
Occasional crash when using sensors on MacOS arm64

Open PierreF opened this issue 8 months ago • 5 comments

Describe the bug

I get random crashes when calling SensorsTemperatures() on macOS (Darwin 24.3.0 on arm64).

The error seems to be some kind of race condition, since it only occurs when multiple threads call SensorsTemperatures concurrently.

In real usage, I think my actual issue is a concurrent call between SensorsTemperatures and another user of IOKit and/or CoreFoundation. But I was not able to produce a reproducible code sample that makes only one call per subsystem (sensors, disk, cpu, mem...).

To Reproduce

package main

import (
	"log/slog"
	"sync"

	"github.com/shirou/gopsutil/v4/sensors"
)

func main() {
	var wg sync.WaitGroup

	for range 30 { // The higher this number, the more likely the issue is to occur. Empirically, 30 seems a good value.
		wg.Add(1)

		go func() {
			defer wg.Done()
			r, err := sensors.SensorsTemperatures()

			if false {
				// The log itself isn't required to produce the bug, but without
				// assigning the SensorsTemperatures result to a variable the bug doesn't
				// seem to occur, maybe due to compiler optimization?
				slog.Info("sensors", slog.Any("r", r), slog.Any("err", err))
			}
		}()
	}

	wg.Wait()
}

Run the program (possibly multiple times; the race condition seems rather unlikely to trigger):

go build sensors_bug.go

while ./sensors_bug ; do echo "Success"; done 2>&1 | tee large_error_message.log

It results in an error like:

unexpected fault address 0x100921808
fatal error: fault
[signal SIGBUS: bus error code=0x1 addr=0x100921808 pc=0x10092181c]

goroutine 39 gp=0x14000106c40 m=28 mp=0x140000ee008 [running]:
runtime.throw({0x100923b7e?, 0x0?})
	/opt/homebrew/Cellar/go/1.24.2/libexec/src/runtime/panic.go:1101 +0x38 fp=0x14000297a70 sp=0x14000297a40 pc=0x1008d0fe8
runtime.sigpanic()
	/opt/homebrew/Cellar/go/1.24.2/libexec/src/runtime/signal_unix.go:922 +0x170 fp=0x14000297ad0 sp=0x14000297a70 pc=0x1008d2800
github.com/shirou/gopsutil/v4/internal/common.NewLibrary({0x0, 0x0})
	/Users/pierref/go/pkg/mod/github.com/shirou/gopsutil/v4@v4.25.3/internal/common/common_darwin.go:97 +0x9c fp=0x14000297b20 sp=0x14000297ae0 pc=0x10092181c
github.com/shirou/gopsutil/v4/sensors.TemperaturesWithContext({0x0?, 0x0?})
	/Users/pierref/go/pkg/mod/github.com/shirou/gopsutil/v4@v4.25.3/sensors/sensors_darwin_arm64.go:54 +0x6d4 fp=0x14000297fc0 sp=0x14000297b20 pc=0x100922144
created by main.main in goroutine 1
	/Users/pierref/tmp/20250403-1426/sensors_bug.go:16 +0x38

goroutine 1 gp=0x140000021c0 m=nil [sync.WaitGroup.Wait]:
runtime.gopark(0x100a29680?, 0x1008d1310?, 0x0?, 0x40?, 0x100da7f28?)
	/opt/homebrew/Cellar/go/1.24.2/libexec/src/runtime/proc.go:435 +0xc8 fp=0x1400006de50 sp=0x1400006de30 pc=0x1008d10c8
runtime.goparkunlock(...)
	/opt/homebrew/Cellar/go/1.24.2/libexec/src/runtime/proc.go:441
runtime.semacquire1(0x140001140b8, 0x0, 0x1, 0x0, 0x18)
	/opt/homebrew/Cellar/go/1.24.2/libexec/src/runtime/sema.go:188 +0x204 fp=0x1400006dea0 sp=0x1400006de50 pc=0x1008b4604
sync.runtime_SemacquireWaitGroup(0x140000021c0?)
	/opt/homebrew/Cellar/go/1.24.2/libexec/src/runtime/sema.go:110 +0x2c fp=0x1400006dee0 sp=0x1400006dea0 pc=0x1008d24ac
sync.(*WaitGroup).Wait(0x140001140b0)
[... truncated since I don't believe it matters for this bug]

Expected behavior

No crash :)

Environment (please complete the following information):

  • [x] Mac OS: [paste the result of sw_vers and uname -a]
$ sw_vers
ProductName:            macOS
ProductVersion:         15.3.2
BuildVersion:           24D81
$ uname -a
Darwin mbp-de-pierre.bleemeo.work 24.3.0 Darwin Kernel Version 24.3.0: Thu Jan  2 20:24:16 PST 2025; root:xnu-11215.81.4~3/RELEASE_ARM64_T6000 arm64 arm Darwin

gopsutil version:

$ cat go.mod 
module test

go 1.24.2

require github.com/shirou/gopsutil/v4 v4.25.3

require (
        github.com/ebitengine/purego v0.8.2 // indirect
        github.com/go-ole/go-ole v1.2.6 // indirect
        github.com/yusufpapurcu/wmi v1.2.4 // indirect
        golang.org/x/sys v0.28.0 // indirect
)

Additional context

I think the bug is due to the IOKit and/or CoreFoundation library being closed by another goroutine while still being used by the one that crashes.

To experiment with this, I've modified TemperaturesWithContext (go mod vendor, then edit "vendor/github.com/shirou/gopsutil/v4/sensors/sensors_darwin_arm64.go").

The idea is to make TemperaturesWithContext perform concurrent calls itself (like the minimal reproduction above), but this time the ioKit and coreFoundation libraries are shared between goroutines.

func TemperaturesWithContext(_ context.Context) ([]TemperatureStat, error) {
	var wg sync.WaitGroup

	var (
		globalResult []TemperatureStat
		globalErr    error
		l            sync.Mutex
	)

	ioKit, err := common.NewLibrary(common.IOKit)
	if err != nil {
		return nil, err
	}
	defer ioKit.Close()

	coreFoundation, err := common.NewLibrary(common.CoreFoundation)
	if err != nil {
		return nil, err
	}
	defer coreFoundation.Close()

	for range 30 { // Once more, the higher this number, the more likely the bug is to occur
		wg.Add(1)

		go func() {
			defer wg.Done()

			r, err := temperaturesWithContext(ioKit, coreFoundation)

			l.Lock()
			defer l.Unlock()
			globalResult = r
			globalErr = err
		}()
	}

	wg.Wait()

	return globalResult, globalErr
}

func temperaturesWithContext(ioKit *common.Library, coreFoundation *common.Library) ([]TemperatureStat, error) {
	ta := &temperatureArm{
		ioKit:                              ioKit,
		cf:                                 coreFoundation,
[... the rest of the original TemperaturesWithContext, unmodified]

With this change, calling TemperaturesWithContext no longer crashes:

$ cat single_call.go 
package main

import (
        "log/slog"

        "github.com/shirou/gopsutil/v4/sensors"
)

func main() {
        r, err := sensors.SensorsTemperatures()
        slog.Info("sensors", slog.Any("r", r), slog.Any("err", err))
}

$ go build single_call.go; while ./single_call ; do echo "Success"; done 2>&1 | tee large_error_message.log

If you move the ioKit & coreFoundation setup inside the go func() { } (i.e. open and close the libraries per goroutine), it will crash.

One last note: only sensors seems affected by this bug (maybe because it makes the most complex use of the IOKit/CF libraries?). The following code doesn't exhibit the crash even though it uses IOKit/CF concurrently for cpu/disk/mem: https://gist.github.com/PierreF/dd5864811ef6de22bfcb431810fe4f4f

PierreF, Apr 03 '25 12:04

The reproduction conditions seem a bit extreme: I ran the code you provided 102 times (judging by the length of large_error_message.log) before it crashed. That amounts to an unreasonable number of IOKit / Core Foundation calls, and since the sensors package is more complex, it might reach system limits more easily.

uubulb, Apr 07 '25 05:04

In my tests, I usually hit it in fewer than 10 tries :/ This probably means the race condition isn't linked only to calling sensors concurrently, and might even depend on something running elsewhere (another process on the system? other goroutines or the GC?).

If I can find some time, I'll try to come up with a more realistic way to reproduce it. In real usage I don't call sensors concurrently (only concurrently with disk/cpu/mem) and I do hit the bug "fast" (within a few hundred calls to sensors, i.e. about 1 hour with one call every 10 seconds).

PierreF, Apr 07 '25 08:04

In my environment, no panic occurred with your first code sample after more than 500 "Success" runs. The gopsutil version is v4.25.3.

go version go1.24.3 darwin/arm64

ProductName:            macOS
ProductVersion:         15.4.1
BuildVersion:           24E263

Darwin mypc 24.4.0 Darwin Kernel Version 24.4.0: Fri Apr 11 18:33:46 PDT 2025; root:xnu-11417.101.15~117/RELEASE_ARM64_T8112 arm64

shirou, May 19 '25 13:05

My Mac Studio M1 Ultra w/ 128GB ram and macOS 15.4.1 ran once then seg faulted on the second run. Not 10, not 102, not 500. Two. I've been running into a lot of seg faults that all seem to originate from this package so I started digging more and found this issue. I do not think this is the bug I'm running into, but it's definitely similar. Attaching the log file from the crash (using the code in the OP).

large_error_message.log

EDIT: To test I just made a new folder named i inside an existing project (Notifiarr) with a go.mod that imports this package. You'll see that in the output file, but nothing from Notifiarr was used here.

davidnewhall, May 30 '25 15:05

Turns out this is the problem my app has been running into. I've been testing exclusively on macOS, and every time the app calls sensors it's a crapshoot on the outcome. Sometimes it's fine. Sometimes I get a full segmentation violation. I could attach that stack trace, but there's nothing in it specific to this package. I was only able to narrow it down by removing calls to sensors and watching the problem go away.

EDIT: I should also point out that I generally stick the sensors output into json.Marshal. The more common problem was the marshaller throwing errors about trying to stick strings into structs, or calling IsNil on non-nullable types. These errors only happen when there's a race condition and the data the marshaller is reading is also being written at the same time. tl;dr: this is almost certainly a data race.

davidnewhall, Jun 01 '25 18:06