go
go copied to clipboard
runtime: SIGTRAMP while profiling arm64 hits new "traceback stuck" assert
What version of Go are you using (go version
)?
$ go version go version go1.18.5 linux/arm64
Does this issue reproduce with the latest release?
Have not tested with go1.19 yet.
What operating system and processor architecture are you using (go env
)?
go env
Output
$ go env GO111MODULE="" GOARCH="arm64" GOBIN="" GOCACHE="/home/lizf/.cache/go-build" GOENV="/home/lizf/.config/go/env" GOEXE="" GOEXPERIMENT="" GOFLAGS="" GOHOSTARCH="arm64" GOHOSTOS="linux" GOINSECURE="" GOMODCACHE="/home/lizf/go/pkg/mod" GONOPROXY="" GONOSUMDB="" GOOS="linux" GOPATH="/home/lizf/go" GOPRIVATE="" GOPROXY="https://proxy.golang.org,direct" GOROOT="/usr/local/go" GOSUMDB="sum.golang.org" GOTMPDIR="" GOTOOLDIR="/usr/local/go/pkg/tool/linux_arm64" GOVCS="" GOVERSION="go1.18.5" GCCGO="gccgo" AR="ar" CC="gcc" CXX="g++" CGO_ENABLED="1" GOMOD="/dev/null" GOWORK="" CGO_CFLAGS="-g -O2" CGO_CPPFLAGS="" CGO_CXXFLAGS="-g -O2" CGO_FFLAGS="-g -O2" CGO_LDFLAGS="-g -O2" PKG_CONFIG="pkg-config" GOGCCFLAGS="-fPIC -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build3053713255=/tmp/go-build -gno-record-gcc-switches"
What did you do?
Run continuous profiling via scraping /debug/pprof endpoints including mutex and blocking profiles, while performing a workload consuming from Kafka using Sarama and Klauspost Zstd libraries.
runtime.SetBlockProfileRate(100000)
runtime.SetMutexProfileFraction(100)
What did you expect to see?
No crashes.
What did you see instead?
The following crashes implicating unexpected return pc for runtime.sigtramp
and traceback stuck
(assert added in https://go-review.googlesource.com/c/go/+/400575/ by @cherrymui / #52116 which was backported to go1.18.5):
runtime: traceback stuck. pc=0x12eb5 sp=0x4019365840
stack: frame={sp:0x4019365840, fp:0x4019365840} stack=[0x4019364000,0x4019366000)
0x0000004019365740: 0x0000000000ba9494 <github.com/Shopify/sarama.releaseCrc32Field+0x0000000000000034> 0x00000040193657b8
0x0000004019365750: 0x0000000000bcd8e0 <github.com/Shopify/sarama.(*realDecoder).pop+0x0000000000000060> 0x00000040193657c8
0x0000004019365760: 0x0000000000000000 0x0000000000000000
0x0000004019365770: 0x0000000000000700 0x0000000001068260
0x0000004019365780: 0x00000000010ab360 0x0000000000005701
0x0000004019365790: 0x0000000000000000 0x0000000000000000
0x00000040193657a0: 0x0000000000000000 0x0000004000601c00
0x00000040193657b0: 0x000000401c3c7100 0x00000040193657f8
0x00000040193657c0: 0x0000000000bd070c <github.com/Shopify/sarama.(*RecordBatch).decode+0x000000000000037c> 0x000000400ff4e0f0
0x00000040193657d0: 0x0000000000bd064c <github.com/Shopify/sarama.(*RecordBatch).decode+0x00000000000002bc> 0x0000004026c7805f
0x00000040193657e0: 0x0000000000107de8 <strings.HasSuffix+0x0000000000000058> 0x0000000000000000
0x00000040193657f0: 0x0000000000000000 0x00000040193658c8
0x0000004019365800: 0x0000000000bd0eac <github.com/Shopify/sarama.(*Records).decode+0x000000000000009c> 0x0000004017c948c0
0x0000004019365810: 0x0000004026d6cf92 0x0000000000012eb5 <internal/bytealg.IndexByteString+0x0000000000000005>
0x0000004019365820: 0x0000000000012eb5 <internal/bytealg.IndexByteString+0x0000000000000005> 0x0000004019365898
0x0000004019365830: 0x00012ee60001ce60 0x0000000000012eb5 <internal/bytealg.IndexByteString+0x0000000000000005>
0x0000004019365840: >0x0000000000012eb5 <internal/bytealg.IndexByteString+0x0000000000000005> 0x00000000000000d7
0x0000004019365850: 0x0000000000000000 0x0000004019365800
0x0000004019365860: 0x0000000000bd0474 <github.com/Shopify/sarama.(*RecordBatch).decode+0x00000000000000e4> 0x00000040193658b8
0x0000004019365870: 0x0000000000000000 0x0000000000000000
0x0000004019365880: 0x0000000000000000 0x0000000000000000
0x0000004019365890: 0x0000000000000000 0x0000004026d6cf92
0x00000040193658a0: 0x000000400ff4e0f0 0x0000000000000000
0x00000040193658b0: 0x0000000000000000 0x0000000000bd09c0 <github.com/Shopify/sarama.(*RecordBatch).decode.func1+0x0000000000000000>
0x00000040193658c0: 0x000000400ff4e0f0 0x0000004019365918
0x00000040193658d0: 0x0000000000bb6f1c <github.com/Shopify/sarama.(*FetchResponseBlock).decode+0x000000000000033c> 0x0000004012612c80
0x00000040193658e0: 0x000000000154bfa8 0x0000004017c948c0
0x00000040193658f0: 0x0000000000bb6f0c <github.com/Shopify/sarama.(*FetchResponseBlock).decode+0x000000000000032c> 0x0000000000000018
0x0000004019365900: 0x00000000010d87c0 0x0000004000048301
0x0000004019365910: 0x0000004017c94a40 0x00000040193659c8
0x0000004019365920: 0x0000000000bb78e8 <github.com/Shopify/sarama.(*FetchResponse).decode+0x00000000000002a8> 0x0000004009b9fc38
0x0000004019365930: 0x000000000154bfa8 0x0000004017c948c0
fatal error: traceback stuck
goroutine 0 [idle]:
runtime: unexpected return pc for runtime.sigtramp called from 0xffff985dc5c0
stack: frame={sp:0x400eacacd0, fp:0x400eacada0} stack=[0x400eac4000,0x400eacc000)
0x000000400eacabd0: 0x000000400eacac28 0x02c595e603246a9e
0x000000400eacabe0: 0x0000000002c595e5 0x000000400eaba000
0x000000400eacabf0: 0x0000000000000000 0x000000400eac9fe0
0x000000400eacac00: 0x0000004013a80000 0x000000400eacada0
0x000000400eacac10: 0x000000400eacae20 0x000000400eacac98
0x000000400eacac20: 0x000000000007d58c <runtime.sigtrampgo+0x000000000000001c> 0x000000400000001b
0x000000400eacac30: 0x0000004000601c00 0x000000400eacac58
0x000000400eacac40: 0x0000004000603040 0x02bcec7502c540fb
0x000000400eacac50: 0x00246aa103246aa0 0x0000000000000000
0x000000400eacac60: 0x0000000000000000 0x0000000000000000
0x000000400eacac70: 0x0000000000000000 0x0000000000000000
0x000000400eacac80: 0x0000004000603040 0x000000400eacada0
0x000000400eacac90: 0x000000400eacae20 0x000000400eacacc8
0x000000400eacaca0: 0x000000000007cea4 <runtime.sigtramp+0x0000000000000084> 0x000000000000001b
0x000000400eacacb0: 0x000000400eacada0 0x000000400eacae20
0x000000400eacacc0: 0x02c5956e03246a9e 0x000000400eacbff0
0x000000400eacacd0: <0x0000ffff985dc5c0 0x000000400000001b
0x000000400eacace0: 0x000000400eacada0 0x000000400eacae20
0x000000400eacacf0: 0x000000000005abba <runtime.selparkcommit+0x000000000000000a> 0x1265726f74537070
0x000000400eacad00: 0x021210eed9a50809 0x623824220118000a
0x000000400eacad10: 0x302d333733663636 0x0000000000000015
0x000000400eacad20: 0x0000000000000071 0x000000400b753e98
0x000000400eacad30: 0x000000000153a0c0 0x0000004000603040
0x000000400eacad40: 0x000000400eacacc8 0x3e66376972bea4d0
0x000000400eacad50: 0xbebbbd41c5d26bf1 0x3f11566aaf25de2c
0x000000400eacad60: 0xbf66c16c16bebd93 0x3fc5555555555555
0x000000400eacad70: 0x3fcde3175e2d734a 0x3ffc439d143a5197
0x000000400eacad80: 0xbfd1a21a8e7d2cb5 0x0000000000000000
0x000000400eacad90: 0x0404040202020303 0x0000000000000000
0x000000400eacada0: >0x000000000000001b 0x00000000fffffffe
0x000000400eacadb0: 0x0000000000000466 0x0000000000000000
0x000000400eacadc0: 0x0000000000000000 0x0000000000000000
0x000000400eacadd0: 0x0000000000000000 0x0000000000000000
0x000000400eacade0: 0x0000000000000000 0x0000000000000000
0x000000400eacadf0: 0x0000000000000000 0x0000000000000000
0x000000400eacae00: 0x0000000000000000 0x0000000000000000
0x000000400eacae10: 0x0000000000000000 0x0000000000000000
0x000000400eacae20: 0x0000000000000000 0x0000000000000000
0x000000400eacae30: 0x000000400eac4000 0x0000004000000000
0x000000400eacae40: 0x0000000000008000 0x0000000000000000
0x000000400eacae50: 0x0000000000000000 0x0000000000000000
0x000000400eacae60: 0x0000000000000000 0x0000000000000000
0x000000400eacae70: 0x00002a1c49103b0c 0x02bcec6f02b9baa6
0x000000400eacae80: 0x02b9baaf03246a9f 0x02c5956e02bcec75
0x000000400eacae90: 0x0000000000000000 0x0000004013a805d0
runtime.throw({0x1262952?, 0x1fae6c0?})
/usr/local/go/src/runtime/panic.go:992 +0x50
runtime.gentraceback(0x203ad80?, 0x400eac9a90?, 0x400eaca968?, 0x4000603040, 0x0, 0x400eaca968, 0x40, 0x0, 0x0?, 0x6)
/usr/local/go/src/runtime/traceback.go:488 +0xd98
runtime.sigprof(0x79650, 0x400eacac08?, 0x4013a80010?, 0x4000603040, 0x4000601c00)
/usr/local/go/src/runtime/proc.go:4507 +0xf0
runtime.sighandler(0x1b?, 0x4000601c00?, 0x400eacac58?, 0x4000603040?)
/usr/local/go/src/runtime/signal_unix.go:613 +0x514
runtime.sigtrampgo(0x1b, 0x400eacada0, 0x400eacae20)
/usr/local/go/src/runtime/signal_unix.go:477 +0x164
runtime.sigtrampgo(0x1b, 0x400eacada0, 0x400eacae20)
<autogenerated>:1 +0x1c
runtime: unexpected return pc for runtime.sigtramp called from 0xffff985dc5c0
stack: frame={sp:0x400eacacd0, fp:0x400eacada0} stack=[0x400eac4000,0x400eacc000)
0x000000400eacabd0: 0x000000400eacac28 0x02c595e603246a9e
0x000000400eacabe0: 0x0000000002c595e5 0x000000400eaba000
0x000000400eacabf0: 0x0000000000000000 0x000000400eac9fe0
0x000000400eacac00: 0x0000004013a80000 0x000000400eacada0
0x000000400eacac10: 0x000000400eacae20 0x000000400eacac98
0x000000400eacac20: 0x000000000007d58c <runtime.sigtrampgo+0x000000000000001c> 0x000000400000001b
0x000000400eacac30: 0x0000004000601c00 0x000000400eacac58
0x000000400eacac40: 0x0000004000603040 0x02bcec7502c540fb
0x000000400eacac50: 0x00246aa103246aa0 0x0000000000000000
0x000000400eacac60: 0x0000000000000000 0x0000000000000000
0x000000400eacac70: 0x0000000000000000 0x0000000000000000
0x000000400eacac80: 0x0000004000603040 0x000000400eacada0
0x000000400eacac90: 0x000000400eacae20 0x000000400eacacc8
0x000000400eacaca0: 0x000000000007cea4 <runtime.sigtramp+0x0000000000000084> 0x000000000000001b
0x000000400eacacb0: 0x000000400eacada0 0x000000400eacae20
0x000000400eacacc0: 0x02c5956e03246a9e 0x000000400eacbff0
0x000000400eacacd0: <0x0000ffff985dc5c0 0x000000400000001b
0x000000400eacace0: 0x000000400eacada0 0x000000400eacae20
0x000000400eacacf0: 0x000000000005abba <runtime.selparkcommit+0x000000000000000a> 0x1265726f74537070
0x000000400eacad00: 0x021210eed9a50809 0x623824220118000a
0x000000400eacad10: 0x302d333733663636 0x0000000000000015
0x000000400eacad20: 0x0000000000000071 0x000000400b753e98
0x000000400eacad30: 0x000000000153a0c0 0x0000004000603040
0x000000400eacad40: 0x000000400eacacc8 0x3e66376972bea4d0
0x000000400eacad50: 0xbebbbd41c5d26bf1 0x3f11566aaf25de2c
0x000000400eacad60: 0xbf66c16c16bebd93 0x3fc5555555555555
0x000000400eacad70: 0x3fcde3175e2d734a 0x3ffc439d143a5197
0x000000400eacad80: 0xbfd1a21a8e7d2cb5 0x0000000000000000
0x000000400eacad90: 0x0404040202020303 0x0000000000000000
0x000000400eacada0: >0x000000000000001b 0x00000000fffffffe
0x000000400eacadb0: 0x0000000000000466 0x0000000000000000
0x000000400eacadc0: 0x0000000000000000 0x0000000000000000
0x000000400eacadd0: 0x0000000000000000 0x0000000000000000
0x000000400eacade0: 0x0000000000000000 0x0000000000000000
0x000000400eacadf0: 0x0000000000000000 0x0000000000000000
0x000000400eacae00: 0x0000000000000000 0x0000000000000000
0x000000400eacae10: 0x0000000000000000 0x0000000000000000
0x000000400eacae20: 0x0000000000000000 0x0000000000000000
0x000000400eacae30: 0x000000400eac4000 0x0000004000000000
0x000000400eacae40: 0x0000000000008000 0x0000000000000000
0x000000400eacae50: 0x0000000000000000 0x0000000000000000
0x000000400eacae60: 0x0000000000000000 0x0000000000000000
0x000000400eacae70: 0x00002a1c49103b0c 0x02bcec6f02b9baa6
0x000000400eacae80: 0x02b9baaf03246a9f 0x02c5956e02bcec75
0x000000400eacae90: 0x0000000000000000 0x0000004013a805d0
runtime.sigtramp()
/usr/local/go/src/runtime/sys_linux_arm64.s:489 +0x84
goroutine 89 [running]:
runtime: traceback stuck. pc=0x3ff93 sp=0x40048952c0
stack: frame={sp:0x40048952c0, fp:0x40048952c0} stack=[0x4004894000,0x4004896000)
0x00000040048951c0: 0x0000000000000040 0x00000040048951e8
0x00000040048951d0: 0x00000000005019a4 <github.com/klauspost/compress/zstd.(*sequenceDec).init+0x0000000000000054> 0x0000004004895310
0x00000040048951e0: 0x00000040008c6000 0x0000004004895228
0x00000040048951f0: 0x0000000000501b9c <github.com/klauspost/compress/zstd.(*sequenceDecs).initialize+0x000000000000016c> 0x0000004004895338
0x0000004004895200: 0x00000000004e2b84 <github.com/klauspost/compress/zstd.(*blockDec).decodeCompressed+0x0000000000000164> 0x0000004000530058
0x0000004004895210: 0x00000040002e22a0 0x00000000006c0001 <text/template/parse.(*endNode).Copy+0x0000000000000001>
0x0000004004895220: 0x0000000f00800005 0x000001fd00280903
0x0000004004895230: 0x00000000004e2ebc <github.com/klauspost/compress/zstd.(*blockDec).prepareSequences+0x000000000000025c> 0x000000000003ff93 <runtime.goroutineProfileWithLabels.func4.1+0x0000000000000003>
0x0000004004895240: 0x00000000000ee1a4 <os.UserHomeDir+0x00000000000000c4> 0x0000000000000811
0x0000004004895250: 0x0000000000000015 0x0000000000020000 <runtime.advanceEvacuationMark+0x0000000000000000>
0x0000004004895260: 0x00000000006c0001 <text/template/parse.(*endNode).Copy+0x0000000000000001> 0x0000000000000000
0x0000004004895270: 0x0000000000000006 0x0000000f00800005
0x0000004004895280: 0x000001fd00280903 0x0000000000000000
0x0000004004895290: 0x000000400720ea00 0x0000004004895338
0x00000040048952a0: 0x00000000004e2a78 <github.com/klauspost/compress/zstd.(*blockDec).decodeCompressed+0x0000000000000058> 0x0000004000530058
0x00000040048952b0: 0x00000040002e22a0 0x00000000000ee1a4 <os.UserHomeDir+0x00000000000000c4>
0x00000040048952c0: >0x000000000003ff93 <runtime.goroutineProfileWithLabels.func4.1+0x0000000000000003> 0x0000000000020000 <runtime.advanceEvacuationMark+0x0000000000000000>
0x00000040048952d0: 0x0000004009f42000 0x00000040002e22a0
0x00000040048952e0: 0x0000000000001a81 0x0000004009f42000
0x00000040048952f0: 0x0000004005b17500 0x000000400720ea00
0x0000004004895300: 0x0000004004a28a00 0x00000040002e22a0
0x0000004004895310: 0x0000004003bcf000 0x000000400885150c
0x0000004004895320: 0x000000000000467b 0x000000000002ba01 <runtime.gcStart+0x0000000000000701>
0x0000004004895330: 0x000000000000002e 0x0000004004895398
0x0000004004895340: 0x00000000004e1974 <github.com/klauspost/compress/zstd.(*blockDec).decodeBuf+0x0000000000000284> 0x0000004000530058
0x0000004004895350: 0x0000000000000000 0x0000000000000000
0x0000004004895360: 0x0000000000000000 0x0000004000530050
0x0000004004895370: 0x0000000000000000 0x0000000000000000
0x0000004004895380: 0x0000000000000000 0x0000000000020000 <runtime.advanceEvacuationMark+0x0000000000000000>
0x0000004004895390: 0x0000000000000000 0x0000004004895458
0x00000040048953a0: 0x00000000004fd2cc <github.com/klauspost/compress/zstd.(*frameDec).runDecoder+0x000000000000016c> 0x000000400058c000
0x00000040048953b0: 0x0000004000530050 0x0000000000000000
fatal error: traceback stuck
goroutine 0 [idle]:
runtime: unexpected return pc for runtime.sigtramp called from 0xffffabfd25c0
stack: frame={sp:0x400000acd0, fp:0x400000ada0} stack=[0x4000004000,0x400000c000)
0x000000400000abd0: 0x000000400000ac28 0x0000000000000000
0x000000400000abe0: 0x0000000000000000 0x0000004000002000
0x000000400000abf0: 0x0000000000000000 0x0000000000000000
0x000000400000ac00: 0x0000000000000000 0x000000400000ada0
0x000000400000ac10: 0x000000400000ae20 0x000000400000ac98
0x000000400000ac20: 0x000000000007d58c <runtime.sigtrampgo+0x000000000000001c> 0x000000400000001b
0x000000400000ac30: 0x0000000002046680 0x000000400000ac58
0x000000400000ac40: 0x0000004000521d40 0x0000000000000000
0x000000400000ac50: 0x0000000000000000 0x0000000000000000
0x000000400000ac60: 0x0000000000000000 0x0000000000000000
0x000000400000ac70: 0x0000000000000000 0x0000000000000000
0x000000400000ac80: 0x0000004000521d40 0x000000400000ada0
0x000000400000ac90: 0x000000400000ae20 0x000000400000acc8
0x000000400000aca0: 0x000000000007cea4 <runtime.sigtramp+0x0000000000000084> 0x000000000000001b
0x000000400000acb0: 0x000000400000ada0 0x000000400000ae20
0x000000400000acc0: 0x0000000000000000 0x000000400000bff0
0x000000400000acd0: <0x0000ffffabfd25c0 0x000000000000001b
0x000000400000ace0: 0x000000400000ada0 0x000000400000ae20
0x000000400000acf0: 0x0000000000000009 0x0000000000000015
0x000000400000ad00: 0x0000000000000000 0x00000040008c9fab
0x000000400000ad10: 0x000000000001ff93 <runtime.evacuate+0x00000000000004b3> 0x0000000000000009
0x000000400000ad20: 0x000000000000032d 0x00000040048953d0
0x000000400000ad30: 0x0000000002078100 0x0000004000521d40
0x000000400000ad40: 0x000000400000acc8 0x3e66376972bea4d0
0x000000400000ad50: 0xbebbbd41c5d26bf1 0x3f11566aaf25de2c
0x000000400000ad60: 0xbf66c16c16bebd93 0x3fc5555555555555
0x000000400000ad70: 0x3fb8fcd2e18d7b62 0x3ffe7032d1e7284a
0x000000400000ad80: 0xbfbab6204490a1e0 0x0000000000000000
0x000000400000ad90: 0x0000000000000000 0x0000000000000000
0x000000400000ada0: >0x000000000000001b 0x00000000fffffffe
0x000000400000adb0: 0x0000000000004d16 0x0000000000000000
0x000000400000adc0: 0x0000000000000000 0x0000000000000000
0x000000400000add0: 0x0000000000000000 0x0000000000000000
0x000000400000ade0: 0x0000000000000000 0x0000000000000000
0x000000400000adf0: 0x0000000000000000 0x0000000000000000
0x000000400000ae00: 0x0000000000000000 0x0000000000000000
0x000000400000ae10: 0x0000000000000000 0x0000000000000000
0x000000400000ae20: 0x0000000000000000 0x0000000000000000
0x000000400000ae30: 0x0000004000004000 0x0000000000000000
0x000000400000ae40: 0x0000000000008000 0x0000000000000000
0x000000400000ae50: 0x0000000000000000 0x0000000000000000
0x000000400000ae60: 0x0000000000000000 0x0000000000000000
0x000000400000ae70: 0x0000000000000000 0x0000000000000000
0x000000400000ae80: 0x0000000000000000 0x0000000000000000
0x000000400000ae90: 0x0000000000000000 0x0000000000000000
runtime.throw({0x1275238?, 0x1fbe760?})
/usr/local/go/src/runtime/panic.go:992 +0x50
runtime.gentraceback(0x204afc0?, 0x0?, 0x400000a968?, 0x4000521d40, 0x0, 0x400000a968, 0x40, 0x0, 0x0?, 0x6)
/usr/local/go/src/runtime/traceback.go:488 +0xd98
runtime.sigprof(0x79650, 0x400000ac08?, 0x0?, 0x4000521d40, 0x2046680)
/usr/local/go/src/runtime/proc.go:4507 +0xf0
runtime.sighandler(0x1b?, 0x2046680?, 0x400000ac58?, 0x4000521d40?)
/usr/local/go/src/runtime/signal_unix.go:613 +0x514
runtime.sigtrampgo(0x1b, 0x400000ada0, 0x400000ae20)
/usr/local/go/src/runtime/signal_unix.go:477 +0x164
runtime.sigtrampgo(0x1b, 0x400000ada0, 0x400000ae20)
<autogenerated>:1 +0x1c
runtime: unexpected return pc for runtime.sigtramp called from 0xffffabfd25c0
stack: frame={sp:0x400000acd0, fp:0x400000ada0} stack=[0x4000004000,0x400000c000)
0x000000400000abd0: 0x000000400000ac28 0x0000000000000000
0x000000400000abe0: 0x0000000000000000 0x0000004000002000
0x000000400000abf0: 0x0000000000000000 0x0000000000000000
0x000000400000ac00: 0x0000000000000000 0x000000400000ada0
0x000000400000ac10: 0x000000400000ae20 0x000000400000ac98
0x000000400000ac20: 0x000000000007d58c <runtime.sigtrampgo+0x000000000000001c> 0x000000400000001b
0x000000400000ac30: 0x0000000002046680 0x000000400000ac58
0x000000400000ac40: 0x0000004000521d40 0x0000000000000000
0x000000400000ac50: 0x0000000000000000 0x0000000000000000
0x000000400000ac60: 0x0000000000000000 0x0000000000000000
0x000000400000ac70: 0x0000000000000000 0x0000000000000000
0x000000400000ac80: 0x0000004000521d40 0x000000400000ada0
0x000000400000ac90: 0x000000400000ae20 0x000000400000acc8
0x000000400000aca0: 0x000000000007cea4 <runtime.sigtramp+0x0000000000000084> 0x000000000000001b
0x000000400000acb0: 0x000000400000ada0 0x000000400000ae20
0x000000400000acc0: 0x0000000000000000 0x000000400000bff0
0x000000400000acd0: <0x0000ffffabfd25c0 0x000000000000001b
0x000000400000ace0: 0x000000400000ada0 0x000000400000ae20
0x000000400000acf0: 0x0000000000000009 0x0000000000000015
0x000000400000ad00: 0x0000000000000000 0x00000040008c9fab
0x000000400000ad10: 0x000000000001ff93 <runtime.evacuate+0x00000000000004b3> 0x0000000000000009
0x000000400000ad20: 0x000000000000032d 0x00000040048953d0
0x000000400000ad30: 0x0000000002078100 0x0000004000521d40
0x000000400000ad40: 0x000000400000acc8 0x3e66376972bea4d0
0x000000400000ad50: 0xbebbbd41c5d26bf1 0x3f11566aaf25de2c
0x000000400000ad60: 0xbf66c16c16bebd93 0x3fc5555555555555
0x000000400000ad70: 0x3fb8fcd2e18d7b62 0x3ffe7032d1e7284a
0x000000400000ad80: 0xbfbab6204490a1e0 0x0000000000000000
0x000000400000ad90: 0x0000000000000000 0x0000000000000000
0x000000400000ada0: >0x000000000000001b 0x00000000fffffffe
0x000000400000adb0: 0x0000000000004d16 0x0000000000000000
0x000000400000adc0: 0x0000000000000000 0x0000000000000000
0x000000400000add0: 0x0000000000000000 0x0000000000000000
0x000000400000ade0: 0x0000000000000000 0x0000000000000000
0x000000400000adf0: 0x0000000000000000 0x0000000000000000
0x000000400000ae00: 0x0000000000000000 0x0000000000000000
0x000000400000ae10: 0x0000000000000000 0x0000000000000000
0x000000400000ae20: 0x0000000000000000 0x0000000000000000
0x000000400000ae30: 0x0000004000004000 0x0000000000000000
0x000000400000ae40: 0x0000000000008000 0x0000000000000000
0x000000400000ae50: 0x0000000000000000 0x0000000000000000
0x000000400000ae60: 0x0000000000000000 0x0000000000000000
0x000000400000ae70: 0x0000000000000000 0x0000000000000000
0x000000400000ae80: 0x0000000000000000 0x0000000000000000
0x000000400000ae90: 0x0000000000000000 0x0000000000000000
runtime.sigtramp()
/usr/local/go/src/runtime/sys_linux_arm64.s:489 +0x84
goroutine 197 [running]:
Given that you commented on #52116, I assume that this application was also suffering from that bug before this (seemingly incomplete) fix?
This stack looks weird. 0x12eb5 is not even an aligned PC (ARM64 instructions are always 4 bytes). Why would it appear on the stack? 0x3ff93 from another stack trace is similarly weird.
Maybe they are not PCs? Given that a nearby stack word has content 0x00012ee60001ce60, maybe 0x12eb5, 0x12ee6, 0x1ce60 are not PCs but some (32-bit?) integers on stack?
On both stack traces the SIGPROF lands on PC 0x79650. What is that PC? Is it a Go function? Thanks.
Given that you commented on #52116, I assume that this application was also suffering from that bug before this (seemingly incomplete) fix?
~It was. But we backported the fix and it was working fine. The new change for us is turning on not just CPU/mem profiling, but block profiling too.~
This stack looks weird. 0x12eb5 is not even an aligned PC (ARM64 instructions are always 4 bytes). Why would it appear on the stack? 0x3ff93 from another stack trace is similarly weird.
Maybe they are not PCs? Given that a nearby stack word has content 0x00012ee60001ce60, maybe 0x12eb5, 0x12ee6, 0x1ce60 are not PCs but some (32-bit?) integers on stack?
On both stack traces the SIGPROF lands on PC 0x79650. What is that PC? Is it a Go function? Thanks.
$ go tool addr2line retriever
0x79650
runtime.morestack_noctxt
/usr/local/go/src/runtime/asm_arm64.s:321
Given that you commented on #52116, I assume that this application was also suffering from that bug before this (seemingly incomplete) fix?
It was. But we backported the fix and it was working fine. The new change for us is turning on not just CPU/mem profiling, but block profiling too.
Addendum: I just trawled errors, and indeed, we do have significant numbers of "traceback stuck" dating back multiple months, just in applications that were able to restart after crashing without coming to my attention.
Here are some additional crashes from other applications:
$ go tool addr2line barbet
0x79410
0x79414
runtime.morestack_noctxt
/usr/local/go/src/runtime/asm_arm64.s:321
runtime.morestack_noctxt
/usr/local/go/src/runtime/asm_arm64.s:322
$ go tool addr2line beagle
0x79410
runtime.morestack_noctxt
/usr/local/go/src/runtime/asm_arm64.s:321
$ go tool addr2line shepherd
0x790a0
0x791d0
0x791d4
runtime.systemstack
/usr/local/go/src/runtime/asm_arm64.s:204
runtime.morestack_noctxt
/usr/local/go/src/runtime/asm_arm64.s:321
runtime.morestack_noctxt
/usr/local/go/src/runtime/asm_arm64.s:322
https://github.com/golang/go/blob/go1.18.5/src/runtime/asm_arm64.s#L321 -- culprit lines here.
similarity to https://github.com/golang/go/issues/52829, does this need to not have NOFRAME ie https://go-review.googlesource.com/c/go/+/405482/?
Also repros with go1.19.0
runtime: traceback stuck. pc=0x58e9e0 sp=0x4035cfada0
stack: frame={sp:0x4035cfada0, fp:0x4035cfada0} stack=[0x4035cf8000,0x4035cfc000)
0x0000004035cfaca0: 0x0000004047098bd0 0x000000010001aae8
0x0000004035cfacb0: 0xc90000000001bfb0 0x0000ffff94c3f878
0x0000004035cfacc0: 0xc9c4a878b22454ee 0x0000000000000006
0x0000004035cfacd0: 0x000000000058db7c <github.com/honeycombio/libhoney-go.(*fieldHolder).AddField+0x000000000000004c> 0x000000403cd55520
0x0000004035cface0: 0x000000000058dbf0 <github.com/honeycombio/libhoney-go.(*fieldHolder).AddField+0x00000000000000c0> 0x0000004035cfad38
0x0000004035cfacf0: 0x00000040470acb20 0x0000004035cfad58
0x0000004035cfad00: 0x000000000058e980 <github.com/honeycombio/libhoney-go.(*Event).AddField+0x0000000000000110> 0x0000000000fcae00
0x0000004035cfad10: 0x0000004047098bd0 0x00000040469ff6c8
0x0000004035cfad20: 0x0000000000000014 0x0100000000000014
0x0000004035cfad30: 0x00000040470b2248 0x0000000000ebb3a0
0x0000004035cfad40: 0x000000000058dc50 <github.com/honeycombio/libhoney-go.(*fieldHolder).AddField.func1+0x0000000000000000> 0x00000040470b2248
0x0000004035cfad50: 0x0000004035cfad40 0x0000004035cfadb8
0x0000004035cfad60: 0x00000000005a5898 <github.com/honeycombio/beeline-go/trace.(*Span).AddField+0x0000000000000158> 0x00000040470b2240
0x0000004035cfad70: 0x00000040469ff6c8 0x0000000000000014
0x0000004035cfad80: 0x0000000000ebb3a0 0x0000000001ed2ac8
0x0000004035cfad90: 0x01000000005a5840 0x0000000000f9a880
0x0000004035cfada0: >0x000000000058e9e0 <github.com/honeycombio/libhoney-go.(*Event).AddField.func1+0x0000000000000000> 0x00000040470b226c
0x0000004035cfadb0: 0x0000004035cfada0 0x0000004035cfae28
0x0000004035cfadc0: 0x00000000005a7a4c <github.com/honeycombio/beeline-go.AddField+0x00000000000000cc> 0x0000ffff94c3f878
0x0000004035cfadd0: 0x00000040469ff6c8 0x0000000001ed2ac8
0x0000004035cfade0: 0x0000000000ebb3a0 0x0000000001ed2ac8
0x0000004035cfadf0: 0x0000004035cfae08 0x0000004035cfae28
0x0000004035cfae00: 0x01000000005a7a00 0x0000000000f9a880
0x0000004035cfae10: 0x00000000005a5900 <github.com/honeycombio/beeline-go/trace.(*Span).AddField.func1+0x0000000000000000> 0x0000004043dc5688
0x0000004035cfae20: 0x0000004035cfae10 0x0000004035cfae88
0x0000004035cfae30: 0x0000000000d79884 <github.com/honeycombio/hound/cmd/shepherd/app.(*App).getSchema+0x00000000000000f4> 0x0000004043dc5600
0x0000004035cfae40: 0x00000040469ff6c8 0x0000000000000014
0x0000004035cfae50: 0x0000000000ebb3a0 0x0000000001ed2ac8
0x0000004035cfae60: 0x000000400015aae0 0x0000000000000014
0x0000004035cfae70: 0x0000000000ebb3a0 0x0000004043dc5600
0x0000004035cfae80: 0x00000040469ff6c8 0x0000004035cfaf48
0x0000004035cfae90: 0x0000000000d755bc <github.com/honeycombio/hound/cmd/shepherd/app.(*App).batchProcess.func1+0x000000000000043c> 0x0000000001491d40
fatal error: traceback stuck
goroutine 0 [idle]:
runtime.throw({0x11b3553?, 0x1ebf180?})
/usr/local/go/src/runtime/panic.go:1047 +0x40 fp=0x40009ac610 sp=0x40009ac5e0 pc=0x490f0
runtime.gentraceback(0x1f3c7e0?, 0x0?, 0x40009ac9a8?, 0x403cd55520, 0x0, 0x40009ac9a8, 0x40, 0x0, 0x0?, 0x6)
/usr/local/go/src/runtime/traceback.go:488 +0xde4 fp=0x40009ac940 sp=0x40009ac610 pc=0x703f4
runtime.sigprof(0x7abe0, 0x40009acc48?, 0x0?, 0x403cd55520, 0x40009a4000)
/usr/local/go/src/runtime/proc.go:4567 +0xe8 fp=0x40009acbb0 sp=0x40009ac940 pc=0x55618
runtime.sighandler(0x1b?, 0x40009a4000?, 0x40009acc98?, 0x40005e4000?)
/usr/local/go/src/runtime/signal_unix.go:629 +0x574 fp=0x40009acc60 sp=0x40009acbb0 pc=0x60334
runtime.sigtrampgo(0x1b, 0x40009acda0, 0x40009ace20)
/usr/local/go/src/runtime/signal_unix.go:479 +0x164 fp=0x40009acce0 sp=0x40009acc60 pc=0x5f9c4
runtime.sigtramp()
/usr/local/go/src/runtime/sys_linux_arm64.s:471 +0x58 fp=0x40009acda0 sp=0x40009acce0 pc=0x7e228
goroutine 1503875 [running]:
runtime: traceback stuck. pc=0x58e9e0 sp=0x400ffaeda0
stack: frame={sp:0x400ffaeda0, fp:0x400ffaeda0} stack=[0x400ffac000,0x400ffb0000)
0x000000400ffaeca0: 0x0000004046ce2240 0x000000010001f0b8
0x000000400ffaecb0: 0x7300000001243730 0x0000ffff9d953948
0x000000400ffaecc0: 0xc8f36a7a4ff431e7 0x0000000000000007
0x000000400ffaecd0: 0x000000000058db7c <github.com/honeycombio/libhoney-go.(*fieldHolder).AddField+0x000000000000004c> 0x000000400cfa5520
0x000000400ffaece0: 0x000000000058dc00 <github.com/honeycombio/libhoney-go.(*fieldHolder).AddField+0x00000000000000d0> 0x000000400ffaed38
0x000000400ffaecf0: 0x0000000000065708 <runtime.rawstringtmp+0x0000000000000048> 0x000000400ffaed58
0x000000400ffaed00: 0x000000000058e980 <github.com/honeycombio/libhoney-go.(*Event).AddField+0x0000000000000110> 0x0000000000fcae00
0x000000400ffaed10: 0x0000004046ce2240 0x0000004046110d50
0x000000400ffaed20: 0x0000000000000016 0x0100000000000016
0x000000400ffaed30: 0x00000040453eaf68 0x0000000000eccea0
0x000000400ffaed40: 0x000000000058dc50 <github.com/honeycombio/libhoney-go.(*fieldHolder).AddField.func1+0x0000000000000000> 0x00000040453eaf68
0x000000400ffaed50: 0x000000400ffaed40 0x000000400ffaedb8
0x000000400ffaed60: 0x00000000005a5898 <github.com/honeycombio/beeline-go/trace.(*Span).AddField+0x0000000000000158> 0x00000040453eaf60
0x000000400ffaed70: 0x0000004046110d50 0x0000000000000016
0x000000400ffaed80: 0x0000000000eccea0 0x000000000147d320
0x000000400ffaed90: 0x01000000005a5840 0x0000000000f9a880
0x000000400ffaeda0: >0x000000000058e9e0 <github.com/honeycombio/libhoney-go.(*Event).AddField.func1+0x0000000000000000> 0x00000040453eaf8c
0x000000400ffaedb0: 0x000000400ffaeda0 0x000000400ffaee28
0x000000400ffaedc0: 0x00000000005a7a4c <github.com/honeycombio/beeline-go.AddField+0x00000000000000cc> 0x0000ffff9d953948
0x000000400ffaedd0: 0x0000004046110d50 0x000000000147d320
0x000000400ffaede0: 0x0000000000eccea0 0x000000000147d320
0x000000400ffaedf0: 0x000000400ffaee08 0x000000400ffaee28
0x000000400ffaee00: 0x01000000005a7a00 0x0000000000f9a880
0x000000400ffaee10: 0x00000000005a5900 <github.com/honeycombio/beeline-go/trace.(*Span).AddField.func1+0x0000000000000000> 0x00000040453c2b88
0x000000400ffaee20: 0x000000400ffaee10 0x000000400ffaee88
0x000000400ffaee30: 0x0000000000d79a6c <github.com/honeycombio/hound/cmd/shepherd/app.(*App).getSchema+0x00000000000002dc> 0x00000040453c2b00
0x000000400ffaee40: 0x0000004046110d50 0x0000000000000016
0x000000400ffaee50: 0x0000000000eccea0 0x000000000147d320
0x000000400ffaee60: 0x00000040006af0b0 0x0000000000000016
0x000000400ffaee70: 0x0000000000eccea0 0x00000040453c2b00
0x000000400ffaee80: 0x0000004046110d50 0x000000400ffaef48
0x000000400ffaee90: 0x0000000000d755bc <github.com/honeycombio/hound/cmd/shepherd/app.(*App).batchProcess.func1+0x000000000000043c> 0x0000000001491d40
fatal error: traceback stuck
goroutine 0 [idle]:
runtime.throw({0x11b3553?, 0x1ebf180?})
/usr/local/go/src/runtime/panic.go:1047 +0x40 fp=0x400078a610 sp=0x400078a5e0 pc=0x490f0
runtime.gentraceback(0x1f3c7e0?, 0x0?, 0x400078a9a8?, 0x400cfa5520, 0x0, 0x400078a9a8, 0x40, 0x0, 0x0?, 0x6)
/usr/local/go/src/runtime/traceback.go:488 +0xde4 fp=0x400078a940 sp=0x400078a610 pc=0x703f4
runtime.sigprof(0x7abe4, 0x400078ac48?, 0x0?, 0x400cfa5520, 0x4000780000)
/usr/local/go/src/runtime/proc.go:4567 +0xe8 fp=0x400078abb0 sp=0x400078a940 pc=0x55618
runtime.sighandler(0x1b?, 0x4000780000?, 0x400078ac98?, 0x40007821a0?)
/usr/local/go/src/runtime/signal_unix.go:629 +0x574 fp=0x400078ac60 sp=0x400078abb0 pc=0x60334
runtime.sigtrampgo(0x1b, 0x400078ada0, 0x400078ae20)
/usr/local/go/src/runtime/signal_unix.go:479 +0x164 fp=0x400078ace0 sp=0x400078ac60 pc=0x5f9c4
runtime.sigtramp()
/usr/local/go/src/runtime/sys_linux_arm64.s:471 +0x58 fp=0x400078ada0 sp=0x400078ace0 pc=0x7e228
goroutine 16981043 [running]:
go tool addr2line shepherd
0x7abe4
0x7abe0
runtime.morestack_noctxt
/usr/local/go/src/runtime/asm_arm64.s:324
runtime.morestack_noctxt
/usr/local/go/src/runtime/asm_arm64.s:323
https://github.com/golang/go/blob/go1.19/src/runtime/asm_arm64.s#L323-L324
Thanks for the details @lizthegrey !
similarity to https://github.com/golang/go/issues/52829, does this need to not have NOFRAME ie https://go-review.googlesource.com/c/go/+/405482/?
I don't think this is necessary. The ARM32 function contains a call in some configurations, whereas the ARM64 function should not contain a call.
So the signal lands at the entry of morestack_noctxt
or systemstack
. At this point the stack switch hasn't occurred so it should be able to traceback normally. I don't immediately see what is wrong. I'll keep looking.
If you need a workaround you can try comment out the "traceback stuck" throw. The profile sample's stacks may be inaccurate (which probably was the case before that CL) but it probably won't crash.
Need any additional debugging info? Any way that I can help?
We have special cases for morestack and systemstack here, but I don't believe they are relevant because (a) funcID_morestack
only applies to morestack
, not morestack_noctxt
, and (b) we should only hits these if we already switched stacks, which is not the case here.
@lizthegrey is this readily reproducible (ideally OSS, but I'm also curious if you could manually reproduce)? Or only occurs in prod?
It reproduces multiple times per day in all of our environments. Sadly I'm not sure I can reduce to a minimal OSS test case.
One thing I'd be happy to do is to run a version of this that is a branch build that patches go runtime to hang rather than panic when this occurs, then I'd be happy to ssh in and dlv the binary.
Change https://go.dev/cl/424196 mentions this issue: DO NOT SUBMIT: runtime: add dlog to gentraceback
Change https://go.dev/cl/424195 mentions this issue: DO NOT SUBMIT: runtime: dump entire stack
If you are willing to patch the runtime, there are a few changes that would be helpful.
-
https://go.dev/cl/424195: This just expands the stack dump to include the entire stack, which will help see more context. Since this only runs when crashing, it won't have any impact on normal execution.
-
https://go.dev/cl/424196: This adds
debuglog
logs to trace the behavior of gentraceback, which will help us see how to gets into the bad state.debuglog
is a low-overhead in-memory ring buffer log in the runtime, which you must enable withgo build -tags debuglog
[1]. The log is dumped to stderr on crash. Without the build tag, the code is a complete no-op. With the build tag, it is logging every execution ofgentraceback
, so there may be a small performance degradation, but I don't think it will be noticeable in real-world applications.
I think the latter is a bit simpler than attaching with a debugger, since it can track iteration state (previous frames) that will be gone once we decide to crash. Plus you don't need to watch for hung tasks and manually inspect them, but that is an option as well if the patch is concerning.
[1] If adding tags to your build is difficult for some reason, you can also swap the build tag lines at the top of debuglog_off.go and debuglog_on.go.
There's some slew since 1.18.5:
/usr/local/go/src/runtime/traceback.go:47:9: cannot use gp.goid (variable of type int64) as type uint64 in argument to d.u64
I'll fix the patch vs our base go version and try this overnight.
Fresh crash for you with the additional telemetry.
Details
2022/08/17 15:22:56 maxprocs: Updating GOMAXPROCS=4: determined from CPU quota
time="2022-08-17T15:22:56Z" level=info msg="missing POD_NAME and POD_NAMESPACE in env, ok if not retriever"
time="2022-08-17T15:22:56Z" level=info msg="Debug service listening on :6060"
[LaunchDarkly] 2022/08/17 15:22:56 INFO: Starting LaunchDarkly client 5.0.2
[LaunchDarkly] 2022/08/17 15:22:56 INFO: Starting LaunchDarkly streaming connection
[LaunchDarkly] 2022/08/17 15:22:56 INFO: Waiting up to 30000 milliseconds for LaunchDarkly client to start...
[LaunchDarkly] 2022/08/17 15:22:56 INFO: Connecting to LaunchDarkly stream
[LaunchDarkly] 2022/08/17 15:22:56 INFO: LaunchDarkly streaming is active
[LaunchDarkly] 2022/08/17 15:22:56 INFO: Initialized LaunchDarkly client
runtime: traceback stuck. pc=0x12894 sp=0x40009d7840
stack: frame={sp:0x40009d7840, fp:0x40009d7840} stack=[0x40009d6000,0x40009d8000)
0x00000040009d7740: 0x000000000001cadc <runtime.mallocgc+0x000000000000068c> 0x00000040009d77b8
0x00000040009d7750: 0x0000000000474210 <github.com/Shopify/sarama.(*realDecoder).pop+0x0000000000000060> 0x00000040009d77c8
0x00000040009d7760: 0x0000000000000000 0x0000000000000000
0x00000040009d7770: 0x0000000000000700 0x0000000000000004
0x00000040009d7780: 0x00000040033b40c0 0x0000000000007701
0x00000040009d7790: 0x0000000000000000 0x0000000000000000
0x00000040009d77a0: 0x0000000000000000 0x0000004000600000
0x00000040009d77b0: 0x0000004002b77c00 0x00000040009d77f8
0x00000040009d77c0: 0x000000000047703c <github.com/Shopify/sarama.(*RecordBatch).decode+0x000000000000037c> 0x0000004008ec5980
0x00000040009d77d0: 0x0000000000476f7c <github.com/Shopify/sarama.(*RecordBatch).decode+0x00000000000002bc> 0x00000040099ce05f
0x00000040009d77e0: 0x000000000010a2ee <type..eq.[...]interface {}+0x000000000000003e> 0x0000000000000000
0x00000040009d77f0: 0x0000000000000000 0x00000040009d78c8
0x00000040009d7800: 0x00000000004777dc <github.com/Shopify/sarama.(*Records).decode+0x000000000000009c> 0x00000040033b40c0
0x00000040009d7810: 0x0000004009ac5ab9 0x0000000000012894 <cmpbody+0x00000000000000e4>
0x00000040009d7820: 0x0000000000012894 <cmpbody+0x00000000000000e4> 0x00000040009d7898
0x00000040009d7830: 0x000128c50001ce80 0x0000000000012894 <cmpbody+0x00000000000000e4>
0x00000040009d7840: >0x0000000000012894 <cmpbody+0x00000000000000e4> 0x00000000000000dc
0x00000040009d7850: 0x0000000000000000 0x00000040009d7800
0x00000040009d7860: 0x0000000000476da4 <github.com/Shopify/sarama.(*RecordBatch).decode+0x00000000000000e4> 0x00000040009d78b8
0x00000040009d7870: 0x0000000000000000 0x0000000000000000
0x00000040009d7880: 0x0000000000000000 0x0000000000000000
0x00000040009d7890: 0x0000000000000000 0x0000004009ac5ab9
0x00000040009d78a0: 0x0000004008ec5980 0x0000000000000000
0x00000040009d78b0: 0x0000000000000000 0x00000000004772f0 <github.com/Shopify/sarama.(*RecordBatch).decode.func1+0x0000000000000000>
0x00000040009d78c0: 0x0000004008ec5980 0x00000040009d7918
0x00000040009d78d0: 0x000000000045a32c <github.com/Shopify/sarama.(*FetchResponseBlock).decode+0x000000000000033c> 0x0000004002cc3540
0x00000040009d78e0: 0x0000000001c75e50 0x00000040033b40c0
0x00000040009d78f0: 0x000000000045a31c <github.com/Shopify/sarama.(*FetchResponseBlock).decode+0x000000000000032c> 0x0000000000000018
0x00000040009d7900: 0x00000000016a8080 0x00000040005cf601
0x00000040009d7910: 0x00000040027f6300 0x00000040009d79c8
0x00000040009d7920: 0x000000000045acf8 <github.com/Shopify/sarama.(*FetchResponse).decode+0x00000000000002a8> 0x000000400710dad0
0x00000040009d7930: 0x0000000001c75e50 0x00000040033b40c0
fatal error: traceback stuck
goroutine 0 [idle]:
runtime: unexpected return pc for runtime.sigtramp called from 0xffffaac795c0
stack: frame={sp:0x40006cacd0, fp:0x40006cada0} stack=[0x40006c4000,0x40006cc000)
0x00000040006cabd0: 0x00000040006cac28 0x0000000000000000
0x00000040006cabe0: 0x0000000000000000 0x00000040006c2000
0x00000040006cabf0: 0x0000000000000000 0x0000000000000000
0x00000040006cac00: 0x0000000000000000 0x00000040006cada0
0x00000040006cac10: 0x00000040006cae20 0x00000040006cac98
0x00000040006cac20: 0x000000000007d35c <runtime.sigtrampgo+0x000000000000001c> 0x000000400000001b
0x00000040006cac30: 0x00000040006c0000 0x00000040006cac58
0x00000040006cac40: 0x0000004007a6bd40 0x0000000000000000
0x00000040006cac50: 0x0000000000000000 0x0000000000000000
0x00000040006cac60: 0x0000000000000000 0x0000000000000000
0x00000040006cac70: 0x0000000000000000 0x0000000000000000
0x00000040006cac80: 0x0000004007a6bd40 0x00000040006cada0
0x00000040006cac90: 0x00000040006cae20 0x00000040006cacc8
0x00000040006caca0: 0x000000000007cc74 <runtime.sigtramp+0x0000000000000084> 0x000000000000001b
0x00000040006cacb0: 0x00000040006cada0 0x00000040006cae20
0x00000040006cacc0: 0x0000000000000000 0x00000040006cbff0
0x00000040006cacd0: <0x0000ffffaac795c0 0x000000000000001b
0x00000040006cace0: 0x00000040006cada0 0x00000040006cae20
0x00000040006cacf0: 0x000000000004a4fd <runtime.main+0x00000000000002ed> 0x070a091214d6eaae
0x00000040006cad00: 0x226e776f6e6b6e75 0x3464623239303924
0x00000040006cad10: 0x342d356131632d61 0x000000000000000d
0x00000040006cad20: 0x00000000000001dd 0x0000000000000000
0x00000040006cad30: 0x0000000001c5b300 0x0000004007a6bd40
0x00000040006cad40: 0x00000040006cacc8 0x3e66376972bea4d0
0x00000040006cad50: 0xbebbbd41c5d26bf1 0x3f11566aaf25de2c
0x00000040006cad60: 0xbf66c16c16bebd93 0x3fc5555555555555
0x00000040006cad70: 0x3fc7642f0ac54c41 0x3ffd137a1ea75678
0x00000040006cad80: 0xbfca94327912c722 0x0000000000000000
0x00000040006cad90: 0x0000000000000000 0x0000000000000000
0x00000040006cada0: >0x000000000000001b 0x00000000fffffffe
0x00000040006cadb0: 0x0000000000000136 0x0000000000000000
0x00000040006cadc0: 0x0000000000000000 0x0000000000000000
0x00000040006cadd0: 0x0000000000000000 0x0000000000000000
0x00000040006cade0: 0x0000000000000000 0x0000000000000000
0x00000040006cadf0: 0x0000000000000000 0x0000000000000000
0x00000040006cae00: 0x0000000000000000 0x0000000000000000
0x00000040006cae10: 0x0000000000000000 0x0000000000000000
0x00000040006cae20: 0x0000000000000000 0x0000000000000000
0x00000040006cae30: 0x00000040006c4000 0x0000000000000000
0x00000040006cae40: 0x0000000000008000 0x0000000000000000
0x00000040006cae50: 0x0000000000000000 0x0000000000000000
0x00000040006cae60: 0x0000000000000000 0x0000000000000000
0x00000040006cae70: 0x0000000000000000 0x0000000000000000
0x00000040006cae80: 0x0000000000000000 0x0000000000000000
0x00000040006cae90: 0x0000000000000000 0x0000000000000000
runtime.throw({0x1894577?, 0x2a981a0?})
/usr/local/go/src/runtime/panic.go:992 +0x50
runtime.gentraceback(0x2b3d3a0?, 0x0?, 0x40006ca968?, 0x4007a6bd40, 0x0, 0x40006ca968, 0x40, 0x0, 0x0?, 0x6)
/usr/local/go/src/runtime/traceback.go:488 +0xd98
runtime.sigprof(0x79410, 0x40006cac08?, 0x0?, 0x4007a6bd40, 0x40006c0000)
/usr/local/go/src/runtime/proc.go:4507 +0xf0
runtime.sighandler(0x1b?, 0x40006c0000?, 0x40006cac58?, 0x4007a6bd40?)
/usr/local/go/src/runtime/signal_unix.go:613 +0x514
runtime.sigtrampgo(0x1b, 0x40006cada0, 0x40006cae20)
/usr/local/go/src/runtime/signal_unix.go:477 +0x164
runtime.sigtrampgo(0x1b, 0x40006cada0, 0x40006cae20)
<autogenerated>:1 +0x1c
runtime: unexpected return pc for runtime.sigtramp called from 0xffffaac795c0
stack: frame={sp:0x40006cacd0, fp:0x40006cada0} stack=[0x40006c4000,0x40006cc000)
0x00000040006cabd0: 0x00000040006cac28 0x0000000000000000
0x00000040006cabe0: 0x0000000000000000 0x00000040006c2000
0x00000040006cabf0: 0x0000000000000000 0x0000000000000000
0x00000040006cac00: 0x0000000000000000 0x00000040006cada0
0x00000040006cac10: 0x00000040006cae20 0x00000040006cac98
0x00000040006cac20: 0x000000000007d35c <runtime.sigtrampgo+0x000000000000001c> 0x000000400000001b
0x00000040006cac30: 0x00000040006c0000 0x00000040006cac58
0x00000040006cac40: 0x0000004007a6bd40 0x0000000000000000
0x00000040006cac50: 0x0000000000000000 0x0000000000000000
0x00000040006cac60: 0x0000000000000000 0x0000000000000000
0x00000040006cac70: 0x0000000000000000 0x0000000000000000
0x00000040006cac80: 0x0000004007a6bd40 0x00000040006cada0
0x00000040006cac90: 0x00000040006cae20 0x00000040006cacc8
0x00000040006caca0: 0x000000000007cc74 <runtime.sigtramp+0x0000000000000084> 0x000000000000001b
0x00000040006cacb0: 0x00000040006cada0 0x00000040006cae20
0x00000040006cacc0: 0x0000000000000000 0x00000040006cbff0
0x00000040006cacd0: <0x0000ffffaac795c0 0x000000000000001b
0x00000040006cace0: 0x00000040006cada0 0x00000040006cae20
0x00000040006cacf0: 0x000000000004a4fd <runtime.main+0x00000000000002ed> 0x070a091214d6eaae
0x00000040006cad00: 0x226e776f6e6b6e75 0x3464623239303924
0x00000040006cad10: 0x342d356131632d61 0x000000000000000d
0x00000040006cad20: 0x00000000000001dd 0x0000000000000000
0x00000040006cad30: 0x0000000001c5b300 0x0000004007a6bd40
0x00000040006cad40: 0x00000040006cacc8 0x3e66376972bea4d0
0x00000040006cad50: 0xbebbbd41c5d26bf1 0x3f11566aaf25de2c
0x00000040006cad60: 0xbf66c16c16bebd93 0x3fc5555555555555
0x00000040006cad70: 0x3fc7642f0ac54c41 0x3ffd137a1ea75678
0x00000040006cad80: 0xbfca94327912c722 0x0000000000000000
0x00000040006cad90: 0x0000000000000000 0x0000000000000000
0x00000040006cada0: >0x000000000000001b 0x00000000fffffffe
0x00000040006cadb0: 0x0000000000000136 0x0000000000000000
0x00000040006cadc0: 0x0000000000000000 0x0000000000000000
0x00000040006cadd0: 0x0000000000000000 0x0000000000000000
0x00000040006cade0: 0x0000000000000000 0x0000000000000000
0x00000040006cadf0: 0x0000000000000000 0x0000000000000000
0x00000040006cae00: 0x0000000000000000 0x0000000000000000
0x00000040006cae10: 0x0000000000000000 0x0000000000000000
0x00000040006cae20: 0x0000000000000000 0x0000000000000000
0x00000040006cae30: 0x00000040006c4000 0x0000000000000000
0x00000040006cae40: 0x0000000000008000 0x0000000000000000
0x00000040006cae50: 0x0000000000000000 0x0000000000000000
0x00000040006cae60: 0x0000000000000000 0x0000000000000000
0x00000040006cae70: 0x0000000000000000 0x0000000000000000
0x00000040006cae80: 0x0000000000000000 0x0000000000000000
0x00000040006cae90: 0x0000000000000000 0x0000000000000000
runtime.sigtramp()
/usr/local/go/src/runtime/sys_linux_arm64.s:489 +0x84
goroutine 2446 [running]:
There's some slew since 1.18.5:
Oops, apologies, I made this CL based on tip rather than 1.18. Did you get this fixed? If not I can provide another on 1.18.
Fresh crash for you with the additional telemetry.
Did this get built with -tags debuglog
? The is missing, though perhaps just not included in your paste. The log will be output after all of the crash stack traces just before exit.
Did this get built with -tags debuglog? The is missing, though perhaps just not included in your paste. The log will be output after all of the crash stack traces just before exit.
I thought it did, but apparently it's missing them. Does go install
accept tags, or only go build
?
go install -tags debuglog -ldflags "${LD_FLAGS} \
-X github.com/honeycombio/hound/env.buildID=$BUILD_ID \
-X github.com/honeycombio/hound/env.commitHash=$CIRCLE_SHA1" \
./...
Yes, go install
accepts tags.
I also see that the stack dump doesn't include the entire stack either (https://go.dev/cl/424195). I'd guess that your build isn't picking up the patches to the runtime at all (e.g., wrong GOROOT), except that it clearly did when you got a build error...
You could double check the symbols in the binary. If debuglog is enabled, it should have various dlogger
symbols like runtime.(*dlogger).end
, if not those are missing.
oh, whoops, good catch.
Roles pinned:
beagle: 603443
barbet: 603443
that binary is pinned to an older build. the build instructions were correct, prod just didn't have the correct one. egg on my face, sorry, I'll re-try and let you know when we have a crash with the correct build.
Thanks. By the way, do you happen to be using musl libc (/Alpine Linux)? (Is this even a cgo binary?)
I'm investigating #54306 in parallel, presumably an issue with musl. It is probably unrelated, but it would help to know if you are using musl.
Thanks. By the way, do you happen to be using musl libc (/Alpine Linux)? (Is this even a cgo binary?)
I'm investigating #54306 in parallel, presumably an issue with musl. It is probably unrelated, but it would help to know if you are using musl.
No, we run on Ubuntu 22.04 containers with standard libc 2.35-0ubuntu3.1
. No use of cgo
https://gist.github.com/lizthegrey/043da22367603cd24acf82c13742d830
Thanks, I'll take a look! Does this one have debuglog output at the end? (Lots of lines mostly containing gentrace: ...
)
https://gist.github.com/lizthegrey/f5827b5577abf9b0e3c883eeaf748361
Oh, no, there is nothing with "gentrace" at the end.
Hrm, that is quite frustrating.
What I can tell from the stack dump is that the SP/FP we've landed at is certainly not a valid frame pointer. It is hard to tell, but the stack doesn't look obviously corrupt to me. Perhaps the stack is OK, but we computed the wrong next frame pointer.
I'll see if I can put more information in another patch, even if debuglog fails.
New patches, now based on the 1.18 release branch, so they should apply cleanly:
- https://go.dev/cl/424076: Identical to https://go.dev/cl/424195.
- https://go.dev/cl/424077: Similar to https://go.dev/cl/424196. Adds the previous frame to the panic output so we get that even if debuglog is still broken. Also enables debuglog by default to try to get that working. As a result, you should not pass
-tags debuglog
.
Change https://go.dev/cl/424077 mentions this issue: DO NOT SUBMIT: runtime: add dlog to gentraceback
Change https://go.dev/cl/424076 mentions this issue: DO NOT SUBMIT: runtime: dump entire stack
Go ahead and base on 1.19 going forward, I just upgraded our prod environments...
This patchset worked fine though with offset:
patching file src/runtime/traceback.go
patching file src/runtime/debuglog_off.go
patching file src/runtime/debuglog_on.go
patching file src/runtime/proc.go
Hunk #1 succeeded at 4511 (offset 60 lines).
patching file src/runtime/traceback.go
Still not seeing debuglog in output. Are we terminating before debuglog can be printed?
https://gist.github.com/lizthegrey/58174372484922b47c69c147a4c9f1f8
It's possible, though I'm surprised there is no further indication of crash if true. Do you see "stack trace unavailable" at the end? Curiously, I've been using the same patch to investigate #54306 and it is working fine.
On another note, I've noticed that most of the stacks so far seem to be in code around github.com/Shopify/sarama.(*RecordBatch).decode
(and various other sarama
decode
functions) which calls into github.com/klauspost/compress/zstd
functions. Do you know (from profiles or intuition) if this code path is very hot in this application (so crashing here is likely just coincidence), or fairly rare (so perhaps not a coincidence)? There was one crash above in github.com/honeycombio/libhoney-go
, so I lean towards coincidence.
@lizthegrey @prattmic thanks for the information! The most recent one doesn't have morestack or systemstack, so it may be not limited to that.
I'm still puzzled by the unaligned-PC-like values, e.g. 0x11b2b, which is
<runtime/internal/atomic.(*Uint8).And+0x000000000000003b>
(it is not only unaligned but also past the end of the function, so the execution should never get there).
It may be helpful to print the PCs it has unwound through, i.e. the content of pcbuf
, so we can see how it gets there. (SPs would also be good, but they are not stored anywhere so it's hard to print. PCs plus the content of the stack is probably enough.)
@lizthegrey do you have any run without profiling turned on, and if so, is there any crash? It looks to me that the content of the stack is weird so even a non-profiling run might crash. Thanks.
I'm still puzzled by the unaligned-PC-like values, e.g. 0x11b2b
I was confused by those as well. There are some really odd ones like runtime.debugCallWrap.func1
. My guess was that since the values are fairly small, they are probably just random arguments to these functions and not PCs at all.
@lizthegrey do you have any run without profiling turned on, and if so, is there any crash? It looks to me that the content of the stack is weird so even a non-profiling run might crash. Thanks.
No profiling = no crash. We've had copies of some of these binaries running without profiling for months, that have not crashed, when we enabled profiling they immediately started crashing about once per day out of 140 processes.
It's possible, though I'm surprised there is no further indication of crash if true. Do you see "stack trace unavailable" at the end? Curiously, I've been using the same patch to investigate #54306 and it is working fine.
Oddly, no. What's also odd is that the kubectl log
output stops after a small number of goroutines, rather than printing all the way through. I'll see if our fluentbit exporter got the entire thing and report back.
On another note, I've noticed that most of the stacks so far seem to be in code around
github.com/Shopify/sarama.(*RecordBatch).decode
(and various othersarama
decode
functions) which calls intogithub.com/klauspost/compress/zstd
functions. Do you know (from profiles or intuition) if this code path is very hot in this application (so crashing here is likely just coincidence), or fairly rare (so perhaps not a coincidence)? There was one crash above ingithub.com/honeycombio/libhoney-go
, so I lean towards coincidence.
It's a coincidence, sarama/zstd is about 30-40% of the workload of this app, and libhoney-go is 10%.
What's also odd is that the kubectl log output stops after a small number of goroutines
Huh. I'd noticed that your logs stop at the second goroutine header and assumed you were just manually truncating. Is that actually where the output stops?
I've updated https://go.dev/cl/424196 to:
- Print debuglog before the panic tracebacks. Maybe this will get it out?
- Remember and dump the first 32 frames we walk (which is primarily what I want out of debuglog).
- Dump the contents of
pcbuf
.
Could you give it a try?
Huh. I'd noticed that your logs stop at the second goroutine header and assumed you were just manually truncating. Is that actually where the output stops?
It is indeed. on a recent crash, even pulling the logs from fluentbit, they're truncated directly after the second goroutine:
/usr/local/go/src/runtime/sys_linux_arm64.s:471 +0x58 fp=0x400081cda0 sp=0x400081cce0 pc=0x80258
goroutine 469267 [running]:
and then it stops.
I've updated go.dev/cl/424196 to:
- Print debuglog before the panic tracebacks. Maybe this will get it out?
- Remember and dump the first 32 frames we walk (which is primarily what I want out of debuglog).
- Dump the contents of
pcbuf
.Could you give it a try?
Perfect, will give it a try.
Finally got a good dump including debug logs for you. https://gist.github.com/lizthegrey/0ed6b93167f618fa0d09becf975888c4
Awesome, thanks!
Here's what I see:
Caveat: The debuglog is still a bit corrupted, I guess because I printed early while other threads were still writing to it. That's OK though, the frame history has most of what I want.
This trace goes through libhoney.(*fieldHolder).AddField
. See the source here.
previous frame 0 {pc:0x7cdc0, lr:0x24cb8, sp:0x406e1b24c0, fp:0x406e1b24c0}
previous frame 1 {pc:0x24cb8, lr:0x241a4, sp:0x406e1b24c0, fp:0x406e1b24c0}
I'm not certain what these functions are, as they don't appear in the stack dump.
Based on my own build of 1.19, mapassign_faststr+0x114
is the return site from runtime.growWork_faststr
.
The offset from there is 0x24cb8
yields the return site from runtime.morestack_noctxt.abi0
in runtime.evacuate_faststr
.
So 0x7cdc0
is probably in morestack
?
previous frame 2 {pc:0x241a4, lr:0x241a4, sp:0x406e1b24c0, fp:0x406e1b2530}
previous frame 3 {pc:0x241a4, lr:0x59244c, sp:0x406e1b2530, fp:0x406e1b25a0}
0x241a4
is <runtime.mapassign_faststr+0x0000000000000114>
. At the moment, I am not sure why there is an extra frame with pc == lr
. It might be because 0x241a4
is actually an instruction from runtime.tophash
inlined into runtime.mapassign_faststr
.
previous frame 4 {pc:0x59244c, lr:0x592520, sp:0x406e1b25a0, fp:0x406e1b2600}
0x59244c
is <github.com/honeycombio/libhoney-go.(*fieldHolder).AddField+0x000000000000004c>
. Based on my build of libhoney, AddField+0x4c
is the return site from sync.(*RWMutex).Lock
, not runtime.mapassign_faststr
(there is also a call to that later in the function at +0x90
). This seems wrong...
previous frame 5 {pc:0x592520, lr:0x5932b0, sp:0x406e1b2600, fp:0x406e1b2600}
0x0592520
is <github.com/honeycombio/libhoney-go.(*fieldHolder).AddField.func1+0x0000000000000000>
. AddField.func1
is the wrapper for defer f.lock.Unlock()
. This doesn't make sense, because the wrapper does not call the main function, plus AddField.func1+0x0
is the beginning of the function and definitely not a CALL.
Presumably we get stuck here because the SP delta is 0, as SP hasn't been adjusted at function entry.
(I will note that the instruction just before this is JMP github.com/honeycombio/libhoney-go.(*fieldHolder).AddField(SB)
at the end of AddField
, for restarting after morestack
. Still not a CALL, but a typical pc-4
would put us in a different function).
I see lots of oddities here, though I'm not sure yet exactly where the problem is.
@lizthegrey could you help double-check if my compilation of libhoney matches yours by posting the disassembly of github.com/honeycombio/libhoney-go.(*fieldHolder).AddField
and github.com/honeycombio/libhoney-go.(*fieldHolder).AddField.func1
from the actual binary that crashed? (go tool objdump <binary>
, look for TEXT github.com/honeycombio/libhoney-go.(*fieldHolder).AddField
).
Also please send over additional crashes if you get them.
Thanks!
previous frame 0 {pc:0x7cdc0, lr:0x24cb8, sp:0x406e1b24c0, fp:0x406e1b24c0}
previous frame 1 {pc:0x24cb8, lr:0x241a4, sp:0x406e1b24c0, fp:0x406e1b24c0}
previous frame 2 {pc:0x241a4, lr:0x241a4, sp:0x406e1b24c0, fp:0x406e1b2530}
This is weird. It shouldn't be possible to have three frames to have the same SP. And frame 2's PC = LR, which also makes no sense. Do you know what functions are for PC 0x7cdc0 and 0x24cb8? Thanks.
It is also interesting that debuglog doesn't seem to include these frames...
Additional crashes all appended to the same secret gist (see traceback{2,3,4,...})
code for Sarama and Libhoney should be available in OSS -- see https://github.com/honeycombio/libhoney-go and https://github.com/shopify/sarama
I'll post the disassembly as seen in our compiled code.
in the binary (shepherd) that generated tracebacks 1-3:
TEXT github.com/honeycombio/libhoney-go.(*Client).AddField(SB) /mnt/ramdisk/go/pkg/mod/github.com/honeycombio/[email protected]/client.go
client.go:193 0x5913e0 f9400b90 MOVD 16(R28), R16
client.go:193 0x5913e4 eb3063ff CMP R16, RSP
client.go:193 0x5913e8 54000309 BLS 24(PC)
client.go:193 0x5913ec f81c0ffe MOVD.W R30, -64(RSP)
client.go:193 0x5913f0 f81f83fd MOVD R29, -8(RSP)
client.go:193 0x5913f4 d10023fd SUB $8, RSP, R29
client.go:193 0x5913f8 f9002be1 MOVD R1, 80(RSP)
client.go:193 0x5913fc f90027e0 MOVD R0, 72(RSP)
client.go:193 0x591400 f9002fe2 MOVD R2, 88(RSP)
client.go:193 0x591404 f90033e3 MOVD R3, 96(RSP)
client.go:193 0x591408 f90037e4 MOVD R4, 104(RSP)
client.go:194 0x59140c 97fffe7d CALL github.com/honeycombio/libhoney-go.(*Client).ensureTransmission(SB)
client.go:195 0x591410 f94027e0 MOVD 72(RSP), R0
client.go:195 0x591414 97fffefb CALL github.com/honeycombio/libhoney-go.(*Client).ensureBuilder(SB)
client.go:196 0x591418 f94027e1 MOVD 72(RSP), R1
client.go:196 0x59141c f9401021 MOVD 32(R1), R1
client.go:196 0x591420 3980003b MOVB (R1), R27
client.go:196 0x591424 9100e020 ADD $56, R1, R0
client.go:196 0x591428 f9402be1 MOVD 80(RSP), R1
client.go:196 0x59142c f9402fe2 MOVD 88(RSP), R2
client.go:196 0x591430 f94033e3 MOVD 96(RSP), R3
client.go:196 0x591434 f94037e4 MOVD 104(RSP), R4
client.go:196 0x591438 940003f2 CALL github.com/honeycombio/libhoney-go.(*fieldHolder).AddField(SB)
client.go:197 0x59143c a97ffbfd LDP -8(RSP), (R29, R30)
client.go:197 0x591440 910103ff ADD $64, RSP, RSP
client.go:197 0x591444 d65f03c0 RET
client.go:193 0x591448 f90007e0 MOVD R0, 8(RSP)
client.go:193 0x59144c f9000be1 MOVD R1, 16(RSP)
client.go:193 0x591450 f9000fe2 MOVD R2, 24(RSP)
client.go:193 0x591454 f90013e3 MOVD R3, 32(RSP)
client.go:193 0x591458 f90017e4 MOVD R4, 40(RSP)
client.go:193 0x59145c aa1e03e3 MOVD R30, R3
client.go:193 0x591460 97ebae58 CALL runtime.morestack_noctxt.abi0(SB)
client.go:193 0x591464 f94007e0 MOVD 8(RSP), R0
client.go:193 0x591468 f9400be1 MOVD 16(RSP), R1
client.go:193 0x59146c f9400fe2 MOVD 24(RSP), R2
client.go:193 0x591470 f94013e3 MOVD 32(RSP), R3
client.go:193 0x591474 f94017e4 MOVD 40(RSP), R4
client.go:193 0x591478 17ffffda JMP github.com/honeycombio/libhoney-go.(*Client).AddField(SB)
client.go:193 0x59147c 00000000 ?
TEXT github.com/honeycombio/libhoney-go.(*fieldHolder).AddField.func1(SB) /mnt/ramdisk/go/pkg/mod/github.com/honeycombio/[email protected]/libhoney.go
libhoney.go:567 0x592520 f9400b90 MOVD 16(R28), R16
libhoney.go:567 0x592524 eb3063ff CMP R16, RSP
libhoney.go:567 0x592528 54000169 BLS 11(PC)
libhoney.go:567 0x59252c f81e0ffe MOVD.W R30, -32(RSP)
libhoney.go:567 0x592530 f81f83fd MOVD R29, -8(RSP)
libhoney.go:567 0x592534 d10023fd SUB $8, RSP, R29
libhoney.go:567 0x592538 f9401390 MOVD 32(R28), R16
libhoney.go:567 0x59253c b5000130 CBNZ R16, 9(PC)
libhoney.go:567 0x592540 f9400740 MOVD 8(R26), R0
libhoney.go:567 0x592544 97ebdf5b CALL sync.(*RWMutex).Unlock(SB)
libhoney.go:567 0x592548 a97ffbfd LDP -8(RSP), (R29, R30)
libhoney.go:567 0x59254c 910083ff ADD $32, RSP, RSP
libhoney.go:567 0x592550 d65f03c0 RET
libhoney.go:567 0x592554 aa1e03e3 MOVD R30, R3
libhoney.go:567 0x592558 97eba9fa CALL runtime.morestack.abi0(SB)
libhoney.go:567 0x59255c 17fffff1 JMP github.com/honeycombio/libhoney-go.(*fieldHolder).AddField.func1(SB)
libhoney.go:567 0x592560 f9400211 MOVD (R16), R17
libhoney.go:567 0x592564 9100a3f4 ADD $40, RSP, R20
libhoney.go:567 0x592568 eb11029f CMP R17, R20
libhoney.go:567 0x59256c 54fffea1 BNE -11(PC)
libhoney.go:567 0x592570 910023f4 ADD $8, RSP, R20
libhoney.go:567 0x592574 f9000214 MOVD R20, (R16)
libhoney.go:567 0x592578 17fffff2 JMP -14(PC)
libhoney.go:567 0x59257c 00000000 ?
traceback 4 is from a different binary, let me know what disasm you need from that.
0x7cdc0
TEXT github.com/gogo/protobuf/proto.makeTimePtrSliceMarshaler.func2(SB) /mnt/ramdisk/go/pkg/mod/github.com/gogo/[email protected]/proto/table_marshal_gogo.go
[...]
value.go:1616 0x7cdbf8 f1005cff CMP $23, R7
value.go:1616 0x7cdbfc 54000061 BNE 3(PC)
value.go:1617 0x7cdc00 f9400427 MOVD 8(R1), R7
table_marshal_gogo.go:257 0x7cdc04 1400000a JMP 10(PC)
value.go:1619 0x7cdc08 97e41bae CALL reflect.Value.lenNonSlice(SB)
table_marshal_gogo.go:258 0x7cdc0c f9404be1 MOVD 144(RSP), R1
0x24cb8
TEXT hash/crc32.MakeTable(SB) /usr/local/go/src/hash/crc32/crc32.go
[...]
crc32_generic.go:29 0x24cb80 14000005 JMP 5(PC)
crc32.go:137 0x24cb84 a97ffbfd LDP -8(RSP), (R29, R30)
crc32.go:137 0x24cb88 9100c3ff ADD $48, RSP, RSP
crc32.go:137 0x24cb8c d65f03c0 RET
libhoney-go
I've identified why your build output for libhoney-go may be different from mine. this repo is lagging the mainline a bit.
go 1.18
require (
[...]
github.com/honeycombio/libhoney-go v1.15.8
github.com/Shopify/sarama v1.31.0
TEXT github.com/honeycombio/libhoney-go.(*Client).AddField(SB)
This is Client.AddField
. Could you post fieldHolder.AddField
?
Also for 0x7cdc0
, it looks like you posted 0x7cdc00
instead (off by an order of magnitude). Same with 0x24cb8
.
So 0x7cdc0 is probably in morestack
If this is in morestack, I think that makes sense. morestack
itself doesn't have a frame, so that's "frame 0" above. Its caller (evacuate_faststr
) hasn't created a frame yet, so it also doesn't have a frame, which is "frame 1" above.
Frame 1 should have its return address in growWork_faststr
, but somehow that is missing, and directly skip to mapassign_faststr
?
My guess is that a profiling signal lands in morestack
but before the stack switch. The jumpstack logic doesn't kick in as it hasn't switched. At this point PC=morestack, LR=caller of morestack, i.e. evacuate_faststr
. And it unwinds a frame normally.
At the second frame, nothing sets the LR specially, so at this point https://cs.opensource.google/go/go/+/master:src/runtime/traceback.go;l=228 it just loads the LR from the stack. But evacuate_faststr
hasn't actually saved LR on stack, so it loads the LR of its caller, which is the LR of growWork_faststr, which is
mapassgn_faststr`. I think that explains the missing frame. I think the logic of the code assumes only the last frame can be frameless (the n==0 check), all following frames have LR saved on stack, so it assumes it is safe to load. But the morestack-but-havevn't-switched case breaks that logic...
morestack
writes SP, so it should have SPWRITE bit set. If the jump stack logic doesn't kick in, it should stop here https://cs.opensource.google/go/go/+/master:src/runtime/traceback.go;l=201 . Unfortunate for profiling, but doesn't crash.
However, morestack_noctxt
doesn't write SP, so if a signal lands there, it tries to unwind normally... Maybe a stop-gap fix is to mark morestack_noctxt
SPWRITE...
TEXT github.com/honeycombio/libhoney-go.(*Client).AddField(SB)
This is
Client.AddField
. Could you postfieldHolder.AddField
?Also for
0x7cdc0
, it looks like you posted0x7cdc00
instead (off by an order of magnitude). Same with0x24cb8
.
Oh, whoops, sorry. here you go. (shepherd binary, tracebacks 1-3)
TEXT github.com/honeycombio/libhoney-go.(*fieldHolder).AddField(SB) /mnt/ramdisk/go/pkg/mod/github.com/honeycombio/[email protected]/libhoney.go
libhoney.go:565 0x592400 f9400b90 MOVD 16(R28), R16
libhoney.go:565 0x592404 eb3063ff CMP R16, RSP
libhoney.go:565 0x592408 54000729 BLS 57(PC)
libhoney.go:565 0x59240c f81a0ffe MOVD.W R30, -96(RSP)
libhoney.go:565 0x592410 f81f83fd MOVD R29, -8(RSP)
libhoney.go:565 0x592414 d10023fd SUB $8, RSP, R29
libhoney.go:565 0x592418 f90043e3 MOVD R3, 128(RSP)
libhoney.go:565 0x59241c f9002bff MOVD ZR, 80(RSP)
libhoney.go:565 0x592420 f90037e0 MOVD R0, 104(RSP)
libhoney.go:565 0x592424 f9003fe2 MOVD R2, 120(RSP)
libhoney.go:565 0x592428 f9003be1 MOVD R1, 112(RSP)
libhoney.go:568 0x59242c f9001fe3 MOVD R3, 56(RSP)
libhoney.go:568 0x592430 f90047e4 MOVD R4, 136(RSP)
libhoney.go:565 0x592434 3900bfff MOVB ZR, 47(RSP)
libhoney.go:566 0x592438 3980001b MOVB (R0), R27
libhoney.go:566 0x59243c 91002005 ADD $8, R0, R5
libhoney.go:566 0x592440 f9001be5 MOVD R5, 48(RSP)
libhoney.go:566 0x592444 aa0503e0 MOVD R5, R0
libhoney.go:566 0x592448 97ebdef6 CALL sync.(*RWMutex).Lock(SB)
libhoney.go:567 0x59244c a9047fff STP (ZR, ZR), 64(RSP)
libhoney.go:567 0x592450 90000001 ADRP 0(PC), R1
libhoney.go:567 0x592454 91148021 ADD $1312, R1, R1
libhoney.go:567 0x592458 f90023e1 MOVD R1, 64(RSP)
libhoney.go:567 0x59245c f9401be1 MOVD 48(RSP), R1
libhoney.go:567 0x592460 f90027e1 MOVD R1, 72(RSP)
libhoney.go:567 0x592464 910103e1 ADD $64, RSP, R1
libhoney.go:567 0x592468 f9002be1 MOVD R1, 80(RSP)
libhoney.go:567 0x59246c b24003e1 ORR $1, ZR, R1
libhoney.go:567 0x592470 3900bfe1 MOVB R1, 47(RSP)
libhoney.go:568 0x592474 f94037e1 MOVD 104(RSP), R1
libhoney.go:568 0x592478 f9400021 MOVD (R1), R1
libhoney.go:568 0x59247c 90008e80 ADRP 18677760(PC), R0
libhoney.go:568 0x592480 91210000 ADD $2112, R0, R0
libhoney.go:568 0x592484 f9403be2 MOVD 112(RSP), R2
libhoney.go:568 0x592488 f9403fe3 MOVD 120(RSP), R3
libhoney.go:568 0x59248c 97ea4701 CALL runtime.mapassign_faststr(SB)
libhoney.go:568 0x592490 f9401fe1 MOVD 56(RSP), R1
libhoney.go:568 0x592494 f9000001 MOVD R1, (R0)
libhoney.go:568 0x592498 f001429b ADRP 42283008(PC), R27
libhoney.go:568 0x59249c 9124837b ADD $2336, R27, R27
libhoney.go:568 0x5924a0 b9400361 MOVWU (R27), R1
libhoney.go:568 0x5924a4 35000081 CBNZW R1, 4(PC)
libhoney.go:568 0x5924a8 f94047e1 MOVD 136(RSP), R1
libhoney.go:568 0x5924ac f9000401 MOVD R1, 8(R0)
libhoney.go:568 0x5924b0 14000004 JMP 4(PC)
libhoney.go:568 0x5924b4 91002002 ADD $8, R0, R2
libhoney.go:568 0x5924b8 f94047e3 MOVD 136(RSP), R3
libhoney.go:568 0x5924bc 97ebb341 CALL runtime.gcWriteBarrier(SB)
libhoney.go:569 0x5924c0 3900bfff MOVB ZR, 47(RSP)
libhoney.go:569 0x5924c4 f9402bfa MOVD 80(RSP), R26
libhoney.go:569 0x5924c8 f9400340 MOVD (R26), R0
libhoney.go:569 0x5924cc d63f0000 CALL (R0)
libhoney.go:569 0x5924d0 a97ffbfd LDP -8(RSP), (R29, R30)
libhoney.go:569 0x5924d4 910183ff ADD $96, RSP, RSP
libhoney.go:569 0x5924d8 d65f03c0 RET
libhoney.go:569 0x5924dc 97eadced CALL runtime.deferreturn(SB)
libhoney.go:569 0x5924e0 a97ffbfd LDP -8(RSP), (R29, R30)
libhoney.go:569 0x5924e4 910183ff ADD $96, RSP, RSP
libhoney.go:569 0x5924e8 d65f03c0 RET
libhoney.go:565 0x5924ec f90007e0 MOVD R0, 8(RSP)
libhoney.go:565 0x5924f0 f9000be1 MOVD R1, 16(RSP)
libhoney.go:565 0x5924f4 f9000fe2 MOVD R2, 24(RSP)
libhoney.go:565 0x5924f8 f90013e3 MOVD R3, 32(RSP)
libhoney.go:565 0x5924fc f90017e4 MOVD R4, 40(RSP)
libhoney.go:565 0x592500 aa1e03e3 MOVD R30, R3
libhoney.go:565 0x592504 97ebaa2f CALL runtime.morestack_noctxt.abi0(SB)
libhoney.go:565 0x592508 f94007e0 MOVD 8(RSP), R0
libhoney.go:565 0x59250c f9400be1 MOVD 16(RSP), R1
libhoney.go:565 0x592510 f9400fe2 MOVD 24(RSP), R2
libhoney.go:565 0x592514 f94013e3 MOVD 32(RSP), R3
libhoney.go:565 0x592518 f94017e4 MOVD 40(RSP), R4
libhoney.go:565 0x59251c 17ffffb9 JMP github.com/honeycombio/libhoney-go.(*fieldHolder).AddField(SB)
TEXT runtime.morestack_noctxt.abi0(SB) /usr/local/go/src/runtime/asm_arm64.s
asm_arm64.s:323 0x7cdc0 2a1f03fa MOVW ZR, R26
asm_arm64.s:324 0x7cdc4 17ffffdf JMP runtime.morestack.abi0(SB)
asm_arm64.s:324 0x7cdc8 00000000 ?
asm_arm64.s:324 0x7cdcc 00000000 ?
TEXT runtime.evacuate_faststr(SB) /usr/local/go/src/runtime/map_faststr.go
map_faststr.go:402 0x24900 f9400b90 MOVD 16(R28), R16
map_faststr.go:402 0x24904 d100c3f1 SUB $48, RSP, R17
map_faststr.go:402 0x24908 eb10023f CMP R16, R17
map_faststr.go:402 0x2490c 54001cc9 BLS 230(PC)
map_faststr.go:402 0x24910 f8150ffe MOVD.W R30, -176(RSP)
map_faststr.go:402 0x24914 f81f83fd MOVD R29, -8(RSP)
map_faststr.go:402 0x24918 d10023fd SUB $8, RSP, R29
map_faststr.go:403 0x2491c 39402424 MOVBU 9(R1), R4
map.go:1118 0x24920 d1000485 SUB $1, R4, R5
map.go:1111 0x24924 39402026 MOVBU 8(R1), R6
map.go:1111 0x24928 927d00c6 AND $8, R6, R6
map.go:1111 0x2492c 6b1f00df CMPW ZR, R6
map.go:1120 0x24930 9a851084 CSEL NE, R4, R5, R4
map_faststr.go:403 0x24934 f9400c25 MOVD 24(R1), R5
map_faststr.go:403 0x24938 7940a406 MOVHU 82(R0), R6
map_faststr.go:404 0x2493c d503201f NOOP
map_faststr.go:405 0x24940 d503201f NOOP
stubs.go:18 0x24944 9b061445 MADD R6, R5, R2, R5
map.go:186 0x24948 92401484 AND $63, R4, R4
map.go:186 0x2494c b24003e6 ORR $1, ZR, R6
map.go:186 0x24950 9ac420c4 LSL R4, R6, R4
map.go:204 0x24954 394000a6 MOVBU (R5), R6
map.go:205 0x24958 d10008c6 SUB $2, R6, R6
map.go:205 0x2495c d3401cc6 UBFX $0, R6, $8, R6
map.go:205 0x24960 71000cdf CMPW $3, R6
map.go:1117 0x24964 540003e3 BCC 31(PC)
map_faststr.go:410 0x24968 a906ffff STP (ZR, ZR), 104(RSP)
map_faststr.go:410 0x2496c a907ffff STP (ZR, ZR), 120(RSP)
map_faststr.go:410 0x24970 a908ffff STP (ZR, ZR), 136(RSP)
map_faststr.go:410 0x24974 a909ffff STP (ZR, ZR), 152(RSP)
map_faststr.go:412 0x24978 f9400826 MOVD 16(R1), R6
map_faststr.go:412 0x2497c 7940a407 MOVHU 82(R0), R7
stubs.go:18 0x24980 9b071846 MADD R7, R6, R2, R6
map_faststr.go:412 0x24984 f90037e6 MOVD R6, 104(RSP)
map_faststr.go:413 0x24988 f94037e6 MOVD 104(RSP), R6
stubs.go:18 0x2498c 910020c7 ADD $8, R6, R7
map_faststr.go:413 0x24990 f9003fe7 MOVD R7, 120(RSP)
stubs.go:18 0x24994 910220c6 ADD $136, R6, R6
map_faststr.go:414 0x24998 f90043e6 MOVD R6, 128(RSP)
map.go:1111 0x2499c 39402026 MOVBU 8(R1), R6
map_faststr.go:416 0x249a0 37180166 TBNZ $3, R6, 11(PC)
map_faststr.go:420 0x249a4 f9400826 MOVD 16(R1), R6
map_faststr.go:420 0x249a8 8b020087 ADD R2, R4, R7
map_faststr.go:420 0x249ac 7940a408 MOVHU 82(R0), R8
stubs.go:18 0x249b0 9b071906 MADD R7, R6, R8, R6
map_faststr.go:420 0x249b4 f90047e6 MOVD R6, 136(RSP)
map_faststr.go:421 0x249b8 f94047e6 MOVD 136(RSP), R6
stubs.go:18 0x249bc 910020c7 ADD $8, R6, R7
map_faststr.go:421 0x249c0 f9004fe7 MOVD R7, 152(RSP)
stubs.go:18 0x249c4 910220c6 ADD $136, R6, R6
map_faststr.go:422 0x249c8 f90053e6 MOVD R6, 160(RSP)
map_faststr.go:402 0x249cc f90067e2 MOVD R2, 200(RSP)
map.go:186 0x249d0 f9001fe4 MOVD R4, 56(RSP)
map_faststr.go:402 0x249d4 f9005fe0 MOVD R0, 184(RSP)
map_faststr.go:402 0x249d8 f90063e1 MOVD R1, 192(RSP)
map_faststr.go:402 0x249dc 14000010 JMP 16(PC)
map_faststr.go:482 0x249e0 f9401023 MOVD 32(R1), R3
map_faststr.go:482 0x249e4 eb03005f CMP R3, R2
map_faststr.go:482 0x249e8 540000c1 BNE 6(PC)
map_faststr.go:483 0x249ec aa0403e2 MOVD R4, R2
map_faststr.go:402 0x249f0 aa0003e3 MOVD R0, R3
map_faststr.go:483 0x249f4 aa0103e0 MOVD R1, R0
map_faststr.go:483 0x249f8 aa0303e1 MOVD R3, R1
map_faststr.go:483 0x249fc 97fff385 CALL runtime.advanceEvacuationMark(SB)
map_faststr.go:485 0x24a00 a97ffbfd LDP -8(RSP), (R29, R30)
map_faststr.go:485 0x24a04 9102c3ff ADD $176, RSP, RSP
map_faststr.go:485 0x24a08 d65f03c0 RET
map.go:209 0x24a0c 7940a406 MOVHU 82(R0), R6
stubs.go:18 0x24a10 8b0600a6 ADD R6, R5, R6
stubs.go:18 0x24a14 d10020c6 SUB $8, R6, R6
map.go:209 0x24a18 f94000c5 MOVD (R6), R5
map_faststr.go:425 0x24a1c b4000105 CBZ R5, 8(PC)
map_faststr.go:425 0x24a20 f90033e5 MOVD R5, 96(RSP)
map_faststr.go:426 0x24a24 d503201f NOOP
map_faststr.go:427 0x24a28 d503201f NOOP
stubs.go:18 0x24a2c 910020a6 ADD $8, R5, R6
stubs.go:18 0x24a30 910220a7 ADD $136, R5, R7
stubs.go:18 0x24a34 d2800003 MOVD $0, R3
map_faststr.go:428 0x24a38 14000018 JMP 24(PC)
map_faststr.go:472 0x24a3c 39402023 MOVBU 8(R1), R3
map_faststr.go:472 0x24a40 370ffd03 TBNZ $1, R3, -24(PC)
map_faststr.go:472 0x24a44 f9402003 MOVD 64(R0), R3
map_faststr.go:472 0x24a48 f9400463 MOVD 8(R3), R3
map_faststr.go:472 0x24a4c b4fffca3 CBZ R3, -27(PC)
map_faststr.go:473 0x24a50 f9400c23 MOVD 24(R1), R3
map_faststr.go:473 0x24a54 7940a404 MOVHU 82(R0), R4
map_faststr.go:476 0x24a58 d503201f NOOP
map_faststr.go:477 0x24a5c d1002085 SUB $8, R4, R5
stubs.go:18 0x24a60 9b020c83 MADD R2, R3, R4, R3
stubs.go:18 0x24a64 91002063 ADD $8, R3, R3
map_faststr.go:478 0x24a68 aa0303e0 MOVD R3, R0
map_faststr.go:478 0x24a6c aa0503e1 MOVD R5, R1
map_faststr.go:478 0x24a70 9400015c CALL runtime.memclrHasPointers(SB)
map_faststr.go:483 0x24a74 f9405fe0 MOVD 184(RSP), R0
map_faststr.go:482 0x24a78 f94063e1 MOVD 192(RSP), R1
map_faststr.go:482 0x24a7c f94067e2 MOVD 200(RSP), R2
map_faststr.go:483 0x24a80 f9401fe4 MOVD 56(RSP), R4
map_faststr.go:478 0x24a84 17ffffd7 JMP -41(PC)
map_faststr.go:428 0x24a88 91000463 ADD $1, R3, R3
map_faststr.go:428 0x24a8c 39414409 MOVBU 81(R0), R9
stubs.go:18 0x24a90 910040c6 ADD $16, R6, R6
stubs.go:18 0x24a94 8b0900e7 ADD R9, R7, R7
map_faststr.go:428 0x24a98 f100207f CMP $8, R3
map_faststr.go:428 0x24a9c 54fffb8a BGE -36(PC)
map_faststr.go:429 0x24aa0 386368a8 MOVBU (R5)(R3), R8
map.go:112 0x24aa4 aa0803e9 MOVD R8, R9
map.go:112 0x24aa8 7100053f CMPW $1, R9
map_faststr.go:430 0x24aac 54000088 BHI 4(PC)
map_faststr.go:431 0x24ab0 b27e03e8 ORR $4, ZR, R8
map_faststr.go:431 0x24ab4 382368a8 MOVB R8, (R5)(R3)
map_faststr.go:432 0x24ab8 17fffff4 JMP -12(PC)
map_faststr.go:434 0x24abc 7100153f CMPW $5, R9
map_faststr.go:434 0x24ac0 54000e83 BCC 116(PC)
map_faststr.go:428 0x24ac4 f9001be3 MOVD R3, 48(RSP)
map_faststr.go:429 0x24ac8 3900bfe8 MOVB R8, 47(RSP)
map_faststr.go:428 0x24acc f9002be7 MOVD R7, 80(RSP)
map_faststr.go:459 0x24ad0 f90027e6 MOVD R6, 72(RSP)
map.go:1111 0x24ad4 39402029 MOVBU 8(R1), R9
map_faststr.go:438 0x24ad8 36180069 TBZ $3, R9, 3(PC)
map_faststr.go:438 0x24adc d2800009 MOVD $0, R9
map_faststr.go:438 0x24ae0 14000015 JMP 21(PC)
map_faststr.go:441 0x24ae4 f940241a MOVD 72(R0), R26
map_faststr.go:441 0x24ae8 f9400342 MOVD (R26), R2
map_faststr.go:441 0x24aec b9400c23 MOVWU 12(R1), R3
map_faststr.go:441 0x24af0 aa0603e0 MOVD R6, R0
map_faststr.go:441 0x24af4 aa0303e1 MOVD R3, R1
map_faststr.go:441 0x24af8 d63f0040 CALL (R2)
map_faststr.go:442 0x24afc f9401fe2 MOVD 56(RSP), R2
map_faststr.go:442 0x24b00 8a020003 AND R2, R0, R3
map_faststr.go:442 0x24b04 eb1f007f CMP ZR, R3
map_faststr.go:442 0x24b08 9a9f07e3 CSET NE, R3
map_faststr.go:461 0x24b0c f9405fe0 MOVD 184(RSP), R0
map_faststr.go:451 0x24b10 f94063e1 MOVD 192(RSP), R1
map_faststr.go:482 0x24b14 f94067e2 MOVD 200(RSP), R2
map_faststr.go:442 0x24b18 f9401fe4 MOVD 56(RSP), R4
map_faststr.go:447 0x24b1c f94033e5 MOVD 96(RSP), R5
map_faststr.go:459 0x24b20 f94027e6 MOVD 72(RSP), R6
map_faststr.go:461 0x24b24 f9402be7 MOVD 80(RSP), R7
map_faststr.go:456 0x24b28 3940bfe8 MOVBU 47(RSP), R8
map_faststr.go:447 0x24b2c aa0303e9 MOVD R3, R9
map_faststr.go:447 0x24b30 f9401be3 MOVD 48(RSP), R3
map_faststr.go:447 0x24b34 9100092a ADD $2, R9, R10
map_faststr.go:447 0x24b38 382368aa MOVB R10, (R5)(R3)
map_faststr.go:448 0x24b3c d3401d2a UBFX $0, R9, $8, R10
map_faststr.go:448 0x24b40 f100095f CMP $2, R10
map_faststr.go:448 0x24b44 54000a02 BCS 80(PC)
map_faststr.go:448 0x24b48 d37b1d29 UBFIZ $5, R9, $8, R9
map_faststr.go:448 0x24b4c 9101a3ea ADD $104, RSP, R10
map_faststr.go:448 0x24b50 8b09014b ADD R9, R10, R11
map_faststr.go:448 0x24b54 f9002feb MOVD R11, 88(RSP)
map_faststr.go:450 0x24b58 f940056c MOVD 8(R11), R12
map_faststr.go:450 0x24b5c f100219f CMP $8, R12
map_faststr.go:450 0x24b60 54000381 BNE 28(PC)
map_faststr.go:448 0x24b64 f90023e9 MOVD R9, 64(RSP)
map_faststr.go:451 0x24b68 f8696942 MOVD (R10)(R9), R2
map_faststr.go:402 0x24b6c aa0003e3 MOVD R0, R3
map_faststr.go:451 0x24b70 aa0103e0 MOVD R1, R0
map_faststr.go:451 0x24b74 aa0303e1 MOVD R3, R1
map_faststr.go:451 0x24b78 97ffe8c6 CALL runtime.(*hmap).newoverflow(SB)
map_faststr.go:451 0x24b7c f94023e3 MOVD 64(RSP), R3
map_faststr.go:451 0x24b80 9101a3e4 ADD $104, RSP, R4
map_faststr.go:451 0x24b84 f8236880 MOVD R0, (R4)(R3)
map_faststr.go:452 0x24b88 f9402fe5 MOVD 88(RSP), R5
map_faststr.go:452 0x24b8c f90004bf MOVD ZR, 8(R5)
stubs.go:18 0x24b90 91002006 ADD $8, R0, R6
map_faststr.go:453 0x24b94 f90008a6 MOVD R6, 16(R5)
stubs.go:18 0x24b98 91022006 ADD $136, R0, R6
map_faststr.go:454 0x24b9c f9000ca6 MOVD R6, 24(R5)
map_faststr.go:461 0x24ba0 f9405fe0 MOVD 184(RSP), R0
map.go:1111 0x24ba4 f94063e1 MOVD 192(RSP), R1
map_faststr.go:482 0x24ba8 f94067e2 MOVD 200(RSP), R2
map_faststr.go:442 0x24bac f9401fe4 MOVD 56(RSP), R4
map_faststr.go:429 0x24bb0 f94033e5 MOVD 96(RSP), R5
map_faststr.go:459 0x24bb4 f94027e6 MOVD 72(RSP), R6
map_faststr.go:461 0x24bb8 f9402be7 MOVD 80(RSP), R7
map_faststr.go:456 0x24bbc 3940bfe8 MOVBU 47(RSP), R8
map_faststr.go:456 0x24bc0 aa0303e9 MOVD R3, R9
map_faststr.go:448 0x24bc4 9101a3ea ADD $104, RSP, R10
map_faststr.go:456 0x24bc8 f9402feb MOVD 88(RSP), R11
map_faststr.go:428 0x24bcc f9401be3 MOVD 48(RSP), R3
map_faststr.go:456 0x24bd0 f8696949 MOVD (R10)(R9), R9
map_faststr.go:456 0x24bd4 3980013b MOVB (R9), R27
map_faststr.go:456 0x24bd8 f940056a MOVD 8(R11), R10
map_faststr.go:456 0x24bdc 9240094a AND $7, R10, R10
map_faststr.go:456 0x24be0 382a6928 MOVB R8, (R9)(R10)
map_faststr.go:459 0x24be4 f9400968 MOVD 16(R11), R8
map_faststr.go:459 0x24be8 f94000c9 MOVD (R6), R9
map_faststr.go:459 0x24bec f94004ca MOVD 8(R6), R10
map_faststr.go:459 0x24bf0 f900050a MOVD R10, 8(R8)
map_faststr.go:459 0x24bf4 b0016e1b ADRP 47976448(PC), R27
map_faststr.go:459 0x24bf8 9124837b ADD $2336, R27, R27
map_faststr.go:459 0x24bfc b940036a MOVWU (R27), R10
map_faststr.go:459 0x24c00 3500006a CBNZW R10, 3(PC)
map_faststr.go:459 0x24c04 f9000109 MOVD R9, (R8)
map_faststr.go:459 0x24c08 14000004 JMP 4(PC)
map_faststr.go:459 0x24c0c aa0803e2 MOVD R8, R2
map_faststr.go:459 0x24c10 aa0903e3 MOVD R9, R3
map_faststr.go:459 0x24c14 9401696b CALL runtime.gcWriteBarrier(SB)
map_faststr.go:461 0x24c18 f9401c03 MOVD 56(R0), R3
map_faststr.go:461 0x24c1c f9400d61 MOVD 24(R11), R1
map_faststr.go:461 0x24c20 aa0703e2 MOVD R7, R2
map_faststr.go:461 0x24c24 aa0303e0 MOVD R3, R0
map_faststr.go:461 0x24c28 9400002a CALL runtime.typedmemmove(SB)
map_faststr.go:462 0x24c2c f9402fe3 MOVD 88(RSP), R3
map_faststr.go:462 0x24c30 f9400464 MOVD 8(R3), R4
map_faststr.go:462 0x24c34 91000484 ADD $1, R4, R4
map_faststr.go:462 0x24c38 f9000464 MOVD R4, 8(R3)
map_faststr.go:467 0x24c3c f9400864 MOVD 16(R3), R4
stubs.go:18 0x24c40 91004084 ADD $16, R4, R4
map_faststr.go:467 0x24c44 f9000864 MOVD R4, 16(R3)
map_faststr.go:468 0x24c48 f9400c64 MOVD 24(R3), R4
map_faststr.go:468 0x24c4c f9405fe1 MOVD 184(RSP), R1
map_faststr.go:468 0x24c50 39414425 MOVBU 81(R1), R5
stubs.go:18 0x24c54 8b050084 ADD R5, R4, R4
map_faststr.go:468 0x24c58 f9000c64 MOVD R4, 24(R3)
map_faststr.go:428 0x24c5c aa0103e0 MOVD R1, R0
map.go:1111 0x24c60 f94063e1 MOVD 192(RSP), R1
map_faststr.go:482 0x24c64 f94067e2 MOVD 200(RSP), R2
map_faststr.go:428 0x24c68 f9401be3 MOVD 48(RSP), R3
map_faststr.go:442 0x24c6c f9401fe4 MOVD 56(RSP), R4
map_faststr.go:429 0x24c70 f94033e5 MOVD 96(RSP), R5
stubs.go:18 0x24c74 f94027e6 MOVD 72(RSP), R6
stubs.go:18 0x24c78 f9402be7 MOVD 80(RSP), R7
stubs.go:18 0x24c7c b27e03e8 ORR $4, ZR, R8
map_faststr.go:468 0x24c80 17ffff82 JMP -126(PC)
map_faststr.go:448 0x24c84 aa0a03e0 MOVD R10, R0
map_faststr.go:448 0x24c88 b27f03e1 ORR $2, ZR, R1
map_faststr.go:448 0x24c8c 94016a11 CALL runtime.panicIndex(SB)
map_faststr.go:435 0x24c90 9000cec0 ADRP 27099136(PC), R0
map_faststr.go:435 0x24c94 91030000 ADD $192, R0, R0
map_faststr.go:435 0x24c98 d28001a1 MOVD $13, R1
map_faststr.go:435 0x24c9c 940097dd CALL runtime.throw(SB)
map_faststr.go:435 0x24ca0 d503201f NOOP
map_faststr.go:402 0x24ca4 f90007e0 MOVD R0, 8(RSP)
map_faststr.go:402 0x24ca8 f9000be1 MOVD R1, 16(RSP)
map_faststr.go:402 0x24cac f9000fe2 MOVD R2, 24(RSP)
map_faststr.go:402 0x24cb0 aa1e03e3 MOVD R30, R3
map_faststr.go:402 0x24cb4 94016043 CALL runtime.morestack_noctxt.abi0(SB)
map_faststr.go:402 0x24cb8 f94007e0 MOVD 8(RSP), R0
map_faststr.go:402 0x24cbc f9400be1 MOVD 16(RSP), R1
map_faststr.go:402 0x24cc0 f9400fe2 MOVD 24(RSP), R2
map_faststr.go:402 0x24cc4 17ffff0f JMP runtime.evacuate_faststr(SB)
map_faststr.go:402 0x24cc8 00000000 ?
map_faststr.go:402 0x24ccc 00000000 ?
Thanks! Could you also see the disassembly for 0x7d000
and 0x355e44
in traceback 4?
TEXT runtime.morestack_noctxt.abi0(SB) /usr/local/go/src/runtime/asm_arm64.s
asm_arm64.s:323 0x7d000 2a1f03fa MOVW ZR, R26
asm_arm64.s:324 0x7d004 17ffffdf JMP runtime.morestack.abi0(SB)
asm_arm64.s:324 0x7d008 00000000 ?
asm_arm64.s:324 0x7d00c 00000000 ?
TEXT github.com/klauspost/compress/zstd.(*frameDec).checkCRC(SB) /mnt/ramdisk/go/pkg/mod/github.com/klauspost/[email protected]/zstd/framedec.go
framedec.go:297 0x355d40 f9400b90 MOVD 16(R28), R16
framedec.go:297 0x355d44 eb3063ff CMP R16, RSP
framedec.go:297 0x355d48 54000789 BLS 60(PC)
framedec.go:297 0x355d4c f81c0ffe MOVD.W R30, -64(RSP)
framedec.go:297 0x355d50 f81f83fd MOVD R29, -8(RSP)
framedec.go:297 0x355d54 d10023fd SUB $8, RSP, R29
framedec.go:298 0x355d58 39482002 MOVBU 520(R0), R2
framedec.go:298 0x355d5c 36000642 TBZ $0, R2, 50(PC)
framedec.go:297 0x355d60 f90027e0 MOVD R0, 72(RSP)
framedec.go:303 0x355d64 f940e802 MOVD 464(R0), R2
framedec.go:303 0x355d68 f940ec03 MOVD 472(R0), R3
framedec.go:303 0x355d6c f9401442 MOVD 40(R2), R2
framedec.go:303 0x355d70 b27e03e1 ORR $4, ZR, R1
framedec.go:303 0x355d74 aa0303e0 MOVD R3, R0
framedec.go:303 0x355d78 d63f0040 CALL (R2)
framedec.go:304 0x355d7c b50004a3 CBNZ R3, 37(PC)
framedec.go:309 0x355d80 f94027e2 MOVD 72(RSP), R2
framedec.go:309 0x355d84 3940e043 MOVBU 56(R2), R3
framedec.go:309 0x355d88 370003a3 TBNZ $0, R3, 29(PC)
bytes.go:20 0x355d8c f90017e1 MOVD R1, 40(RSP)
bytes.go:20 0x355d90 f9001be0 MOVD R0, 48(RSP)
framedec.go:313 0x355d94 b90027ff MOVW ZR, 36(RSP)
framedec.go:314 0x355d98 f9402040 MOVD 64(R2), R0
framedec.go:314 0x355d9c 97ff8c75 CALL github.com/klauspost/compress/zstd/internal/xxhash.(*Digest).Sum64(SB)
framedec.go:319 0x355da0 b90027e0 MOVW R0, 36(RSP)
bytes.go:20 0x355da4 f94017e1 MOVD 40(RSP), R1
bytes.go:20 0x355da8 f100103f CMP $4, R1
bytes.go:20 0x355dac 540000c1 BNE 6(PC)
bytes.go:20 0x355db0 910093e0 ADD $36, RSP, R0
bytes.go:20 0x355db4 f9401be1 MOVD 48(RSP), R1
bytes.go:20 0x355db8 b27e03e2 ORR $4, ZR, R2
bytes.go:20 0x355dbc 97f2f341 CALL runtime.memequal(SB)
framedec.go:321 0x355dc0 37000140 TBNZ $0, R0, 10(PC)
framedec.go:325 0x355dc4 f0013ffb ADRP 41938944(PC), R27
framedec.go:325 0x355dc8 911fc37b ADD $2032, R27, R27
framedec.go:325 0x355dcc f9400360 MOVD (R27), R0
framedec.go:325 0x355dd0 f0013ffb ADRP 41938944(PC), R27
framedec.go:325 0x355dd4 911fe37b ADD $2040, R27, R27
framedec.go:325 0x355dd8 f9400361 MOVD (R27), R1
framedec.go:325 0x355ddc a97ffbfd LDP -8(RSP), (R29, R30)
framedec.go:325 0x355de0 910103ff ADD $64, RSP, RSP
framedec.go:325 0x355de4 d65f03c0 RET
framedec.go:330 0x355de8 d2800000 MOVD $0, R0
framedec.go:330 0x355dec d2800001 MOVD $0, R1
framedec.go:330 0x355df0 a97ffbfd LDP -8(RSP), (R29, R30)
framedec.go:330 0x355df4 910103ff ADD $64, RSP, RSP
framedec.go:330 0x355df8 d65f03c0 RET
framedec.go:310 0x355dfc d2800000 MOVD $0, R0
framedec.go:310 0x355e00 d2800001 MOVD $0, R1
framedec.go:310 0x355e04 a97ffbfd LDP -8(RSP), (R29, R30)
framedec.go:310 0x355e08 910103ff ADD $64, RSP, RSP
framedec.go:310 0x355e0c d65f03c0 RET
framedec.go:306 0x355e10 aa0303e0 MOVD R3, R0
framedec.go:306 0x355e14 aa0403e1 MOVD R4, R1
framedec.go:306 0x355e18 a97ffbfd LDP -8(RSP), (R29, R30)
framedec.go:306 0x355e1c 910103ff ADD $64, RSP, RSP
framedec.go:306 0x355e20 d65f03c0 RET
framedec.go:299 0x355e24 d2800000 MOVD $0, R0
framedec.go:299 0x355e28 d2800001 MOVD $0, R1
framedec.go:299 0x355e2c a97ffbfd LDP -8(RSP), (R29, R30)
framedec.go:299 0x355e30 910103ff ADD $64, RSP, RSP
framedec.go:299 0x355e34 d65f03c0 RET
framedec.go:297 0x355e38 f90007e0 MOVD R0, 8(RSP)
framedec.go:297 0x355e3c aa1e03e3 MOVD R30, R3
framedec.go:297 0x355e40 97f49c70 CALL runtime.morestack_noctxt.abi0(SB)
framedec.go:297 0x355e44 f94007e0 MOVD 8(RSP), R0
framedec.go:297 0x355e48 17ffffbe JMP github.com/klauspost/compress/zstd.(*frameDec).checkCRC(SB)
framedec.go:297 0x355e4c 00000000 ?
btw as is implied, that's github.com/klauspost/compress v1.15.9
for versioning/repro purposes
diff --git a/src/runtime/asm_arm64.s b/src/runtime/asm_arm64.s
index 7836ba1d96..ec29cd2038 100644
--- a/src/runtime/asm_arm64.s
+++ b/src/runtime/asm_arm64.s
@@ -321,6 +321,7 @@ TEXT runtime·morestack(SB),NOSPLIT|NOFRAME,$0-0
TEXT runtime·morestack_noctxt(SB),NOSPLIT|NOFRAME,$0-0
MOVW $0, R26
+ MOVD RSP, RSP // force SPWITE
B runtime·morestack(SB)
// spillArgs stores return values from registers to a *internal/abi.RegArgs in R20.
@lizthegrey Could you try if this patch makes any difference? Thanks!
I think I made the hourly cutoff for our build train, will report if I see any crashes this afternoon or over the weekend; if no crashes by Monday, then we can assume that was the fix.
Unfortunately I do see a crash in the shepherd
binary (same as from tracebacks 1-3), same PC -- 0x7cdc0
. didn't manage to get to the pod before Kubernetes GCed it though, and for some reason log shipping is broken in that environment.
It'll likely happen again over weekend though, if it happened within 3 hours of deploy of the putative fix
@lizthegrey have you seen more failures and do you have any logs to share? Could you doublecheck that the binary includes the patch? Thanks!
@lizthegrey have you seen more failures and do you have any logs to share? Could you doublecheck that the binary includes the patch? Thanks!
No failures over weekend. Yes, 100% sure the crash from Fri includes the patch, and I can manually scrape the logs from another system, it'll just take some time and thus hoping it happens somewhere else where I can grab the logs automatically instead.
https://share.getcloudapp.com/p9uQpJww -- no crashes today either. the crash on Friday must have been a fluke, but at least we have the telemetry to detect it if it does happen over the next few weeks. I'll still work on reconstructing the logs from the Friday crash immediately post-deploy though.
share.getcloudapp.com/p9uQpJww -- no crashes today either. the crash on Friday must have been a fluke, but at least we have the telemetry to detect it if it does happen over the next few weeks. I'll still work on reconstructing the logs from the Friday crash immediately post-deploy though.
Found the problem -- the pod was a lingering holdover of the previous build, despite running chronologically long after the new release should have gone. It did not have the patch. You are good to propose the MOVD RSP, RSP
patch for 1.20 release and for 1.19 and for 1.18 backport trains.
thank you thank you thank you @cherrymui and @prattmic for your patience.
Thanks @lizthegrey ! This is great. I'll send a CL soon. And thank you very much for providing the debug logs and helping us find the issue.
Change https://go.dev/cl/425396 mentions this issue: runtime: mark morestack_noctxt SPWRITE on LR architectures
@gopherbot please backport this to previous releases. This may cause a runtime crash. Thanks.
Backport issue(s) opened: #54674 (for 1.18), #54675 (for 1.19).
Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases.
Change https://go.dev/cl/425615 mentions this issue: [release-branch.go1.19] runtime: mark morestack_noctxt SPWRITE on LR architectures
Change https://go.dev/cl/425616 mentions this issue: [release-branch.go1.18] runtime: mark morestack_noctxt SPWRITE on LR architectures