sonic icon indicating copy to clipboard operation
sonic copied to clipboard

Panic: invalid pc-encoded table

Open OlegElizarov opened this issue 2 years ago • 11 comments

Hello guys. We have a problem in our service and sonic appeared in trace. Unfortunately it happens only in production environment so I don't know how to reproduce this behavior. Go version in our project - 1.18 Sonic version - 1.5.0 Can you help us to analyze the problem? If you need more information - just say it. If it's not because of sonic than sorry. You project is really amazing and we love it! ❤️

Stack trace:

runtime: invalid pc-encoded table f=google.golang.org/protobuf/internal/impl.(*MessageInfo).checkInitialized-fm pc=0x7d78cf targetpc=0x7d78d3 tab=[0/0]0x0
	value=0 until pc=0x7d786a
	value=32 until pc=0x7d78a2
	value=0 until pc=0x7d78be
	value=32 until pc=0x7d78cf
fatal error: invalid runtime symbol table

goroutine 0 [idle]:
runtime: unexpected return pc for runtime.sigreturn called from 0x7
stack: frame={sp:0xc00010b380, fp:0xc00010b388} stack=[0xc000104000,0xc00010c000)
0x000000c00010b280:  0x000000c00010b2f8  0x000000000044dd05 <runtime.sigtrampgo+0x00000000000001a5> 
0x000000c00010b290:  0x000000c00000001b  0x000000c000100000 
0x000000c00010b2a0:  0x000000c00010b2b8  0x000000c156c09ba0 
0x000000c00010b2b0:  0x0000000000000000  0x0000000000000000 
0x000000c00010b2c0:  0x0000000000000000  0x0000000000000000 
0x000000c00010b2d0:  0x0000000000000000  0x0000000000000000 
0x000000c00010b2e0:  0x000000c04a7b5a00  0x000000c00010b4b0 
0x000000c00010b2f0:  0x000000c00010b380  0x000000c00010b320 
0x000000c00010b300:  0x000000000046e7ee <runtime.sigtrampgo+0x000000000000002e>  0x000000000000001b 
0x000000c00010b310:  0x000000c00010b4b0  0x000000c00010b380 
0x000000c00010b320:  0x000000c00010b370  0x000000000046dcfd <runtime.sigtramp+0x000000000000003d> 
0x000000c00010b330:  0x000000000000001b  0x000000c00010b4b0 
0x000000c00010b340:  0x000000c00010b380  0x000000c1d13948db 
0x000000c00010b350:  0x0000000000000000  0x0000000000000000 
0x000000c00010b360:  0x00000000000000ef  0x000000c00213b6d8 
0x000000c00010b370:  0x000000c00213b658  0x000000000046de00 <runtime.sigreturn+0x0000000000000000> 
0x000000c00010b380: <0x0000000000000007 >0x0000000000000000 
0x000000c00010b390:  0x000000c000104000  0x0000000000000000 
0x000000c00010b3a0:  0x0000000000008000  0x00000000011a7150 <github.com/bytedance/sonic/internal/native/avx2.__native_entry__+0x000000000000e290> 
0x000000c00010b3b0:  0x00000000011a9150 <github.com/bytedance/sonic/internal/native/avx2.__native_entry__+0x0000000000010290>  0x00000000000ff77d 
0x000000c00010b3c0:  0x000000c00213b6e8  0x0000000000000000 
0x000000c00010b3d0:  0x0000000000000000  0x00000000000000ef 
0x000000c00010b3e0:  0x000000c00213b6d8  0x000000c169519d5b 
0x000000c00010b3f0:  0x00000000000000a4  0x000000c00213b658 
0x000000c00010b400:  0x000000c1d13948db  0x000000c1d1394883 
0x000000c00010b410:  0x0000000000000076  0x00000000000000a4 
0x000000c00010b420:  0x000000c00213b618  0x000000000119afdf <github.com/bytedance/sonic/internal/native/avx2.__native_entry__+0x000000000000211f> 
0x000000c00010b430:  0x0000000000000206  0x002b000000000033 
0x000000c00010b440:  0x0000000000000000  0x0000000000000000 
0x000000c00010b450:  0x0000000000000000  0x0000000000000000 
0x000000c00010b460:  0x000000c00010b540  0x0000000000000000 
0x000000c00010b470:  0x0000000000000000  0x0000000000000000 
0x000000c00010b480:  0x0000000000000000 
runtime.throw({0x1b16e2e?, 0x0?})
	/usr/local/go/src/runtime/panic.go:992 +0x71
runtime.pcvalue({0x283a320?, 0x2ba9320?}, 0x9da15, 0x7d78d3, 0xc00010ad68, 0x1)
	/usr/local/go/src/runtime/symtab.go:963 +0x57a
runtime.funcspdelta({0x283a320?, 0x2ba9320?}, 0x0?, 0x0?)
	/usr/local/go/src/runtime/symtab.go:1038 +0x34
runtime.gentraceback(0x0?, 0xc00010afd0?, 0x3a?, 0x40?, 0x0, 0xc00010afd0, 0x40, 0x0, 0x0?, 0x6)
	/usr/local/go/src/runtime/traceback.go:191 +0x67c
runtime.sigprof(0x119afdf, 0xc00010b270?, 0x0?, 0xc04a7b5a00, 0xc000100000)
	/usr/local/go/src/runtime/proc.go:4507 +0x108
runtime.sighandler(0x1b?, 0xc000100000?, 0xc00010b2b8?, 0xc156c09ba0?)
	/usr/local/go/src/runtime/signal_unix.go:613 +0x5e6
runtime.sigtrampgo(0x1b, 0xc00010b4b0, 0xc00010b380)
	/usr/local/go/src/runtime/signal_unix.go:477 +0x1a5
runtime.sigtrampgo(0x1b, 0xc00010b4b0, 0xc00010b380)
	<autogenerated>:1 +0x2e
runtime.sigtramp()
	/usr/local/go/src/runtime/sys_linux_amd64.s:361 +0x3d
runtime: unexpected return pc for runtime.sigreturn called from 0x7
stack: frame={sp:0xc00010b380, fp:0xc00010b388} stack=[0xc000104000,0xc00010c000)
0x000000c00010b280:  0x000000c00010b2f8  0x000000000044dd05 <runtime.sigtrampgo+0x00000000000001a5> 
0x000000c00010b290:  0x000000c00000001b  0x000000c000100000 
0x000000c00010b2a0:  0x000000c00010b2b8  0x000000c156c09ba0 
0x000000c00010b2b0:  0x0000000000000000  0x0000000000000000 
0x000000c00010b2c0:  0x0000000000000000  0x0000000000000000 
0x000000c00010b2d0:  0x0000000000000000  0x0000000000000000 
0x000000c00010b2e0:  0x000000c04a7b5a00  0x000000c00010b4b0 
0x000000c00010b2f0:  0x000000c00010b380  0x000000c00010b320 
0x000000c00010b300:  0x000000000046e7ee <runtime.sigtrampgo+0x000000000000002e>  0x000000000000001b 
0x000000c00010b310:  0x000000c00010b4b0  0x000000c00010b380 
0x000000c00010b320:  0x000000c00010b370  0x000000000046dcfd <runtime.sigtramp+0x000000000000003d> 
0x000000c00010b330:  0x000000000000001b  0x000000c00010b4b0 
0x000000c00010b340:  0x000000c00010b380  0x000000c1d13948db 
0x000000c00010b350:  0x0000000000000000  0x0000000000000000 
0x000000c00010b360:  0x00000000000000ef  0x000000c00213b6d8 
0x000000c00010b370:  0x000000c00213b658  0x000000000046de00 <runtime.sigreturn+0x0000000000000000> 
0x000000c00010b380: <0x0000000000000007 >0x0000000000000000 
0x000000c00010b390:  0x000000c000104000  0x0000000000000000 
0x000000c00010b3a0:  0x0000000000008000  0x00000000011a7150 <github.com/bytedance/sonic/internal/native/avx2.__native_entry__+0x000000000000e290> 
0x000000c00010b3b0:  0x00000000011a9150 <github.com/bytedance/sonic/internal/native/avx2.__native_entry__+0x0000000000010290>  0x00000000000ff77d 
0x000000c00010b3c0:  0x000000c00213b6e8  0x0000000000000000 
0x000000c00010b3d0:  0x0000000000000000  0x00000000000000ef 
0x000000c00010b3e0:  0x000000c00213b6d8  0x000000c169519d5b 
0x000000c00010b3f0:  0x00000000000000a4  0x000000c00213b658 
0x000000c00010b400:  0x000000c1d13948db  0x000000c1d1394883 
0x000000c00010b410:  0x0000000000000076  0x00000000000000a4 
0x000000c00010b420:  0x000000c00213b618  0x000000000119afdf <github.com/bytedance/sonic/internal/native/avx2.__native_entry__+0x000000000000211f> 
0x000000c00010b430:  0x0000000000000206  0x002b000000000033 
0x000000c00010b440:  0x0000000000000000  0x0000000000000000 
0x000000c00010b450:  0x0000000000000000  0x0000000000000000 
0x000000c00010b460:  0x000000c00010b540  0x0000000000000000 
0x000000c00010b470:  0x0000000000000000  0x0000000000000000 
0x000000c00010b480:  0x0000000000000000 
runtime.sigreturn()
	/usr/local/go/src/runtime/sys_linux_amd64.s:466

OlegElizarov avatar Sep 29 '22 14:09 OlegElizarov

Can you give me some usage codes? It seems being introduced by a corrupt memory pointer -- Is there any unsafe pointer in your decoded|encoded object?

AsterDY avatar Oct 13 '22 09:10 AsterDY

Sonic has widespread usage in our project. So it may be a little bit hard. But let's try.

First , unsafe in our project:

Only this one

// BytesToString ...
func BytesToString(item []byte) string {
	return *(*string)(unsafe.Pointer(&item))
}

About sonic usage, we are using it for marshaling big proto structures and also map[string]interface{} (some values are proto structures)

OlegElizarov avatar Oct 13 '22 11:10 OlegElizarov

Actually BytesToString is quite an unsafe behavior, maybe the item byte-slice is used somewhere concurrently, or has got freed by GC. You should make sure it is fully secure for sonic once you passed it

AsterDY avatar Oct 13 '22 12:10 AsterDY

Hello again 👋🏻 It still panics :(

So we dropped unsafe code from our project. The only unsafe that exists is unsafe in protobuf internals. Also I can add that this issues appeared in our other project.(unsafe is only in protobuf too)

So could you at least try to research this problem? I can add more stack traces if you need. Unfortunately, still have no idea how to reproduce it.

OlegElizarov avatar Oct 28 '22 15:10 OlegElizarov

Can u try to integrate your practical codes as a UT and run go test -race -gcflags=-d=checkptr to check if there is any panic?

AsterDY avatar Oct 31 '22 04:10 AsterDY

@AsterDY It's SIGPROF again, as shown in the stack backtrace.

chenzhuoyu avatar Nov 07 '22 15:11 chenzhuoyu

@OlegElizarov Do you have pprof of some kind of profiler attached to your program? It’s a known issue that Sonic is not very compatible with SIGPROF. We got a few reports, and still working on this.

chenzhuoyu avatar Nov 07 '22 15:11 chenzhuoyu

Hello again. Thank for your help guys.

@AsterDY , we tried to add (-race -gcflags=-d=checkptr ) but there was no problems with sonic in this mode. Now we are trying to add gotraceback=crash to get coredump but there is a problem that after pc-encoded table panic our pod restarts and we don't know how to save dump now. But we are researching it.

@chenzhuoyu , yep , pprof is always on in our service.(some kind of rule in our corporation ) Is it possible to do something ? (some workaround)

OlegElizarov avatar Nov 07 '22 17:11 OlegElizarov

Go pprof won't make sonic crash which we have tried... Maybe it's introduce by other kind of profiling

AsterDY avatar Nov 15 '22 09:11 AsterDY

At present, since Go ASM doesn't support generating pcsp pcdata for stack-unstable functions, the program will crash if the runtime do gentraceback() while sonic's native func is running. We are trying to resolve this problem but still need a lot of work to do.

AsterDY avatar Feb 09 '23 09:02 AsterDY

The root cause may be the output of asm2asm.py contains many pushq/popq instructions but these instructions are encoded via .byte/.word/.long directives. The Go assembler uses a trivial algorithm to compute PCSP. In my understanding, it does a linear scan of instructions which implicitly requires that all push/pop pairs must be in the same basic block.

  1. https://github.com/golang/go/blob/a4d5fbc3a48b63f19fcd2a4d040a85c75a2709b5/src/cmd/internal/obj/x86/obj6.go#L843
pushq // +8
if <>
  pop   // -8 
else 
  pop   // -8

Since it may be difficult to ask Golang to improve its assembler, a possible solution is to use attribute naked for C functions (perhaps by modifying emitted LLVM IR) and insert C prologue/epilogue manually without using push and pop in asm2asm.py.

  1. https://llvm.org/docs/LangRef.html#id1695

Note that this still cannot prevent LLVM from generating instructions that adjust stacks.

  1. https://github.com/bytedance/sonic/blob/f87d87de7ac6a5d01e3e39bfbdf0999473bfd6d9/internal/native/avx2/native_amd64.s#L7955

Another potential issue is the stack alignment. In amd64 linux, the stack alignment is 16 byte so that movaps can operate on stack memory operands without stack realignment.

  1. https://leria-info.univ-angers.fr/~jeanmichel.richer/assembly/doc/movaps.pdf
  2. https://gitlab.com/x86-psABIs/x86-64-ABI
  3. https://github.com/llvm/llvm-project/blob/e5906f64a63d064e6a9ea2c46ebc9c285ca02bd1/llvm/lib/Target/X86/X86Subtarget.cpp#L288

The weird thing is that Go aligns stack to 8 bytes on amd64.

  1. https://github.com/golang/go/blob/master/src/cmd/compile/abi-internal.md#stack-layout

Go realigns the stack when making call to c via cgo.

  1. https://github.com/golang/go/blob/a4d5fbc3a48b63f19fcd2a4d040a85c75a2709b5/src/runtime/asm_amd64.s#L797

Fortunately, there are only a few potential dangerous instructions in native_amd64.s, which are also guarded by stack realignment.

  1. https://github.com/bytedance/sonic/blob/f87d87de7ac6a5d01e3e39bfbdf0999473bfd6d9/internal/native/avx2/native_amd64.s#L8264
  2. https://github.com/bytedance/sonic/blob/f87d87de7ac6a5d01e3e39bfbdf0999473bfd6d9/internal/native/avx2/native_amd64.s#L7955

tianxiaogu avatar Feb 09 '23 22:02 tianxiaogu

fixed in #644

liuq19 avatar Jun 19 '24 15:06 liuq19