Question: why does go_test fail with ENOMEM?

Open dentiny opened this issue 3 years ago • 0 comments

What version of rules_go are you using?

V0.29.0

What version of gazelle are you using?

V0.24.0

What version of Bazel are you using?

V5.0.0

Does this issue reproduce with the latest releases of all the above?

I'm not sure.

What operating system and processor architecture are you using?

Ubuntu 20.04.3 LTS x86_64

Any other potentially useful information about your toolchain?

What did you do?

I tried to debug an ENOMEM error that only appears when the race detector is enabled.

What did you expect to see?

Either the test cases succeed, or they fail with a stack trace, so that I can debug how memory is actually being allocated.

What did you see instead?

==12==ERROR: ThreadSanitizer failed to allocate 0x2b59000 (45453312) bytes at address 175df95610000 (errno: 12)

I understand that the Go race detector can use 5x, or even 10x, more memory than a normal run.

But according to the error message above, the failing allocation is only about 45 MB, which doesn't seem like a large chunk of memory.
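To double-check the numbers in the ThreadSanitizer message, here is a minimal sketch that converts the hex size to bytes and MiB and confirms that errno 12 is ENOMEM. The helper names (parseHexBytes, toMiB, isENOMEM) are made up for illustration, and the 4 KiB page size is an assumption about a typical x86_64 Linux system.

```go
package main

import (
	"fmt"
	"strconv"
	"syscall"
)

// parseHexBytes converts a hex size such as "0x2b59000" into a byte count.
// Base 0 lets ParseUint honor the "0x" prefix.
func parseHexBytes(s string) (uint64, error) {
	return strconv.ParseUint(s, 0, 64)
}

// toMiB expresses a byte count in mebibytes.
func toMiB(n uint64) float64 {
	return float64(n) / (1 << 20)
}

// isENOMEM reports whether a raw errno value is ENOMEM on this platform.
func isENOMEM(code int) bool {
	return syscall.Errno(code) == syscall.ENOMEM
}

func main() {
	n, err := parseHexBytes("0x2b59000") // size taken from the error message
	if err != nil {
		panic(err)
	}
	const pageSize = 4096 // assumption: 4 KiB pages
	fmt.Printf("%d bytes = %.2f MiB = %d pages\n", n, toMiB(n), n/pageSize)
	fmt.Printf("errno 12 is ENOMEM: %v\n", isENOMEM(12))
}
```

On Linux this prints 45453312 bytes = 43.35 MiB = 11097 pages, which agrees with the decimal value in the message, so the request itself really is small.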

I tried to debug with gdb by setting a breakpoint, but there are two things I don't understand:

h@h-desktop$ nm bazel-out/k8-dbg-ST-61abbe76aa1f/go_default_test_/go_default_test
nm: bazel-out/k8-dbg-ST-61abbe76aa1f/go_default_test_/go_default_test: no symbols

First question: I built the go_test target with bazel build -c dbg, but the symbol table seems to have been stripped. I also tried adding these flags:

gc_goopts = [
    "-N",  # no optimization
    "-l",  # no inlining
],

These two flags work when I compile directly with go build, but with Bazel there is still no symbol table.
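As a cross-check on the nm output, the same question can be asked from Go with the standard debug/elf package. This is only a sketch: the inspect function and symbolInfo type are hypothetical names, and it reports just two things, whether a .symtab is present and whether a DWARF .debug_info (or legacy .zdebug_info) section survived the link.

```go
package main

import (
	"debug/elf"
	"errors"
	"fmt"
	"os"
)

// symbolInfo summarizes which debugging aids an ELF binary still carries.
type symbolInfo struct {
	Symbols  int  // number of .symtab entries, 0 if the table was stripped
	HasDWARF bool // true if .debug_info or legacy .zdebug_info is present
}

// inspect opens an ELF file and reports its symbol and DWARF status.
func inspect(path string) (symbolInfo, error) {
	f, err := elf.Open(path)
	if err != nil {
		return symbolInfo{}, err
	}
	defer f.Close()

	var info symbolInfo
	syms, err := f.Symbols()
	switch {
	case err == nil:
		info.Symbols = len(syms)
	case errors.Is(err, elf.ErrNoSymbols):
		// No symbol section: this is the case nm reports as "no symbols".
	default:
		return symbolInfo{}, err
	}
	info.HasDWARF = f.Section(".debug_info") != nil ||
		f.Section(".zdebug_info") != nil
	return info, nil
}

func main() {
	// With no argument, inspect the running executable itself.
	path, err := os.Executable()
	if len(os.Args) > 1 {
		path, err = os.Args[1], nil
	}
	if err == nil {
		var info symbolInfo
		if info, err = inspect(path); err == nil {
			fmt.Printf("%s: %d symbols, DWARF present: %v\n",
				path, info.Symbols, info.HasDWARF)
			return
		}
	}
	fmt.Fprintln(os.Stderr, "error:", err)
	os.Exit(1)
}
```

Running it against both the Bazel-built test binary and one produced by go build would show whether the difference lies in the symbol table, the DWARF data, or both. Note that -N and -l only affect compiler optimizations, so by themselves they would not be expected to change what the linker keeps.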

Second question: I tried setting a breakpoint at sbrk, where memory allocation ends up.

(gdb) bt
#0  __GI___sbrk (increment=135168) at sbrk.c:32
#1  0x00007ffff7a2abbd in __GI___default_morecore (increment=<optimized out>) at morecore.c:47
#2  0x00007ffff7a255a5 in sysmalloc (nb=nb@entry=656, av=av@entry=0x7ffff7b7ab80 <main_arena>) at malloc.c:2470
#3  0x00007ffff7a267c3 in _int_malloc (av=av@entry=0x7ffff7b7ab80 <main_arena>, bytes=bytes@entry=640) at malloc.c:4141
#4  0x00007ffff7a269ab in tcache_init () at malloc.c:2982
#5  0x00007ffff7a27c3e in tcache_init () at malloc.c:3044
#6  __GI___libc_malloc (bytes=64) at malloc.c:3044
#7  malloc_hook_ini (sz=64, caller=<optimized out>) at hooks.c:32
#8  0x0000555556501eb9 in operator new(unsigned long) ()
#9  0x000055555649ea9f in google::protobuf::internal::OnShutdownRun(void (*)(void const*), void const*) ()
#10 0x000055555648f9aa in google::protobuf::internal::InitProtobufDefaultsSlow() ()
#11 0x0000555556447120 in ?? ()
#12 0x00005555565b9b6d in __libc_csu_init ()
#13 0x00007ffff79b2040 in __libc_start_main (main=0x555555a73480 <main>, argc=1, argv=0x7fffffffdf08, init=0x5555565b9b20 <__libc_csu_init>, 
    fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffdef8) at ../csu/libc-start.c:264
#14 0x0000555555a0a7ae in _start ()

The stack trace looks odd to me, since I don't think my code involves Google protobuf. The uncertainty comes from the ?? frame, whose name is lost because the symbols are stripped.

dentiny, Mar 08 '22 21:03