BOLT
[C++] Use of tcmalloc leads to assertion failure
Hi, I have a binary which uses tcmalloc (from gperftools). tcmalloc creates some new sections for itself (google_malloc and malloc_hook) with AX flags.
When I run BOLT on this binary, it fails an assertion when it reaches the __stop_google_malloc symbol in
llvm::bolt::RewriteInstance::discoverFileObjects(): Assertion `Section && "section for functions must be registered."' failed.
If I force tcmalloc not to create this section, I don't have the assertion anymore.
I assume this is related to #6, but as that issue has drifted a lot from its original subject, I preferred to open a new one.
Here is an extract of a readelf -e of libtcmalloc.a around the concerned sections:
[...]
[481] .gnu.lto_.symtab. PROGBITS 0000000000000000 000560d6
0000000000000d72 0000000000000000 E 0 0 1
[482] .gnu.lto_.opts PROGBITS 0000000000000000 00056e48
00000000000000bd 0000000000000000 E 0 0 1
[483] .rodata PROGBITS 0000000000000000 00056f20
00000000000008a9 0000000000000000 A 0 0 32
[484] malloc_hook PROGBITS 0000000000000000 000577ca
0000000000000211 0000000000000000 AX 0 0 2
[485] .relamalloc_hook RELA 0000000000000000 000a1010
0000000000000228 0000000000000018 I 802 484 8
[486] .gcc_except_table PROGBITS 0000000000000000 000579db
000000000000003b 0000000000000000 A 0 0 1
[487] .text._ZN4base6su PROGBITS 0000000000000000 00057a16
0000000000000024 0000000000000000 AXG 0 0 1
[488] .text._ZN4base6su PROGBITS 0000000000000000 00057a3a
000000000000001c 0000000000000000 AXG 0 0 1
[489] .text._ZN4base6su PROGBITS 0000000000000000 00057a56
0000000000000023 0000000000000000 AXG 0 0 1
[490] .rela.text._ZN4ba RELA 0000000000000000 000a1238
0000000000000018 0000000000000018 IG 802 489 8
[491] .text._ZN4base6su PROGBITS 0000000000000000 00057a79
000000000000002f 0000000000000000 AXG 0 0 1
[492] .rela.text._ZN4ba RELA 0000000000000000 000a1250
0000000000000018 0000000000000018 IG 802 491 8
[493] .text._ZN4base6su PROGBITS 0000000000000000 00057aa8
0000000000000011 0000000000000000 AXG 0 0 1
[...]
[991] .text._ZN8tcmallo PROGBITS 0000000000000000 000d26a2
000000000000002f 0000000000000000 AXG 0 0 2
[992] .rela.text._ZN8tc RELA 0000000000000000 0014c168
0000000000000018 0000000000000018 IG 1513 991 8
[993] .text._ZN8tcmallo PROGBITS 0000000000000000 000d26d2
000000000000002c 0000000000000000 AXG 0 0 2
[994] .rela.text._ZN8tc RELA 0000000000000000 0014c180
0000000000000018 0000000000000018 IG 1513 993 8
[995] google_malloc PROGBITS 0000000000000000 000d2700
00000000000043f5 0000000000000000 AX 0 0 64
[996] .relagoogle_mallo RELA 0000000000000000 0014c198
0000000000002088 0000000000000018 I 1513 995 8
[997] .gcc_except_table PROGBITS 0000000000000000 000d6af8
000000000000014c 0000000000000000 A 0 0 4
[998] .rela.gcc_except_ RELA 0000000000000000 0014e220
0000000000000018 0000000000000018 I 1513 997 8
[999] .text._ZN13TCMall PROGBITS 0000000000000000 000d6c44
000000000000001f 0000000000000000 AXG 0 0 2
[1000] .rela.text._ZN13T RELA 0000000000000000 0014e238
0000000000000018 0000000000000018 IG 1513 999 8
[...]
and here is an extract of the final binary:
[...]
[13] .init PROGBITS 00000000004314f8 000314f8
000000000000001f 0000000000000000 AX 0 0 4
[14] .plt PROGBITS 0000000000431520 00031520
0000000000000810 0000000000000010 AX 0 0 16
[15] .text PROGBITS 0000000000432000 00032000
00000000018c80f0 0000000000000000 AX 0 0 4096
[16] google_malloc PROGBITS 0000000001cfa100 018fa100
00000000000043f5 0000000000000000 AX 0 0 64
[17] malloc_hook PROGBITS 0000000001cfe4f6 018fe4f6
00000000000005a1 0000000000000000 AX 0 0 2
[18] .fini PROGBITS 0000000001cfea98 018fea98
0000000000000009 0000000000000000 AX 0 0 4
[19] .rodata PROGBITS 0000000001cfeac0 018feac0
0000000000466e0b 0000000000000000 A 0 0 64
[20] .gcc_except_table PROGBITS 00000000021658cc 01d658cc
00000000000ed9f6 0000000000000000 A 0 0 4
[21] .eh_frame X86_64_UNWIND 00000000022532c8 01e532c8
0000000000603fac 0000000000000000 A 0 0 8
[22] .eh_frame_hdr X86_64_UNWIND 0000000002857274 02457274
00000000001619cc 0000000000000000 A 0 0 4
[...]
Here is an update on this issue:
The problem comes from the symbols ld generates when a custom section is used (__start_<section_name> and __stop_<section_name>), and more precisely from the placement of the __stop_ symbol.
This symbol is placed at the end address of the section, with a size of 0, and the BinaryContext::getSectionForAddress function doesn't match symbols placed at the end address of a section.
If the section is contiguous with another one, the __stop_ symbol is attributed to the wrong section and BOLT still proceeds (possibly dangerous?), but if there is a gap between the two sections, BOLT aborts because it treats the symbol as an orphan.
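The behaviour described above can be illustrated with a minimal sketch of an address-based lookup over half-open section ranges. The `Section` struct and the `findSectionStrict` and `finalBinarySections` helpers are hypothetical names made up for this illustration; they are not BOLT's actual implementation. The addresses and sizes are taken from the readelf extract of the final binary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified section record; not BOLT's real data structure.
struct Section {
  std::string Name;
  uint64_t Address; // address of the first byte
  uint64_t Size;    // size in bytes
};

// Strict lookup: a section owns the half-open range [Address, Address + Size).
// Returns the index of the owning section, or -1 if no section contains Addr.
int findSectionStrict(const std::vector<Section> &Sections, uint64_t Addr) {
  for (std::size_t I = 0; I < Sections.size(); ++I) {
    const Section &S = Sections[I];
    assert(S.Address + S.Size >= S.Address && "section range must not wrap");
    if (Addr >= S.Address && Addr < S.Address + S.Size)
      return static_cast<int>(I);
  }
  return -1;
}

// Addresses and sizes copied from the readelf extract of the final binary.
std::vector<Section> finalBinarySections() {
  return {
      {"google_malloc", 0x1cfa100, 0x43f5},
      {"malloc_hook", 0x1cfe4f6, 0x5a1},
  };
}
```

In that extract, google_malloc ends at 0x1cfa100 + 0x43f5 = 0x1cfe4f5, while malloc_hook only starts at 0x1cfe4f6 because of its alignment of 2. A zero-sized symbol at 0x1cfe4f5 therefore falls into the one-byte gap and the strict lookup returns -1, which corresponds to the orphan case. If the two sections were contiguous, the same lookup would return the following section instead.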
Here is a simple program to compile to be able to reproduce this issue:
#include <iostream>

// Symbols generated by ld for the section named "custom".
extern char __start_custom;
extern char __stop_custom;

// Place po() in the custom section instead of .text.
void po() __attribute__((section("custom")));
void po()
{
    std::cout << "in po()" << std::endl;
}

int main(int argc, char *argv[])
{
    po();
    std::cout << "Begin of section:" << reinterpret_cast<void *>(&__start_custom) << ", end of section:" << reinterpret_cast<void *>(&__stop_custom) << std::endl;
    return 0;
}
I attach a modified ld linker script which changes the .text section alignment and places the custom section just above it, to help reproduce the issue (pass it with -T when compiling with g++).
Although my problem is quite specific (but tied to a standard feature), this can happen in various cases, not only with custom sections.
For example, g++ 8.2.1 on RHEL8-beta generates some symbols for annobin by default, and the symbol .annobin_elf_init.c._end, for instance, seems to be placed at the end of the .text section with a size of 0.
A workaround would be to allow getSectionForAddress to accept symbols at the end address of a section (as #19 suggested). After adapting the code accordingly, BOLT works, and the symbol is attributed to the section it ends only if there is no contiguous section.
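One way such a relaxed lookup might look is sketched below. This is an assumption about the shape of the workaround, not the actual patch: `Section` and `findSectionRelaxed` are hypothetical names, and the two-pass order (strict match first, end address second) is a choice made for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified section record; not BOLT's real data structure.
struct Section {
  std::string Name;
  uint64_t Address; // address of the first byte
  uint64_t Size;    // size in bytes
};

// Relaxed lookup: prefer a section that strictly contains Addr, and only
// then accept a section whose end address (Address + Size) equals Addr.
// Returns the index of the chosen section, or -1 if none matches.
int findSectionRelaxed(const std::vector<Section> &Sections, uint64_t Addr) {
  // First pass: half-open ranges [Address, Address + Size).
  for (std::size_t I = 0; I < Sections.size(); ++I) {
    const Section &S = Sections[I];
    assert(S.Address + S.Size >= S.Address && "section range must not wrap");
    if (Addr >= S.Address && Addr < S.Address + S.Size)
      return static_cast<int>(I);
  }
  // Second pass: a zero-sized symbol sitting exactly at a section's end.
  for (std::size_t I = 0; I < Sections.size(); ++I)
    if (Addr == Sections[I].Address + Sections[I].Size)
      return static_cast<int>(I);
  return -1;
}
```

With a gap after the section, the end address now resolves to the section that ends there. With a contiguous neighbour, the first pass still wins and the address is attributed to the next section, which is the limitation noted above.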
But I'm actually not sure that's enough to avoid problems:
As I said, if there is a contiguous section, a symbol like __stop_* is attributed to the wrong section, and as getSectionForAddress only takes an address as a parameter, we can't really determine whether this symbol is at the upper bound of the previous section or the lower bound of the next one.
Does BOLT change section sizes and positions? Even if it doesn't, could this misattribution lead to execution problems?
From what I saw, the detection of a symbol's section is based on this function and not on st_shndx. Is there a specific reason for that?
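For comparison, here is a minimal sketch of a lookup that consults the symbol's st_shndx before falling back to its address. All names (`Section`, `Symbol`, `sectionForSymbol`) are hypothetical, and the sketch assumes the vector index equals the section header index, with entry 0 standing for the null section. Reserved index values such as SHN_ABS are not handled specially; they simply fall through to the address-based path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified section record; not BOLT's real data structure.
struct Section {
  std::string Name;
  uint64_t Address;
  uint64_t Size;
};

// Mirrors two fields of an ELF symbol table entry: st_value and st_shndx.
struct Symbol {
  std::string Name;
  uint64_t Value;        // st_value
  uint16_t SectionIndex; // st_shndx
};

// Returns the index of the section the symbol belongs to, or -1.
// Assumption: Sections[I] corresponds to section header index I.
int sectionForSymbol(const std::vector<Section> &Sections, const Symbol &Sym) {
  assert(!Sections.empty() && "entry 0 is expected to be the null section");
  // Trust the recorded index when it names a real section and the value lies
  // in the closed range [Address, Address + Size], end address included.
  if (Sym.SectionIndex != 0 && Sym.SectionIndex < Sections.size()) {
    const Section &S = Sections[Sym.SectionIndex];
    if (Sym.Value >= S.Address && Sym.Value <= S.Address + S.Size)
      return Sym.SectionIndex;
  }
  // Otherwise fall back to a strict address-based search.
  for (std::size_t I = 1; I < Sections.size(); ++I) {
    const Section &S = Sections[I];
    if (Sym.Value >= S.Address && Sym.Value < S.Address + S.Size)
      return static_cast<int>(I);
  }
  return -1;
}
```

With two contiguous sections, a zero-sized end marker whose recorded index names the first section stays in the first section, whereas an address-only search would place it in the second.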
One last thing: after applying the workaround I mentioned, I saw that some of these empty symbols now get a real size:
$ eu-nm ./libtcmalloc.so | grep _google_malloc
__start_google_malloc |000000000004a0c0|GLOBAL|NOTYPE |0000000000000000| |google_malloc
__stop_google_malloc |000000000004e99e|GLOBAL|NOTYPE |0000000000000000| |google_malloc
$ eu-nm ./libtcmalloc.so.bolt | grep _google_malloc
__start_google_malloc |000000000004a0c0|GLOBAL|NOTYPE |00000000000003ce| |google_malloc
__stop_google_malloc |000000000004e99e|GLOBAL|NOTYPE |00000000000000cf| |malloc_hook
Seems strange to me but I don't know if there is a justification for it.
Thanks for taking the time to investigate the issue with tcmalloc. We can add special handling for __start_*/__stop_* symbols and use st_shndx to better track other symbols placed at the end of a section if no section could be found. However, I suspect that could be just the tip of the iceberg. There's a good chance that those marked section boundaries are being used for some sort of code discovery. As BOLT re-arranges code and shuffles functions, assumptions such as the continuity of the code between __start_* and __stop_* will be broken. A better option is to preserve the code between such boundaries and not optimize it.
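The idea of preserving the code between such boundaries could be sketched as follows. This is only an illustration under stated assumptions: `Function`, `MarkedRange` and `pinMarkedFunctions` are hypothetical names, not BOLT APIs, and a function is considered covered when its entry address lies in [__start_X, __stop_X).

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified function record; not BOLT's real data structure.
struct Function {
  std::string Name;
  uint64_t Address; // entry address
  bool Optimize;    // whether the optimizer may move or rewrite it
};

// Values of a __start_X / __stop_X symbol pair.
struct MarkedRange {
  uint64_t Start;
  uint64_t Stop;
};

// Clears Optimize for every function whose entry address lies in one of the
// half-open marked ranges [Start, Stop), so that code stays in place.
void pinMarkedFunctions(std::vector<Function> &Functions,
                        const std::vector<MarkedRange> &Ranges) {
  for (Function &F : Functions) {
    for (const MarkedRange &R : Ranges) {
      assert(R.Start <= R.Stop && "range bounds must be ordered");
      if (F.Address >= R.Start && F.Address < R.Stop) {
        F.Optimize = false;
        break;
      }
    }
  }
}
```

Using the eu-nm values from this thread (0x4a0c0 and 0x4e99e for google_malloc), a function starting at 0x4a0c0 would be left untouched, while one starting at 0x4e99e, which belongs to the next section, would remain eligible for optimization.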
The reason you are seeing sizes assigned to __start_google_malloc and __stop_google_malloc is that BOLT currently treats them as function entry points, and on output they are assigned sizes corresponding to those functions.
OK, thanks for the explanations. I agree that it's safer to leave those sections as is.
I can work on a fix to ignore symbols between __start_* and __stop_* on my side, if you're OK with that?
Sounds good to me.