BOLT
BOLT/LLVM? does not preserve prefixes on conditional branches
After discussing with @maksfb, this looks similar to the issue in https://reviews.llvm.org/D120592
I have an input binary of the form:
0000000000401169 <main>:
401169: 89 f8 mov %edi,%eax
40116b: 83 ff 01 cmp $0x1,%edi
40116e: 2e 74 06 je,pn 401177 <main+0xe>
401171: 83 ff 02 cmp $0x2,%edi
401174: 2e 75 01 jne,pn 401178 <main+0xf>
401177: c3 retq
401178: 83 ff 03 cmp $0x3,%edi
40117b: 2e 74 f9 je,pn 401177 <main+0xe>
40117e: b8 04 00 00 00 mov $0x4,%eax
401183: eb f2 jmp 401177 <main+0xe>
401185: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
40118c: 00 00 00
It has three conditional branches, each with a 2e prefix. Below is the output after BOLTing; the three conditional branches no longer have the prefix.
0000000000401169 <main>:
401169: 89 f8 mov %edi,%eax
40116b: 83 ff 01 cmp $0x1,%edi
40116e: 74 05 je 401175 <main+0xc>
401170: 83 ff 02 cmp $0x2,%edi
401173: 75 01 jne 401176 <main+0xd>
401175: c3 retq
401176: 83 ff 03 cmp $0x3,%edi
401179: 74 fa je 401175 <main+0xc>
40117b: b8 04 00 00 00 mov $0x4,%eax
401180: eb f3 jmp 401175 <main+0xc>
401182: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
401189: 00 00 00
40118c: 0f 1f 40 00 nopl 0x0(%rax)
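For anyone who wants to check the two listings above mechanically, here is a minimal Python sketch. It assumes each instruction has already been split out (for example by copying the byte column from objdump); it is not a disassembler. The helper names `is_jcc`, `branch_hint`, and `short_jcc_target` are made up for illustration and are not part of BOLT or LLVM.

```python
# Hypothetical helpers (not BOLT/LLVM APIs): inspect one pre-delimited x86 instruction.

# On a conditional jump, 2E is the "not taken" hint and 3E is the "taken" hint.
HINT_PREFIXES = {0x2E: "not-taken", 0x3E: "taken"}


def is_jcc(code: bytes) -> bool:
    """True if code starts with a short (70..7F) or near (0F 80..8F) Jcc opcode."""
    if not code:
        return False
    if 0x70 <= code[0] <= 0x7F:
        return True
    return len(code) >= 2 and code[0] == 0x0F and 0x80 <= code[1] <= 0x8F


def branch_hint(insn: bytes):
    """Return 'taken', 'not-taken', or None if the instruction has no hint prefix."""
    if len(insn) >= 2 and insn[0] in HINT_PREFIXES and is_jcc(insn[1:]):
        return HINT_PREFIXES[insn[0]]
    return None


def short_jcc_target(addr: int, insn: bytes) -> int:
    """Target of a short Jcc: address of the next instruction plus the signed rel8."""
    rel8 = int.from_bytes(insn[-1:], "little", signed=True)
    return addr + len(insn) + rel8


# First branch of main, before and after BOLT (bytes taken from the listings above).
before = bytes.fromhex("2e7406")
after = bytes.fromhex("7405")
print(branch_hint(before), hex(short_jcc_target(0x40116E, before)))
print(branch_hint(after), hex(short_jcc_target(0x40116E, after)))
```

Note that the displacement byte changes (06 to 05) only because the instruction got one byte shorter; in both listings the branch still lands on the retq.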
This looks like an underlying LLVM problem rather than a BOLT problem? I noticed that llvm-objdump, compared to the regular objdump, does not seem to know about the prefixes: run on the input binary, it shows the 2e byte but prints plain je/jne. @maksfb @rafaelauler @aaupov
0000000000401169 <main>:
401169: 89 f8 movl %edi, %eax
40116b: 83 ff 01 cmpl $1, %edi
40116e: 2e 74 06 je 0x401177 <main+0xe>
401171: 83 ff 02 cmpl $2, %edi
401174: 2e 75 01 jne 0x401178 <main+0xf>
401177: c3 retq
401178: 83 ff 03 cmpl $3, %edi
40117b: 2e 74 f9 je 0x401177 <main+0xe>
40117e: b8 04 00 00 00 movl $4, %eax
401183: eb f2 jmp 0x401177 <main+0xe>
401185: 66 2e 0f 1f 84 00 00 00 00 00 nopw %cs:(%rax,%rax)
40118f: 90 nop
Thanks
--Suresh
Yes, it looks similar to losing the addr32 prefix: LLVM MC might be losing the prefix during disassembly and not setting it on the MCInst.
On the other hand, these prediction prefixes are optional so we may want to strip them by default. What's the use case or perf effect here?
Thanks @aaupov
Current processors ignore this prefix: the static prediction is NT (Not Taken) either way. These prefixes are also mostly not generated (except when using special compiler flags and likely/unlikely macros). So stripping them by default would be right.
With or without the hint, the BPU is updated when the branch is taken.
We are doing some early research work to mark common conditional branches with 3E (so the branch instruction is predicted Taken by the CPU). We want to use LBR or conditional branch taken to collect a profile and then use BOLT to apply the hint.
For this, we will need the following:
- The branch probability of the conditional branch (is there an easy way in BOLT to get to this or do we need to process the profile data to create this?)
- To add the 3E prefix for conditional branches that are mostly taken (I presume that adding the prefix will require some LLVM support?)
- Write out the binary and preserve the 3E (looks like this is missing some LLVM support)
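To make the three items above concrete, here is a rough Python sketch of the intended flow: derive a taken probability from per-branch counts, decide whether the branch qualifies, and emit a short Jcc with the 3E prefix. Everything in it is an assumption for illustration: the function names, the 0.9 threshold, and the 100-sample minimum are made up, and it does not use BOLT's profile format or any BOLT/LLVM API.

```python
# Hypothetical sketch: not BOLT code; names, threshold, and sample minimum are assumptions.

TAKEN_HINT = 0x3E  # "branch taken" hint prefix


def taken_probability(taken: int, not_taken: int):
    """Fraction of executions in which the branch was taken, or None without samples."""
    total = taken + not_taken
    return taken / total if total else None


def should_hint_taken(taken: int, not_taken: int,
                      threshold: float = 0.9, min_samples: int = 100) -> bool:
    """Assumed policy: hint only well-sampled branches that are mostly taken."""
    p = taken_probability(taken, not_taken)
    return p is not None and taken + not_taken >= min_samples and p >= threshold


def encode_hinted_short_jcc(addr: int, target: int, opcode: int,
                            hint: int = TAKEN_HINT) -> bytes:
    """Encode prefix + short Jcc + rel8; rel8 is relative to the end of the instruction."""
    if not 0x70 <= opcode <= 0x7F:
        raise ValueError("not a short Jcc opcode")
    length = 3  # prefix byte + opcode byte + rel8 byte
    rel = target - (addr + length)
    if not -128 <= rel <= 127:
        raise ValueError("target out of rel8 range")
    return bytes([hint, opcode]) + rel.to_bytes(1, "little", signed=True)


# Example with invented counts: a je at 0x40116e targeting 0x401177, taken 950 of 1000 times.
if should_hint_taken(taken=950, not_taken=50):
    print(encode_hinted_short_jcc(0x40116E, 0x401177, 0x74).hex())
```

Because the prefix lengthens the instruction by one byte, the displacement has to be computed after the final layout is known, which is why the last step depends on the emitter preserving the prefix.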
Appreciate any directions on this.
Thanks
--Suresh