__builtin_memset_inline emits call
The docs say:
Guaranteed inlined memset void __builtin_memset_inline(void *dst, int value, size_t size); ... but also guarantees not to call any external functions.
In the following program it emits a call to memset:
void my_memset(char* dst, int c, unsigned long n) {
for (; n >= BLOCK; n -= BLOCK, dst += BLOCK) {
__builtin_memset_inline(dst, c, BLOCK);
#ifdef ASM
asm volatile("" ::: "memory");
#endif
}
}
int main() {}
$ clang -v
clang version 16.0.0 (https://github.com/llvm/llvm-project.git cb5d0b41baf2f137f377a8d03481d6a5574a31ec)
$ clang memset.c -O2 -DBLOCK=64 && objdump --disassemble=my_memset a.out
0000000000401130 <my_memset>:
401130: 48 83 fa 40 cmp $0x40,%rdx
401134: 72 09 jb 40113f <my_memset+0xf>
401136: 48 83 e2 c0 and $0xffffffffffffffc0,%rdx
40113a: e9 f1 fe ff ff jmp 401030 <memset@plt>
40113f: c3 ret
Changing BLOCK size does not affect it.
Only asm volatile("" ::: "memory") prevents the call.
@gchatelet @melver
From what I can see in the dumps, Clang emits a loop with each iteration calling to llvm.memset.inline with a constant size. Each call that then gets expanded into a series of stores by the InstCombine pass. The "problem" is that the loop of direct stores is ultimately transformed into a single call to llvm.memset with a nonconstant size.
Maybe if instcombine lowers an llvm.memset.inline, it should mark the function with an attribute suppressing the automatic formation of llvm.memset?
That should work if suppressing it even for stores that didn't originate from memset.inline is okay (e.g., those from calls to other functions inlined into the caller). Is there already an existing attribute for that (I couldn't find one) or do you mean to suggest to add a new one (or a pair, one for memset.inline and another for memcpy.inline)?
I think "no-builtin-memset" technically works, but not sure it's the cleanest way to express the semantics we want.
Another bug with __builtin_memcpy/set_inline I hit: #57048
We do use the -fno-builtin variants in llvm-libc to implement the various memory functions.
That said, I agree that it is a bug. The various llvm.<function>.inline instructions should be handled differently in InstCombine.
it should mark the function
arguably this should be an attribute of the generated store instructions, and we would disallow re-combining of the stores that are marked nobuiltin.