llvm-project icon indicating copy to clipboard operation
llvm-project copied to clipboard

__builtin_memset_inline emits call

Open dvyukov opened this issue 3 years ago • 7 comments

The docs say:

Guaranteed inlined memset void __builtin_memset_inline(void *dst, int value, size_t size); ... but also guarantees not to call any external functions.

In the following program it emits a call to memset:

void my_memset(char* dst, int c, unsigned long n) {
  for (; n >= BLOCK; n -= BLOCK, dst += BLOCK) {
    __builtin_memset_inline(dst, c, BLOCK);
#ifdef ASM
    asm volatile("" ::: "memory");
#endif
  }
}
int main() {}
$ clang -v
clang version 16.0.0 (https://github.com/llvm/llvm-project.git cb5d0b41baf2f137f377a8d03481d6a5574a31ec)

$ clang memset.c -O2 -DBLOCK=64 && objdump --disassemble=my_memset a.out

0000000000401130 <my_memset>:
  401130:	48 83 fa 40          	cmp    $0x40,%rdx
  401134:	72 09                	jb     40113f <my_memset+0xf>
  401136:	48 83 e2 c0          	and    $0xffffffffffffffc0,%rdx
  40113a:	e9 f1 fe ff ff       	jmp    401030 <memset@plt>
  40113f:	c3                   	ret    

Changing BLOCK size does not affect it. Only asm volatile("" ::: "memory") prevents the call.

@gchatelet @melver

dvyukov avatar Aug 02 '22 10:08 dvyukov

From what I can see in the dumps, Clang emits a loop with each iteration calling to llvm.memset.inline with a constant size. Each call that then gets expanded into a series of stores by the InstCombine pass. The "problem" is that the loop of direct stores is ultimately transformed into a single call to llvm.memset with a nonconstant size.

msebor avatar Aug 02 '22 16:08 msebor

Maybe if instcombine lowers an llvm.memset.inline, it should mark the function with an attribute suppressing the automatic formation of llvm.memset?

efriedma-quic avatar Aug 02 '22 17:08 efriedma-quic

That should work if suppressing it even for stores that didn't originate from memset.inline is okay (e.g., those from calls to other functions inlined into the caller). Is there already an existing attribute for that (I couldn't find one) or do you mean to suggest to add a new one (or a pair, one for memset.inline and another for memcpy.inline)?

msebor avatar Aug 02 '22 18:08 msebor

I think "no-builtin-memset" technically works, but not sure it's the cleanest way to express the semantics we want.

efriedma-quic avatar Aug 02 '22 18:08 efriedma-quic

Another bug with __builtin_memcpy/set_inline I hit: #57048

dvyukov avatar Aug 10 '22 12:08 dvyukov

We do use the -fno-builtin variants in llvm-libc to implement the various memory functions. That said, I agree that it is a bug. The various llvm.<function>.inline instructions should be handled differently in InstCombine.

gchatelet avatar Aug 17 '22 08:08 gchatelet

it should mark the function

arguably this should be an attribute of the generated store instructions, and we would disallow re-combining of the stores that are marked nobuiltin.

legrosbuffle avatar Aug 17 '22 08:08 legrosbuffle