Multiple calls to __tls_get_addr in the same function

Open rymrg opened this issue 4 years ago • 4 comments

The following code calls __tls_get_addr multiple times instead of once:

int x;
int y;
void foo() {
    x = 1;
    y = 2;
}

This is the generated asm:

void example.foo():
        push    rax
        data16
        lea     rdi, [rip + int example.x@TLSGD]
        data16
        data16
        rex64
        call    __tls_get_addr@PLT
        mov     dword ptr [rax], 1
        data16
        lea     rdi, [rip + int example.y@TLSGD]
        data16
        data16
        rex64
        call    __tls_get_addr@PLT
        mov     dword ptr [rax], 2
        pop     rax
        ret

https://godbolt.org/z/orx4eG7G1

rymrg, Jul 15 '21 20:07

Using -fthread-model=local-exec produces the same asm as clang by default:

void example.foo():
        mov     dword ptr fs:[int example.x@TPOFF], 1
        mov     dword ptr fs:[int example.y@TPOFF], 2
        ret

No idea as to whether our default is sane, but TLS variables are way more common in D than in C++.
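For comparison, here is a rough C analogue of the snippet from the issue. It is only a sketch: D module-level variables are thread-local by default, so `_Thread_local` is used to mirror them, and the `tls_model` attribute (a GCC/Clang extension) requests local-exec explicitly instead of relying on a compiler default. It assumes the code is linked into an executable, since local-exec is not valid in a shared library. `check_foo` is a hypothetical helper added for illustration.

```c
#include <assert.h>

/* Thread-local globals, mirroring D's module-level `int x; int y;`.
   The tls_model attribute is a GCC/Clang extension; "local-exec" lets the
   compiler address the variables relative to the thread pointer directly,
   without calling __tls_get_addr. */
_Thread_local int x __attribute__((tls_model("local-exec")));
_Thread_local int y __attribute__((tls_model("local-exec")));

/* Same body as the D function in the issue. */
void foo(void) {
    x = 1;
    y = 2;
}

/* Hypothetical helper: runs foo() and checks both stores landed
   in the calling thread's copies. */
void check_foo(void) {
    foo();
    assert(x == 1 && y == 2);
}
```

Compiling this with optimizations and inspecting the assembly should show thread-pointer-relative stores rather than two `__tls_get_addr` calls, though the exact output depends on the compiler, target, and PIC settings.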

kinke, Jul 15 '21 22:07

Clang emits the globals as dso_local in IR, which we don't do (but https://github.com/ldc-developers/ldc/pull/3713 would probably have done so).

kinke, Jul 15 '21 22:07

If I'm reading the definitions correctly, I believe globals should be dso_local by default for dynamic libraries unless marked as weak, just like clang. This should reduce the number of calls to __tls_get_addr.

rymrg, Jul 16 '21 08:07

> Using -fthread-model=local-exec produces the same asm as clang by default:
>
> void example.foo():
>         mov     dword ptr fs:[int example.x@TPOFF], 1
>         mov     dword ptr fs:[int example.y@TPOFF], 2
>         ret
>
> No idea as to whether our default is sane, but TLS variables are way more common in D than in C++.

I'd have thought that the PIC flag would be a better gate for determining which should be the default (GDC does initial-exec without, and global-dynamic with -fPIC).

Memoization of TLS reads could spell trouble though, and it has already caused trouble with shared Fibers. From a quick look, both ldc and gdc do some level of mitigation against this.

A further note: given that backend optimization passes such as CSE tend not to consider thread locals, it is better to err on the side of caution.

ibuclaw, Jul 16 '21 09:07
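The memoization hazard mentioned above can be illustrated with a small C sketch. It does not use D Fibers: a plain POSIX thread and an explicit pointer hand-off stand in for code that resolved a TLS address on one thread and then continued running on another. The names `counter`, `struct handoff`, `worker`, and `stale_tls_pointer_demo` are all hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

/* Each thread gets its own copy of this variable. */
static _Thread_local int counter = 0;

/* Hypothetical hand-off record: a TLS address computed earlier on
   another thread, plus a slot for what the worker sees in its own copy. */
struct handoff {
    int *cached;      /* memoized address of the main thread's counter */
    int worker_own;   /* worker's own counter value after the write */
};

static void *worker(void *arg) {
    struct handoff *h = arg;
    *h->cached = 42;           /* lands in the main thread's copy */
    h->worker_own = counter;   /* this thread's copy is untouched */
    return NULL;
}

/* Returns 1 if the write through the memoized address hit the main
   thread's copy and missed the worker's; returns -1 on thread errors. */
int stale_tls_pointer_demo(void) {
    struct handoff h = { &counter, -1 };  /* address resolved on this thread */
    pthread_t t;
    if (pthread_create(&t, NULL, worker, &h) != 0) return -1;
    if (pthread_join(t, NULL) != 0) return -1;
    return counter == 42 && h.worker_own == 0;
}
```

A compiler that caches the result of `__tls_get_addr` across a point where execution can move to another thread would behave like the worker here, which is presumably why a conservative default is attractive. Building the sketch may require `-pthread` on some toolchains.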