Multiple calls to __tls_get_addr in the same function
The following code calls __tls_get_addr multiple times instead of once:
int x;
int y;
void foo() {
    x = 1;
    y = 2;
}
This is the generated asm:
void example.foo():
    push rax
    data16
    lea rdi, [rip + int example.x@TLSGD]
    data16
    data16
    rex64
    call __tls_get_addr@PLT
    mov dword ptr [rax], 1
    data16
    lea rdi, [rip + int example.y@TLSGD]
    data16
    data16
    rex64
    call __tls_get_addr@PLT
    mov dword ptr [rax], 2
    pop rax
    ret
https://godbolt.org/z/orx4eG7G1
Using -fthread-model=local-exec produces the same asm as clang by default:
void example.foo():
    mov dword ptr fs:[int example.x@TPOFF], 1
    mov dword ptr fs:[int example.y@TPOFF], 2
    ret
No idea as to whether our default is sane, but TLS variables are way more common in D than in C++.
Clang emits the globals as dso_local in IR, something we don't do (though https://github.com/ldc-developers/ldc/pull/3713 would probably have done so).
If I'm reading the definitions correctly, I believe globals should be dso_local by default for dynamic libraries unless marked as weak, just like clang.
This should reduce the number of calls to __tls_get_addr.
> Using -fthread-model=local-exec produces the same asm as clang by default:
> void example.foo():
>     mov dword ptr fs:[int example.x@TPOFF], 1
>     mov dword ptr fs:[int example.y@TPOFF], 2
>     ret
> No idea as to whether our default is sane, but TLS variables are way more common in D than in C++.
I'd have thought that the PIC flag would be a better gate for determining which model should be the default (GDC uses initial-exec without -fPIC, and global-dynamic with it).
Memoization of TLS reads could spell trouble though, and it has already caused trouble with shared Fibers. From a quick look, both ldc and gdc do some level of mitigation against that.
A further note: since backend optimization passes such as CSE tend not to take thread-locals into account, it is better to err on the side of caution.
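To make the memoization concern a bit more concrete, here is a minimal sketch of why a cached TLS address is only valid on the thread that computed it. It only shows that two live threads see different addresses for the same variable; it does not reproduce the Fiber migration case itself. The helper names addrOfX and addrOfXInNewThread are made up for illustration.

```d
import core.thread : Thread;

int x; // module-level variables are thread-local by default in D

/// Address of `x` as seen by the calling thread (hypothetical helper).
int* addrOfX()
{
    return &x;
}

/// Address of `x` as seen by a freshly spawned thread (hypothetical helper).
/// Returned as an integer because the pointer dangles once that thread exits.
size_t addrOfXInNewThread()
{
    size_t result;
    auto t = new Thread({ result = cast(size_t) &x; });
    t.start();
    t.join(); // the calling thread stays alive, so the two TLS blocks coexist
    return result;
}

void main()
{
    import std.stdio : writefln;

    auto here = cast(size_t) addrOfX();
    auto there = addrOfXInNewThread();
    writefln("this thread: %#x, other thread: %#x", here, there);

    // Each thread has its own copy of `x`, so the addresses differ.
    assert(here != there);
}
```

If a compiler hoisted the address computation out of a function that can be suspended and later resumed on a different thread, the resumed code would keep writing to the first thread's copy, which is presumably the kind of problem seen with Fibers.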