Pthread program hangs in an infinite loop or segfaults after setting a TLS variable in the syscall hook.
Hi, my work depends on a TLS variable inside the syscall hook. However, when I modify the TLS variable inside hook_function, the program stalls immediately or hits a segmentation fault once it scales to a certain number of threads.
Here's the code. The hook library comes first, followed by the pthread test program.
#include <stdint.h>
#include <stdio.h>
#define __hidden __attribute__((visibility("hidden")))
typedef long (*syscall_fn_t)(long, long, long, long, long, long, long);
static __thread uint64_t defective = 0;
static syscall_fn_t next_sys_call = NULL;
static long hook_function(long a1, long a2, long a3, long a4, long a5, long a6,
                          long a7) {
    if (!defective) {
        defective = 1;
    }
    return next_sys_call(a1, a2, a3, a4, a5, a6, a7);
}
int __hook_init(long placeholder __attribute__((unused)),
                void *sys_call_hook_ptr) {
    printf("output from __hook_init: we can do some init work here\n");
    next_sys_call = *((syscall_fn_t *)sys_call_hook_ptr);
    *((syscall_fn_t *)sys_call_hook_ptr) = hook_function;
    return 0;
}
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
#define THREAD_COUNT 32
#define SLEEPS 10
void *thread(void *arg) {
    for (int i = 0; i < SLEEPS; i++) {
        printf("Thread %d\n", *((int *)arg));
    }
    return NULL;
}
int main() {
    static int tids[THREAD_COUNT];
    static pthread_t threads[THREAD_COUNT];
    for (int i = 0; i < THREAD_COUNT; i++) {
        tids[i] = i;
        pthread_create(&threads[i], NULL, thread, &tids[i]);
    }
    for (int i = 0; i < THREAD_COUNT; i++) {
        pthread_join(threads[i], NULL);
    }
    return 0;
}
When I hook this pthread program, it either freezes completely or crashes with a segmentation fault. I suspect this is related to the way glibc handles TLS variables, but I don't have a clue. I would be very grateful for any pointers on where to investigate. Right now, I'm reading the glibc source code.
Thank you for your message.
I could reproduce the issue mentioned in your message.
I found that, in my environment, I can circumvent the issue by adding the condition if (a1 != 204) { just before the code that accesses the TLS variable, as shown below. This workaround prevents the TLS variable from being accessed when the hook function is called for the sched_getaffinity system call, whose system call number is 204 on x86-64 Linux.
#include <stdint.h>
#include <stdio.h>
#define __hidden __attribute__((visibility("hidden")))
typedef long (*syscall_fn_t)(long, long, long, long, long, long, long);
static __thread uint64_t defective = 0;
static syscall_fn_t next_sys_call = NULL;
static long hook_function(long a1, long a2, long a3, long a4, long a5, long a6,
                          long a7) {
    if (a1 != 204) {
        if (!defective) {
            defective = 1;
        }
    }
    return next_sys_call(a1, a2, a3, a4, a5, a6, a7);
}
int __hook_init(long placeholder __attribute__((unused)),
                void *sys_call_hook_ptr) {
    printf("output from __hook_init: we can do some init work here\n");
    next_sys_call = *((syscall_fn_t *)sys_call_hook_ptr);
    *((syscall_fn_t *)sys_call_hook_ptr) = hook_function;
    return 0;
}
My guess is that accessing the TLS variable leads to a call to a function named __tls_get_addr, and execution then proceeds through __tls_get_addr, malloc, and __get_nprocs_sched. __get_nprocs_sched tries to issue the sched_getaffinity system call, but this attempt is hooked and redirected to our hook function. The hook function then accesses the TLS variable again and calls __tls_get_addr once more, which brings execution back to the hook function through another sched_getaffinity attempt, so the cycle never terminates.
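If the guess about __tls_get_addr is right, another direction that may be worth trying is to keep the compiler from emitting that call at all by requesting the initial-exec TLS model for the variable. The following is only a minimal sketch and has not been verified against this reproduction. It assumes the hook library is built with GCC or Clang, and that glibc still has static TLS space available when the library is loaded (if it does not, loading the library fails). The tls_model attribute is the only change from the original hook code; the printf in __hook_init is left out for brevity.

```c
#include <stddef.h>
#include <stdint.h>

typedef long (*syscall_fn_t)(long, long, long, long, long, long, long);

/*
 * Assumption: with the initial-exec model the variable is placed in the
 * static TLS block and is addressed at a fixed offset from the thread
 * pointer, so reading or writing it does not go through __tls_get_addr.
 */
static __thread uint64_t defective
    __attribute__((tls_model("initial-exec"))) = 0;

static syscall_fn_t next_sys_call = NULL;

static long hook_function(long a1, long a2, long a3, long a4, long a5, long a6,
                          long a7) {
    /* Same per-thread flag as before, now without a lazy TLS lookup. */
    if (!defective) {
        defective = 1;
    }
    return next_sys_call(a1, a2, a3, a4, a5, a6, a7);
}

int __hook_init(long placeholder __attribute__((unused)),
                void *sys_call_hook_ptr) {
    /* Save the original entry point, then install our hook in its place. */
    next_sys_call = *((syscall_fn_t *)sys_call_hook_ptr);
    *((syscall_fn_t *)sys_call_hook_ptr) = hook_function;
    return 0;
}
```

The same effect can be requested for a whole translation unit with the compiler option -ftls-model=initial-exec instead of the per-variable attribute.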
I think the loop point, sched_getaffinity, might be specific to my environment. Please consider checking other system calls if the code above does not work in your environment.
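If other system calls turn out to be involved, the single comparison against 204 can be generalized into a small skip list. The sketch below is one possible shape for that. The names tls_unsafe_syscalls and tls_is_safe_for are hypothetical helpers introduced here for illustration, and sched_getaffinity is the only entry confirmed by the analysis above; any further entries would have to be found by investigating your own environment. It uses SYS_sched_getaffinity from <sys/syscall.h> instead of the literal 204 so that the number matches the build architecture.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h> /* SYS_sched_getaffinity */

typedef long (*syscall_fn_t)(long, long, long, long, long, long, long);

static __thread uint64_t defective = 0;
static syscall_fn_t next_sys_call = NULL;

/*
 * Hypothetical skip list: system calls during which the hook must not
 * touch TLS. Only sched_getaffinity is known from the analysis above;
 * extend this list if another system call loops in your environment.
 */
static const long tls_unsafe_syscalls[] = {
    SYS_sched_getaffinity,
};

/* Returns 1 if the hook may access TLS for system call nr, 0 otherwise. */
static int tls_is_safe_for(long nr) {
    size_t n = sizeof(tls_unsafe_syscalls) / sizeof(tls_unsafe_syscalls[0]);
    for (size_t i = 0; i < n; i++) {
        if (tls_unsafe_syscalls[i] == nr) {
            return 0;
        }
    }
    return 1;
}

static long hook_function(long a1, long a2, long a3, long a4, long a5, long a6,
                          long a7) {
    /* a1 carries the system call number, as in the workaround above. */
    if (tls_is_safe_for(a1)) {
        if (!defective) {
            defective = 1;
        }
    }
    return next_sys_call(a1, a2, a3, a4, a5, a6, a7);
}

int __hook_init(long placeholder __attribute__((unused)),
                void *sys_call_hook_ptr) {
    next_sys_call = *((syscall_fn_t *)sys_call_hook_ptr);
    *((syscall_fn_t *)sys_call_hook_ptr) = hook_function;
    return 0;
}
```

The list is deliberately a plain const array rather than anything allocated at run time, so that checking it cannot itself trigger malloc or another system call from inside the hook.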
I hope this information is helpful to you.