Segfault when using pthread_cancel

Open SeanTAllen opened this issue 4 years ago • 4 comments

When running the following test:

/*
 * pthread_cancel-test.c
 *
 * This simple test checks thread creation and cancellation.
 *
 */

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define RUNS 10000

void* thread_worker(void* arg)
{
    while(1) {
        printf("i'm in a loop\n");
        sleep(1);
    }
}

int main(void)
{
    int i;
    pthread_t thread1;
    int ret;

    for (i = 0; i < RUNS; i++)
    {
        printf("Creating worker thread (run=%d)\n", i);
        ret = pthread_create(&thread1, NULL, thread_worker, NULL);
        printf("Created worker thread (run=%d)\n", i);

        if (ret != 0)
        {
            printf("Failed to create thread (ret=%i)\n", ret);
            printf("TEST FAILED\n");
            exit(-1);
        }

        sleep(1);
        printf("Cancelling worker thread (run=%d)\n", i);
        pthread_cancel(thread1);
        printf("Cancelled worker thread (run=%d)\n", i);

    }

    if (i == RUNS)
    {
        printf("TEST PASSED (pthread_join) runs=%i\n", i);
    }
    else
    {
        printf("Wrong number of runs\n");
        printf("TEST FAILED\n");
    }

    return 0;
}

Both @vtikoo and I get a segfault fairly early on.

Backtrace follows:

Thread 6 "ENCLAVE" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fe03dadfb10 (LWP 17645)]
0x00007fe0000991e8 in prepare_signal (sig=33, p=0x7fe03fe70b00, force=false) at kernel/signal.c:895
895     {
(gdb) bt
#0  0x00007fe0000991e8 in prepare_signal (sig=33, p=0x7fe03fe70b00, force=false) at kernel/signal.c:895
#1  0x00007fe00009a2eb in __send_signal (force=<optimized out>, type=<optimized out>, t=<optimized out>, info=<optimized out>,
    sig=<optimized out>) at kernel/signal.c:1076
#2  send_signal (sig=33, info=0x7fe03dabf0d0, t=0x0, type=PIDTYPE_PID) at kernel/signal.c:1236
#3  0x00007fe00009b6cd in do_send_sig_info (sig=1072106240, info=0x21, p=0x7fe03dabf0d0, type=PIDTYPE_PID) at kernel/signal.c:1285
#4  0x00007fe00009b763 in do_send_specific (tgid=33, pid=<optimized out>, sig=1072106240, info=0x0) at kernel/signal.c:3772
#5  0x00007fe00009b816 in do_tkill (tgid=33, pid=0, sig=1034678480) at kernel/signal.c:3798
#6  0x00007fe00009c504 in __do_sys_tkill (sig=<optimized out>, pid=<optimized out>) at kernel/signal.c:3833
#7  __se_sys_tkill (pid=<optimized out>, sig=<optimized out>) at kernel/signal.c:3827
#8  0x00007fe03dabf180 in ?? ()
#9  0x00007fe00008b6cf in run_syscall (params=<optimized out>, no=<optimized out>) at arch/lkl/kernel/syscalls.c:44
#10 lkl_syscall (no=0, params=0x7fe03dabf0d0) at arch/lkl/kernel/syscalls.c:192

Given the backtrace, this might be connected to our various signal problems. That isn't confirmed yet, so I wanted to open this issue to keep track of it.


A slight variation: with the following version, the application eventually hangs (for me, always after creating 255 threads):

/*
 * pthread_cancel-test.c
 *
 * This simple test checks thread creation and cancellation.
 *
 */

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define RUNS 10000

void* thread_worker(void* arg)
{
    while(1) {
        printf("i'm in a loop\n");
        sleep(1);
    }
}

int main(void)
{
    int i;
    pthread_t thread1;
    int ret;

    for (i = 0; i < RUNS; i++)
    {
        printf("Creating worker thread (run=%d)\n", i);
        ret = pthread_create(&thread1, NULL, thread_worker, NULL);
        printf("Created worker thread (run=%d)\n", i);

        if (ret != 0)
        {
            printf("Failed to create thread (ret=%i)\n", ret);
            printf("TEST FAILED\n");
            exit(-1);
        }

        printf("Cancelling worker thread (run=%d)\n", i);
        pthread_cancel(thread1);
        printf("Cancelled worker thread (run=%d)\n", i);

    }

    if (i == RUNS)
    {
        printf("TEST PASSED (pthread_join) runs=%i\n", i);
    }
    else
    {
        printf("Wrong number of runs\n");
        printf("TEST FAILED\n");
    }

    return 0;
}

Note that the only difference from the first version is the missing sleep(1) call in main.
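Neither version above joins or detaches the cancelled threads. As a point of comparison, here is a minimal sketch of a variant that joins each thread after cancelling it and checks that the join reports PTHREAD_CANCELED. The names `sleepy_worker` and `cancel_and_join` are made up for illustration, and whether this changes the behaviour under sgx-lkl is not known; it is only meant to help separate unreclaimed-thread effects from the signal problem.

```c
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* Worker that never returns on its own; sleep() is a cancellation point,
 * so a pending cancellation request is acted on there. */
static void *sleepy_worker(void *arg)
{
    (void)arg;
    for (;;)
        sleep(1);
}

/* Hypothetical helper: creates, cancels, and joins `runs` threads in turn.
 * Returns the number of runs in which pthread_join reported
 * PTHREAD_CANCELED, or -1 if any pthread call failed. */
int cancel_and_join(int runs)
{
    int cancelled = 0;

    for (int i = 0; i < runs; i++) {
        pthread_t t;
        void *res = NULL;

        if (pthread_create(&t, NULL, sleepy_worker, NULL) != 0)
            return -1;
        if (pthread_cancel(t) != 0)
            return -1;
        /* Joining waits for the cancellation to complete and releases
         * the thread's resources before the next iteration. */
        if (pthread_join(t, &res) != 0)
            return -1;
        if (res == PTHREAD_CANCELED)
            cancelled++;
    }
    return cancelled;
}
```

Calling `cancel_and_join(RUNS)` from main and comparing the result with RUNS gives a pass/fail condition that depends on the cancellations actually taking effect, rather than only on the loop finishing.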

SeanTAllen, Aug 17 '20 19:08

This will be broken because we are not delivering signals to the correct thread (see https://github.com/lsds/sgx-lkl/issues/644).

prp, Aug 17 '20 19:08

@vtikoo, I think the syscall_cp assembly was not yet ported over to LKL, which may be related here?

davidchisnall, Aug 18 '20 13:08

@davidchisnall I tried adding a breakpoint at the entry of __syscall_cp.s, and it doesn't look like syscall_cp is called during this test. The tkill syscall in the stack trace is most probably from cancel_handler: https://github.com/lsds/sgx-lkl-musl/blob/oe_port/src/thread/pthread_cancel.c#L67

vtikoo, Aug 18 '20 23:08

@KenGordon is working on fixing signal delivery to the correct thread.

davidchisnall, Aug 19 '20 07:08