
Add SMP support to SGX-LKL

Open • prp opened this issue 4 years ago • 0 comments

We should add the branch with experimental SMP support to the OE version of SGX-LKL after rebasing it on master: https://github.com/lsds/sgx-lkl/tree/smp-support

Future development should be done in a way that maintains compatibility with SMP support.

Here are some current performance results from the non-OE SGX-LKL SMP branch:

          pwrite (to page cache)  pread (to page cache)   pread (from disk)
Ethreads  Std    SMP              Std    SMP              Std    SMP
1         2.64   3.42 (0.77x)     2.76   3.12 (0.88x)     6.50   11.46 (0.56x)
2         3.16   1.78 (1.77x)     3.17   1.62 (1.95x)     7.98   6.99 (1.33x)
3         3.09   1.42 (2.17x)     3.23   1.29 (2.50x)     7.01   6.72 (1.04x)
4         3.23   0.93 (3.47x)     3.29   0.84 (3.91x)     7.29   6.33 (1.15x)

As can be seen, the page-cache workloads scale with the number of ethreads, but reads from disk do not. Those syscalls go through the LKL virtio interface (found under lkl/tools/lib/), which was not implemented with parallelism in mind.
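The factors in parentheses match Std divided by SMP for the same row (for example 3.23 / 0.93 ≈ 3.47), which suggests the columns are elapsed times where lower is better. That is an assumption, since no unit is given here. The one row that does not fit is pread (from disk) at 2 ethreads, where 7.98 / 6.99 is about 1.14 rather than 1.33. A minimal sketch of the arithmetic follows; the function names are illustrative and not part of SGX-LKL:

```c
#include <stdio.h>

/* Speedup of SMP over Std, assuming both values are elapsed times. */
double speedup(double std_time, double smp_time)
{
    return std_time / smp_time;
}

/* Parallel efficiency: 1.0 would mean perfect scaling across ethreads. */
double efficiency(double std_time, double smp_time, int ethreads)
{
    return speedup(std_time, smp_time) / ethreads;
}

/* Print one summary line for a single table cell. */
void report(const char *workload, int ethreads, double std_time, double smp_time)
{
    printf("%-24s ethreads=%d speedup=%.2fx efficiency=%.2f\n",
           workload, ethreads,
           speedup(std_time, smp_time),
           efficiency(std_time, smp_time, ethreads));
}
```

With 4 ethreads, efficiency(3.29, 0.84, 4) is close to 1.0 for pread (to page cache), while efficiency(7.29, 6.33, 4) is below 0.3 for pread (from disk), which makes the gap between the page-cache path and the virtio path explicit.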

In addition, the current SMP branch exhibits non-deterministic segfaults. The segfaults can be triggered by running e.g. find:

~/github/sgx-lkl/apps/miniroot$ SGXLKL_KEY=../../build/config/enclave_debug.key SGXLKL_HEAP=200m SGXLKL_CMDLINE=mem=60m  SGXLKL_ETHREADS=4 ../../gdb/sgx-lkl-gdb --args ../../build/sgx-lkl-run ./sgxlkl-miniroot-fs.img /usr/bin/find . -name "*test*"

These are some of the observed segfaults:

Thread 6 "ENCLAVE" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffcaf0e700 (LWP 4697)]
0x00007fffbcc91de0 in __pthread_mutex_lock (m=0x0) at src/thread/pthread_mutex_lock.c:5
5        if ((m->_m_type&15) == PTHREAD_MUTEX_NORMAL
(gdb) bt
#0  0x00007fffbcc91de0 in __pthread_mutex_lock (m=0x0) at src/thread/pthread_mutex_lock.c:5
#1  0x00007fffbccb2cc4 in mutex_lock (mutex=0x0) at lkl/posix-host.c:186
#2  0x00007fffbc88893e in arch_cpu_idle () at arch/lkl/kernel/cpu.c:296
#3  0x00007fffbcbf24b3 in default_idle_call () at kernel/sched/idle.c:93
#4  0x00007fffbc8b0696 in cpuidle_idle_call () at kernel/sched/idle.c:153
#5  do_idle () at kernel/sched/idle.c:262
#6  0x00007fffbc8b0985 in cpu_startup_entry (state=-1122077904) at kernel/sched/idle.c:368
#7  0x00007fffbc888c62 in lkl_start_secondary (unused=<optimised out>) at arch/lkl/kernel/cpu.c:410
#8  0x00007fffbc88676e in thread_bootstrap (_tba=0x7fffbd1e7330 <cpus+48>) at arch/lkl/kernel/threads.c:179
#9  0x0000000000000000 in ?? ()

Thread 6 "ENCLAVE" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffcaf0e700 (LWP 4799)]
a_crash () at ./arch/x86_64/atomic_arch.h:108
108        __asm__ __volatile__( "hlt" : : : "memory" );
(gdb) bt
#0  a_crash () at ./arch/x86_64/atomic_arch.h:108
#1  0x00007fffbcc2298f in unmap_chunk (self=0x7fffbc7fff30) at src/malloc/malloc.c:510
#2  0x00007fffbcc229e0 in free (p=0x7fffbc7fff40) at src/malloc/malloc.c:521
#3  0x00007fffbccb2d40 in mutex_free (_mutex=0x7fffbc7fff40) at lkl/posix-host.c:197
#4  0x00007fffbc888182 in lkl_cpu_cleanup (shutdown=48) at arch/lkl/kernel/cpu.c:281
#5  0x00007fffbc888974 in arch_cpu_idle () at arch/lkl/kernel/cpu.c:301
#6  0x00007fffbcbf24b3 in default_idle_call () at kernel/sched/idle.c:93
#7  0x00007fffbc8b0696 in cpuidle_idle_call () at kernel/sched/idle.c:153
#8  do_idle () at kernel/sched/idle.c:262
#9  0x00007fffbc8b0985 in cpu_startup_entry (state=-1122077944) at kernel/sched/idle.c:368
#10 0x00007fffbc888c62 in lkl_start_secondary (unused=<optimised out>) at arch/lkl/kernel/cpu.c:410
#11 0x00007fffbc88676e in thread_bootstrap (_tba=0x7fffbd1e7308 <cpus+8>) at arch/lkl/kernel/threads.c:179
#12 0x0000000000000000 in ?? ()

The a_crash() in this last trace appears to be a double free. Frame 1 shows the check in unmap_chunk that triggers it:

(gdb) f 1
#1  0x00007fffbcc2298f in unmap_chunk (self=0x7fffbc7fff30) at src/malloc/malloc.c:510
510        if (extra & 1) a_crash();

Thread 8 "ENCLAVE" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffc9f0c700 (LWP 4981)]
0x00007fffbcc91de0 in __pthread_mutex_lock (m=0x0) at src/thread/pthread_mutex_lock.c:5
5        if ((m->_m_type&15) == PTHREAD_MUTEX_NORMAL
(gdb) bt
#0  0x00007fffbcc91de0 in __pthread_mutex_lock (m=0x0) at src/thread/pthread_mutex_lock.c:5
#1  0x00007fffbccb2cc4 in mutex_lock (mutex=0x0) at lkl/posix-host.c:186
#2  0x00007fffbc88853a in __lkl_cpu_put (cpu_no=1) at arch/lkl/kernel/cpu.c:176
#3  0x00007fffbc888820 in lkl_cpu_put () at arch/lkl/kernel/cpu.c:225
#4  0x00007fffbc88723b in lkl_trigger_irq (cpu=<optimised out>, irq=-1122118520) at arch/lkl/kernel/irq.c:113
#5  0x00007fffb897bf60 in ?? ()
#6  0x0000000000000008 in ?? ()
#7  0x00007fffb897bfb0 in ?? ()
#8  0x00007fffbc8882fe in lkl_ipi_thread (arg=<optimised out>) at arch/lkl/kernel/cpu.c:524

Thread 7 "ENCLAVE" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffca70d700 (LWP 5064)]
0x00007fffbcc91de0 in __pthread_mutex_lock (m=0x0) at src/thread/pthread_mutex_lock.c:5
5        if ((m->_m_type&15) == PTHREAD_MUTEX_NORMAL
(gdb) bt
#0  0x00007fffbcc91de0 in __pthread_mutex_lock (m=0x0) at src/thread/pthread_mutex_lock.c:5
#1  0x00007fffbccb2cc4 in mutex_lock (mutex=0x0) at lkl/posix-host.c:186
#2  0x00007fffbc88893e in arch_cpu_idle () at arch/lkl/kernel/cpu.c:296
#3  0x00007fffbcbf24b3 in default_idle_call () at kernel/sched/idle.c:93
#4  0x00007fffbc8b0696 in cpuidle_idle_call () at kernel/sched/idle.c:153
#5  do_idle () at kernel/sched/idle.c:262
#6  0x00007fffbc8b0985 in cpu_startup_entry (state=-1122077952) at kernel/sched/idle.c:368
#7  0x00007fffbcbec8bd in rest_init () at init/main.c:442
#8  0x00007fffbcd2a0b0 in start_kernel () at init/main.c:738
#9  0x00007fffbcd2aa4f in lkl_run_kernel (arg=<optimised out>) at arch/lkl/kernel/setup.c:50
#10 0x00007fffbccb0b39 in _exec (lt_=0x7fffbc7fe2e0) at sched/lthread.c:170
#11 0x0000000000000000 in ?? ()

Thread 6 "ENCLAVE" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffcaf0e700 (LWP 5142)]
a_crash () at ./arch/x86_64/atomic_arch.h:108
108        __asm__ __volatile__( "hlt" : : : "memory" );
(gdb) bt
#0  a_crash () at ./arch/x86_64/atomic_arch.h:108
#1  0x00007fffbcc2298f in unmap_chunk (self=0x7fffbc7fe070) at src/malloc/malloc.c:510
#2  0x00007fffbcc229e0 in free (p=0x7fffbc7fe080) at src/malloc/malloc.c:521
#3  0x00007fffbccb2b07 in sem_free (sem=0x7fffbc7fe080) at lkl/posix-host.c:114
#4  0x00007fffbc88816b in lkl_cpu_cleanup (shutdown=112) at arch/lkl/kernel/cpu.c:277
#5  0x00007fffbc888974 in arch_cpu_idle () at arch/lkl/kernel/cpu.c:301
#6  0x00007fffbcbf24b3 in default_idle_call () at kernel/sched/idle.c:93
#7  0x00007fffbc8b0696 in cpuidle_idle_call () at kernel/sched/idle.c:153
#8  do_idle () at kernel/sched/idle.c:262
#9  0x00007fffbc8b0985 in cpu_startup_entry (state=-1122077896) at kernel/sched/idle.c:368
#10 0x00007fffbcbec8bd in rest_init () at init/main.c:442
#11 0x00007fffbcd2a0b0 in start_kernel () at init/main.c:738
#12 0x00007fffbcd2aa4f in lkl_run_kernel (arg=<optimised out>) at arch/lkl/kernel/setup.c:50
#13 0x00007fffbccb0b39 in _exec (lt_=0x7fffbc7fe2e0) at sched/lthread.c:170

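Many of the segfaults happen within arch_cpu_idle: four of the five traces above pass through it, either locking a NULL mutex directly or freeing a mutex or semaphore in lkl_cpu_cleanup.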
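One reading consistent with these traces, though not confirmed here, is that more than one idle loop runs the cleanup path against the same host mutex and semaphore. The second caller then frees memory that is already freed, and later callers see a NULL mutex. The sketch below shows a generic way to make such a cleanup run only once, using a C11 atomic flag. The struct and function names are hypothetical stand-ins and do not correspond to the actual code in arch/lkl/kernel/cpu.c:

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for shared CPU state; not the real LKL struct. */
struct demo_cpu {
    pthread_mutex_t *lock;   /* heap-allocated, like the host mutex in the traces */
    atomic_bool cleaned_up;  /* set by the first caller of demo_cpu_cleanup() */
    int free_calls;          /* counts how often the lock was actually freed */
};

/* Allocate and initialise the lock. Returns 0 on success, -1 on failure. */
int demo_cpu_init(struct demo_cpu *cpu)
{
    cpu->lock = malloc(sizeof(*cpu->lock));
    if (cpu->lock == NULL)
        return -1;
    if (pthread_mutex_init(cpu->lock, NULL) != 0) {
        free(cpu->lock);
        cpu->lock = NULL;
        return -1;
    }
    atomic_init(&cpu->cleaned_up, false);
    cpu->free_calls = 0;
    return 0;
}

/*
 * Idempotent cleanup: only the first caller frees the lock.
 * Returns 1 if this call performed the cleanup, 0 if it was already done.
 */
int demo_cpu_cleanup(struct demo_cpu *cpu)
{
    /* atomic_exchange returns the previous value; true means someone got here first. */
    if (atomic_exchange(&cpu->cleaned_up, true))
        return 0;

    pthread_mutex_destroy(cpu->lock);
    free(cpu->lock);
    cpu->lock = NULL;
    cpu->free_calls++;
    return 1;
}
```

This only addresses the double-free symptom. It does not stop another thread from locking the mutex while cleanup is in progress, so the NULL-mutex crashes would still need teardown to be ordered after every idle loop has exited.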

prp, Jun 25 '20 09:06