qthreads

Performance comparison of Aarch64 context switch

Open: cjhackillinois opened this issue 3 years ago • 10 comments

Run a timing test to measure the performance difference between the native Aarch64 context-switching assembly and the ucontext.h library.

Process I followed:

  1. Edit config/qthread_check_swapcontext.m4.
  2. Clean, configure, make, and install the library.
  3. Check which implementation was built with objdump -d <library file> | grep qt_get (or grep for getcontext when using the ucontext.h library).
  4. Build the stress test with make task_spawn inside the tests/stress folder. The executable is placed inside the .libs folder.
  5. Run 3 timing tests.
  6. Paste a screenshot of the results.
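Step 5 can also be done in-process rather than by timing the task_spawn executable from the shell. A minimal sketch, assuming the workload can be wrapped as a C function; `sketch_workload_fn`, `sketch_spin_workload`, and `sketch_time_runs` are hypothetical names, not part of qthreads. CLOCK_MONOTONIC is used so that wall-clock adjustments do not skew individual runs.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime */
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical signature for the code being timed. */
typedef void (*sketch_workload_fn)(void *arg);

/* Hypothetical placeholder workload: busy-loops *(const long *)arg times. */
static void sketch_spin_workload(void *arg) {
    volatile long sink = 0;
    for (long i = 0; i < *(const long *)arg; i++)
        sink += i;
}

/* Seconds elapsed from timestamp a to timestamp b. */
static double sketch_elapsed(const struct timespec *a, const struct timespec *b) {
    return (double)(b->tv_sec - a->tv_sec) + (double)(b->tv_nsec - a->tv_nsec) / 1e9;
}

/* Calls fn(arg) `runs` times and stores each run's wall-clock seconds in out[]. */
static void sketch_time_runs(sketch_workload_fn fn, void *arg, double *out, size_t runs) {
    assert(fn != NULL && out != NULL);
    for (size_t i = 0; i < runs; i++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        fn(arg);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        out[i] = sketch_elapsed(&t0, &t1);
    }
}
```

For example, `double t[3]; long n = 1000000; sketch_time_runs(sketch_spin_workload, &n, t, 3);` fills `t` with three timings, one per run.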

cjhackillinois, Dec 13 '21

Branch: PRTestBranchDec2021

[screenshot]

In config/qthread_check_swapcontext.m4, the parameter qt_host_based_enable_fastcontext controls which implementation is selected: the native assembly when it is set to 'yes', and the ucontext.h library when it is set to 'no'.

Config: [screenshot]

To confirm that the native ARMv8 code is in use, search the library for qt_get; when using the ucontext.h library, search for getcontext instead:

[screenshot]

cjhackillinois, Dec 13 '21

Yes - this should apply equally to x86 and Power ISA.

janciesko, Dec 13 '21

@olivier-snl @janciesko Is there a particular timing test I should use?

cjhackillinois, Dec 14 '21

Yes - for context switching, I'd use something like stress/taskspawn.c. It might make sense to add an inner loop over a variable number of yields, to control how many yields (context switches) each task performs; multiple context switches per task would hide the task-creation overhead. For locking, I would probably start with qthreads/test/stress/lock_acq_rel.c (PRTestBranchDec). It might make sense to parametrize a) the number of locks and b) the distance between contended locks. That way we benchmark the locking implementation itself rather than cache coherence.
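The inner-yield-loop idea can be sketched with plain ucontext.h, which also exercises the ucontext side of the comparison. This is a single-threaded stand-in, not taskspawn.c itself: each "task" is a fresh context on a reused stack that yields back to the spawner `yields` times, so raising `yields` spreads the creation cost over more switches. All `sketch_*` names are hypothetical.

```c
#define _XOPEN_SOURCE 700 /* ucontext.h routines */
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

enum { SKETCH_STACK_SIZE = 64 * 1024 };

static ucontext_t sketch_spawner_ctx; /* the "scheduler" side */
static ucontext_t sketch_task_ctx;    /* the task currently running */
static long sketch_yields_per_task;   /* inner-loop bound */
static long sketch_yield_count;       /* total yields performed */

/* Task body: the added inner loop, one yield (switch back to the spawner) per iteration. */
static void sketch_task_body(void) {
    for (long i = 0; i < sketch_yields_per_task; i++) {
        sketch_yield_count++;
        swapcontext(&sketch_task_ctx, &sketch_spawner_ctx);
    }
    /* Returning resumes uc_link, i.e. the spawner. */
}

/* Spawns `tasks` tasks one at a time, each yielding `yields` times, and
 * returns the total number of yields (tasks * yields). */
static long sketch_spawn_and_yield(long tasks, long yields) {
    assert(tasks >= 0 && yields >= 0);
    char *stack = malloc(SKETCH_STACK_SIZE);
    if (stack == NULL) abort();
    sketch_yields_per_task = yields;
    sketch_yield_count = 0;
    for (long t = 0; t < tasks; t++) {
        /* "Task creation": build a fresh context on the reused stack. */
        if (getcontext(&sketch_task_ctx) != 0) abort();
        sketch_task_ctx.uc_stack.ss_sp = stack;
        sketch_task_ctx.uc_stack.ss_size = SKETCH_STACK_SIZE;
        sketch_task_ctx.uc_link = &sketch_spawner_ctx;
        makecontext(&sketch_task_ctx, sketch_task_body, 0);
        /* One resume per yield, plus one to let the task run to completion. */
        for (long y = 0; y <= yields; y++)
            swapcontext(&sketch_spawner_ctx, &sketch_task_ctx);
    }
    free(stack);
    return sketch_yield_count;
}
```

Each yield costs two switches (task to spawner and back), so timing `sketch_spawn_and_yield(tasks, y)` for growing `y` and dividing by `2 * tasks * y` roughly approximates the per-switch cost once creation overhead is amortized.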

janciesko, Dec 14 '21
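The two locking parameters suggested above map naturally onto a lock array: (a) the number of locks and (b) the byte stride between consecutive locks. With a stride of at least one cache line, each lock sits on its own line, so contention measures the lock rather than false sharing. A minimal sketch using C11 atomics, assuming 64-byte cache lines (check the target CPU); the test-and-set spinlock and all `sketch_*` names are hypothetical stand-ins, not the qthreads lock API or lock_acq_rel.c.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_CACHE_LINE 64 /* assumption: 64-byte cache lines */

typedef struct {
    unsigned char *base; /* cache-line-aligned backing storage */
    size_t nlocks;       /* parameter (a): number of locks */
    size_t stride;       /* parameter (b): bytes between consecutive locks */
} sketch_lock_array;

/* Address of lock i, placed i * stride bytes from the base. */
static atomic_int *sketch_lock_at(const sketch_lock_array *a, size_t i) {
    assert(i < a->nlocks);
    return (atomic_int *)(a->base + i * a->stride);
}

static int sketch_locks_init(sketch_lock_array *a, size_t nlocks, size_t stride) {
    assert(nlocks > 0 && stride >= sizeof(atomic_int) && stride % _Alignof(atomic_int) == 0);
    /* Round up: C11 aligned_alloc wants a size that is a multiple of the alignment. */
    size_t bytes = (nlocks * stride + SKETCH_CACHE_LINE - 1) / SKETCH_CACHE_LINE * SKETCH_CACHE_LINE;
    a->base = aligned_alloc(SKETCH_CACHE_LINE, bytes);
    if (a->base == NULL) return -1;
    a->nlocks = nlocks;
    a->stride = stride;
    for (size_t i = 0; i < nlocks; i++)
        atomic_init(sketch_lock_at(a, i), 0); /* 0 = unlocked */
    return 0;
}

/* Test-and-set spinlock: acquire ordering on success. */
static void sketch_lock(atomic_int *l) {
    while (atomic_exchange_explicit(l, 1, memory_order_acquire) != 0)
        ; /* spin */
}

static void sketch_unlock(atomic_int *l) {
    atomic_store_explicit(l, 0, memory_order_release);
}

static void sketch_locks_free(sketch_lock_array *a) {
    free(a->base);
    a->base = NULL;
}
```

Sweeping the stride from `sizeof(atomic_int)` (several locks per line) up to a full line or more would show how much of the measured cost comes from coherence traffic rather than the lock itself.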

Chris, if you have a moment, could you take a look at https://github.com/pmodels/argobots/tree/main/src/arch/fcontext? I'd be interested to know how this asm implementation compares to ours conceptually.

janciesko, Dec 15 '21

@olivier-snl passed that repo along to me. From what I can tell, their implementation saves fewer general-purpose registers than ours, but it also saves some floating-point registers. It uses the stack pointer to store state, while our code uses (I believe) malloc-allocated pointers. So their code appears to swap stack frames and preserve only certain registers across the swap, rather than performing a full context switch. We do perform a full context switch, but don't preserve any FP registers.

Aarch64 ABI: https://developer.arm.com/documentation/ihi0055/latest

ARMv8 context switching: https://developer.arm.com/documentation/den0024/a/The-Memory-Management-Unit/Context-switching

ucontext.h library implementation: https://code.woboq.org/userspace/glibc/sysdeps/unix/sysv/linux/aarch64/

@janciesko @olivier-snl Looking at the implementation, I notice that the ucontext library does not appear to save registers X0-X17. I'm not sure why.

Another repo with a native context-switch implementation: https://github.com/kaniini/libucontext/tree/master/arch/aarch64

cjhackillinois, Dec 15 '21
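On the X0-X17 question: under the Aarch64 procedure-call standard (AAPCS64, linked above), x0-x17 are caller-saved (argument/result registers, temporaries, and the IP0/IP1 scratch registers) and x18 is the platform register, so a context switch entered as an ordinary function call only has to preserve the callee-saved set. That is likely why the glibc routines skip them. The callee-saved set also includes the low 64 bits of v8-v15 (d8-d15), which bears on the note that our code saves no FP registers. A sketch of that minimal save area as a C struct; the layout and names are illustrative only, not the qthreads, glibc, or argobots layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative minimal save area for a call-based AArch64 context switch.
 * Per AAPCS64 only callee-saved state must survive the call; x0-x17 are
 * caller-saved and x18 is the platform register, so none are stored here. */
typedef struct {
    uint64_t x19_x28[10]; /* callee-saved general-purpose registers */
    uint64_t fp;          /* x29, frame pointer */
    uint64_t lr;          /* x30, link register: where the switch returns to */
    uint64_t sp;          /* stack pointer of the suspended context */
    uint64_t d8_d15[8];   /* low 64 bits of v8-v15, callee-saved FP/SIMD state */
} sketch_aarch64_min_ctx;

/* 10 + 3 + 8 = 21 eight-byte slots; fails to compile if padding sneaks in. */
static_assert(sizeof(sketch_aarch64_min_ctx) == 21 * sizeof(uint64_t),
              "unexpected padding in sketch_aarch64_min_ctx");

/* Number of 64-bit values a switch saves and restores per context. */
static size_t sketch_min_ctx_slots(void) {
    return sizeof(sketch_aarch64_min_ctx) / sizeof(uint64_t);
}
```

A switch routine built on this layout would store these fields for the outgoing context, load them for the incoming one, and return through the restored x30.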

After changing the line in the qthread_check_swapcontext.m4 file, I run: [screenshot]

Using ucontext library:

[screenshot]

Using native ARMv8 code:

[screenshot]

@janciesko @olivier-snl It seems the ARMv8 code can be faster, but it shows higher run-to-run variation. These runs were on the login node, so I may need to rerun them on a compute node.

cjhackillinois, Dec 16 '21
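To put a number on the run-to-run variation mentioned above, it helps to summarize each set of runs. With only three runs a standard deviation is itself noisy, so this sketch reports the mean together with the relative spread (max - min) / mean. `sketch_summarize` and `sketch_run_summary` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    double mean;
    double min;
    double max;
    double rel_spread; /* (max - min) / mean */
} sketch_run_summary;

/* Summarizes n timing samples (seconds). */
static sketch_run_summary sketch_summarize(const double *t, size_t n) {
    assert(t != NULL && n > 0);
    sketch_run_summary s = { 0.0, t[0], t[0], 0.0 };
    for (size_t i = 0; i < n; i++) {
        s.mean += t[i];
        if (t[i] < s.min) s.min = t[i];
        if (t[i] > s.max) s.max = t[i];
    }
    s.mean /= (double)n;
    s.rel_spread = s.mean > 0.0 ? (s.max - s.min) / s.mean : 0.0;
    return s;
}
```

For example, run times of 1.0, 2.0, and 3.0 seconds give a mean of 2.0 and a relative spread of 1.0 (100%). Comparing this figure between the login-node and compute-node runs would show whether the higher variation comes from the node rather than the ARMv8 code.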

I merged 'main' into the 'fast-context' branch, then ran a clean build followed by make check. All tests pass except the qutil test, and even qutil passes some of the time, so there may be a minor bug, but I'm not sure. I've run the suite about 20 times, and whenever a test failed, it was always qutil.

[screenshot]

[screenshot]

cjhackillinois, Jan 13 '22
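To quantify how often qutil fails, a rerun loop can count failures over many runs. A minimal sketch using POSIX system(); `sketch_count_failures` is hypothetical, and the command it is given (for example, one that runs only the qutil test in the build tree) depends on your setup and is not shown in the issue.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Runs `cmd` through the shell `runs` times and returns how many runs
 * did not exit normally with status 0 (a crude flakiness counter). */
static int sketch_count_failures(const char *cmd, int runs) {
    assert(cmd != NULL && runs > 0);
    int failures = 0;
    for (int i = 0; i < runs; i++) {
        int rc = system(cmd);
        if (rc == -1 || !WIFEXITED(rc) || WEXITSTATUS(rc) != 0)
            failures++;
    }
    return failures;
}
```

Running it with `runs = 20`, matching the roughly 20 runs reported above, gives a failure count that can be compared between 'main' and the 'fast-context' branch.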

I'll try to reproduce.

janciesko, Jan 24 '22

I added an image above to show my environment.

cjhackillinois, Jan 24 '22