qthreads icon indicating copy to clipboard operation
qthreads copied to clipboard

Enable QTHREAD_ARMV8_A64 context swap on Apple Mx (ARM) hardware

Open janciesko opened this issue 2 years ago • 5 comments

This makes Qthreads compile with fast context switching implemented in asm.

janciesko avatar Jul 19 '22 17:07 janciesko

What is the testing status of this? Should test on arm mac, x86 mac, arm linux and x86 linux to ensure works on the new platform and no harm done on the others.

olivier-snl avatar Jul 19 '22 17:07 olivier-snl

Make check passes on M1 Mac, x86 Linux, ARM64 Linux.

janciesko avatar Jul 25 '22 17:07 janciesko

I'm seeing failures for a couple types of tests.

One failure for tests that sleep, where it seems like when doing sleeps from different tasks only some of the tasks actually sleep. Our sleep code is implemented here. If I switch to always using the gettimeofday code instead of the qtimer code then sleeps seem to work correctly. I thought maybe the mach timer initialization in qthreads was racy, but doing a serial timer creation during serial init didn't seem to help

In another mode, multiples tasks adding to a data structure protected by a lock seems to be racy in some way.

I don't have any standalone qthread reproducers yet, but some of the chapel tests are:

$CHPL_HOME/test/domains/sungeun/assoc/parSafeMember.chpl
$CHPL_HOME/test/parallel/begin/deitz/test_begin5.chpl

These tests pass with fastcontext disabled, and I'm not sure how to go about debugging offhand. Probably the starting point is to get standalone qthread reproducers.

ronawho avatar Aug 15 '22 16:08 ronawho

I'm seeing failures for a couple types of tests.

One failure for tests that sleep, where it seems like when doing sleeps from different tasks only some of the tasks actually sleep. Our sleep code is implemented here. If I switch to always using the gettimeofday code instead of the qtimer code then sleeps seem to work correctly. I thought maybe the mach timer initialization in qthreads was racy, but doing a serial timer creation during serial init didn't seem to help

In another mode, multiples tasks adding to a data structure protected by a lock seems to be racy in some way.

I don't have any standalone qthread reproducers yet, but some of the chapel tests are:

$CHPL_HOME/test/domains/sungeun/assoc/parSafeMember.chpl
$CHPL_HOME/test/parallel/begin/deitz/test_begin5.chpl

These tests pass with fastcontext disabled, and I'm not sure how to go about debugging offhand. Probably the starting point is to get standalone qthread reproducers.

@ronawho Do your tests fail only on arm mac or on other platforms as well? I know you said that they work with the system ucontext. Do they work with fastcontext on arm linux and x86 mac? Just trying to narrow down the the problem and figure out if we introduced problems for the other platforms as well since the last round of chapel testing.

olivier-snl avatar Aug 15 '22 16:08 olivier-snl

These are old tests, so I believe they are only failing for fastcontext on arm based macs. We have lots of x86 testing on mac and linux. We have some arm linux testing, but not as extensive so I can't say confidently at the moment if these tests pass consistently on other arm platforms.

ronawho avatar Aug 15 '22 19:08 ronawho

Tests seem to fail on arm (thunderx2) linux with fastcontext. They pass with the system ucontext. So seems to be arm fastcontext specific. I don't know that I would have run these specific tests when checking out https://github.com/Qthreads/qthreads/pull/95. They are part of the full test suite and given that I was mostly testing that PR out on cray's I was probably running a more limited subset of the tests.

ronawho avatar Aug 18 '22 02:08 ronawho

Tests seem to fail on arm (thunderx2) linux with fastcontext. They pass with the system ucontext. So seems to be arm fastcontext specific. I don't know that I would have run these specific tests when checking out #95. They are part of the full test suite and given that I was mostly testing that PR out on cray's I was probably running a more limited subset of the tests.

@ronawho Thanks for running the additional testing. Since the differences between the mac and Linux fastcontext assembly are very minor, it is not surprising that they fail the same tests, and it is helpful to have that confirmation so that we can reproduce and debug on either platform.

olivier-snl avatar Aug 18 '22 13:08 olivier-snl

Right, and since this is an existing issue and not something new or mac specific, I don't think it needs to hold up this PR, but I think Chapel will want to wait until this is resolved to use native fastcontext switching for ARM.

I'm trying to make a standalone qthreads reproducer from test_begin5.chpl, but wasn't able to in a few minutes. I'm pretty busy the next few days, so not sure when I'll get a chance to try again. This test is kinda silly and could be argued that it's invalid since it uses sleep to try and get output in a specific order. However, I think it still exposes an underlying issue and it's a lot easier to create a reproducer that looks like this instead of the other test failure so I'd probably start with that test and hope it's the same underlying issue.

I also tried looking at the argobots arm assembly to see if I could see any obvious bugs in the qthreads version, but they are different enough and my understanding of asm context switching is poor enough that I didn't really come to any conclusion.

ronawho avatar Aug 18 '22 15:08 ronawho

@cjhackillinois - would you be able to look at this? We do not catch this error with our testing suite. It might be that some register is not handled correctly and that in this case the register happens to store the state of a lock.

janciesko avatar Aug 24 '22 20:08 janciesko

@ronawho We will merge this into main. We will keep the issue open that you've reported and will fix that in a minor release (1.18.1).

janciesko avatar Sep 08 '22 19:09 janciesko