concurrencykit broken on `aarch64`
After enabling the Python tests for PR #315, the `test_ccomp_and_epsilon` test in `test_mk_hal_basics.py` fails only on arm64 builders: Drone & Travis CI Ubuntu Focal and Bionic, and Debian Buster (but not Stretch?!).
I tracked this down to `hal_ccomp_match()`, in `hal/lib/hal_rcomp.c`, where `_get_float_pin()` is called and returns a completely bogus value for the pin (should have been 101.0, actually was 4636878028842991616.0). Retrieving the value directly in the same section of code (with something like `((hal_data_u*)hal_ptr(hp->data_ptr))->f`) returned the correct value.
I noticed that the `PINGETTER(_GETVALUEDOUBLE, float, HAL_FLOAT, f, FLOATCAST);` (where `_get_float_pin()` comes from) in `hal/lib/hal_accessor.h` actually expands to something close to the above, except it adds the concurrencykit parts:

    static inline const hal_float_t _get_float_pin2(const hal_pin_t *pin) {
        const hal_data_u *u = (const hal_data_u *)hal_ptr(pin->data_ptr);
        if (__builtin_expect(!!(hh_get_rmb(&pin->hdr)), 0))
            ck_pr_fence_load();
        return ck_pr_md_load_double((double *)&u->f);
    }

I don't know enough to determine the problem, but with concurrencykit disabled by turning off the `HAVE_CK` macro in `config.h`, the tests pass.
I don't see any open issues surrounding aarch64 in the concurrencykit issue tracker except for concurrencykit/ck#156, which doesn't appear related.
Would it be possible to get the disassembly of this function with and without Concurrency Kit? What is the conditional fence for? I'll also add some additional coverage for `ck_pr_*_double`; perhaps the widths are incorrect on aarch64.