Christopher Sidebottom
Previously, baseline AArch64 was left without `SWITCH_RATIO` or `GEMM_PREFERED_SIZE` and with older default values, but other cores show that these values seem to work for many devices....
- `ComputeUnary` provides a wrapper to hide the `Load`/`Store` logic needed for vectorised routines
- `MaskedScalarFallbackUnary` contains all the logic to call a scalar fallback routine against masked vector lanes
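The split between the two helpers can be illustrated with a minimal, library-free sketch. Everything below is hypothetical: `Vec`, `Mask`, `LoadN`, `StoreN`, `ComputeUnarySketch`, `MaskedScalarFallbackSketch`, and `SinSketch` are stand-ins invented for illustration and are not the actual signatures from the change, which presumably operates on real SIMD vectors rather than a `std::array`.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a SIMD register: four float lanes.
constexpr std::size_t kLanes = 4;
using Vec = std::array<float, kLanes>;
using Mask = std::array<bool, kLanes>;

// Load up to n elements; unused lanes are zero-filled.
inline Vec LoadN(const float* src, std::size_t n) {
    Vec v{};
    for (std::size_t i = 0; i < n && i < kLanes; ++i) v[i] = src[i];
    return v;
}

// Store only the first n lanes, so a partial tail never writes out of bounds.
inline void StoreN(const Vec& v, float* dst, std::size_t n) {
    for (std::size_t i = 0; i < n && i < kLanes; ++i) dst[i] = v[i];
}

// ComputeUnary-style wrapper: the routine `op` only sees whole vectors,
// while the loop, the loads, and the stores live here.
template <typename Op>
void ComputeUnarySketch(const float* src, float* dst, std::size_t len, Op op) {
    for (std::size_t i = 0; i < len; i += kLanes) {
        const std::size_t n = std::min(kLanes, len - i);
        StoreN(op(LoadN(src + i, n)), dst + i, n);
    }
}

// MaskedScalarFallbackUnary-style helper: keep the fast result, except in
// lanes flagged by the mask, where the scalar routine is called instead.
template <typename Scalar>
Vec MaskedScalarFallbackSketch(const Vec& in, const Vec& fast,
                               const Mask& needs_fallback, Scalar scalar) {
    Vec out = fast;
    for (std::size_t i = 0; i < kLanes; ++i) {
        if (needs_fallback[i]) out[i] = scalar(in[i]);
    }
    return out;
}

// Example routine (an assumption for illustration): a degree-7 Taylor
// polynomial for sin(x) when |x| <= 1, and std::sin for every other lane.
inline Vec SinSketch(const Vec& x) {
    Vec fast{};
    Mask big{};
    for (std::size_t i = 0; i < kLanes; ++i) {
        const float x2 = x[i] * x[i];
        fast[i] = x[i] * (1.0f - x2 / 6.0f * (1.0f - x2 / 20.0f * (1.0f - x2 / 42.0f)));
        big[i] = std::fabs(x[i]) > 1.0f;
    }
    return MaskedScalarFallbackSketch(x, fast, big,
                                      [](float v) { return std::sin(v); });
}
```

Calling `ComputeUnarySketch(in, out, len, SinSketch)` then processes a whole buffer, including a partial final vector, without the routine itself touching memory.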
The cast in `numpy/distutils/checks/cpu_neon_fp16.c` didn't compile correctly with MSVC, which led to `NEON` being reported as unsupported, because it encompasses `NEON_FP16` in `meson_cpu/arm/meson.build`. I'm not an expert, but it appears...
These are experiments to see whether or not we can improve performance a bit on 128-bit SVE cores by using ASIMD instead.
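The kind of selection being experimented with might look like the sketch below. It is an assumption for illustration only: `Backend` and `ChooseBackend` are hypothetical names, and the vector length is passed in as a plain number so the snippet does not depend on any platform-specific query.

```cpp
#include <cassert>
#include <cstddef>

enum class Backend { Scalar, ASIMD, SVE };

// Hypothetical selection rule. `sve_bits` is the runtime SVE vector length
// in bits, or 0 when SVE is unavailable.
constexpr Backend ChooseBackend(bool has_asimd, std::size_t sve_bits) {
    if (sve_bits > 128) return Backend::SVE;  // wider than ASIMD: use SVE
    if (has_asimd) return Backend::ASIMD;     // 128-bit SVE is no wider than ASIMD
    if (sve_bits != 0) return Backend::SVE;   // SVE is the only vector option
    return Backend::Scalar;
}
```

Whether preferring ASIMD at 128 bits actually helps is exactly what the experiments would need to measure; the sketch only shows where such a rule would sit.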