Raghuveer Devulapalli
Added AVX-512 quicksort for float16. This was the tricky one and I have a feeling it can be improved further. Benchmarks: ``` before after ratio [db7414b7] [cf13cd63] - 196±0.4μs 140±0.1μs...
@mattip The ARM64 failure on TravisCI is unlikely to be related to this patch (this patch is x86 code only). Looks like other PRs have similar failures too: https://app.travis-ci.com/github/numpy/numpy/jobs/587139824...
"This check was cancelled". Hmm, can't think of a reason why.
Might be useful to add a new CI job that runs this new code under Intel SDE.
Reworked the patch to work on AVX512. Perf numbers for FP16 functions look great, with a **33x to 65x** speedup (on SkylakeX) depending on the function: ``` before after...
PR https://github.com/numpy/numpy/pull/21954 adds comprehensive test coverage for these math functions. I will rebase this PR once that is merged.
It seems to be coming from here: https://github.com/VcDevel/std-simd/blob/c69cb8f6a8627c186427b08662b05693176c73b2/experimental/bits/simd_builtin.h#L51. With clang, the `_andnot` returns 0 for `_S_absmask`; the correct value should be 0x7FFFFFFF.
The win32 test failure in `test_no_dgemv` seems unrelated :/
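For reference, here is a minimal scalar sketch of why 0x7FFFFFFF is the expected value. It is not the std-simd code: `andnot_u32`, `kSignMask`, `kAbsMask` and `abs_via_mask` are hypothetical names made up for illustration, and it assumes 32-bit IEEE 754 floats and x86-style and-not semantics, i.e. `(~a) & b`.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: and-not with x86 semantics, (~a) & b.
constexpr std::uint32_t andnot_u32(std::uint32_t a, std::uint32_t b) {
    return ~a & b;
}

// Sign bit of a 32-bit IEEE 754 float.
constexpr std::uint32_t kSignMask = 0x80000000u;

// Every bit except the sign bit: the mask that abs() needs.
constexpr std::uint32_t kAbsMask = andnot_u32(kSignMask, 0xFFFFFFFFu);
static_assert(kAbsMask == 0x7FFFFFFFu, "abs mask must clear only the sign bit");

// Scalar abs via the mask; memcpy avoids type-punning undefined behaviour.
inline float abs_via_mask(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    bits &= kAbsMask;  // clear the sign bit, keep exponent and mantissa
    std::memcpy(&x, &bits, sizeof x);
    return x;
}
```

If the mask came out as 0 instead, `bits &= kAbsMask` would zero every input, so an abs built on it would return 0.0 for all values.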