Raghuveer Devulapalli comments

Results: 122 comments by Raghuveer Devulapalli

ENH: Vectorize quicksort for 16-bit and 64-bit dtype using AVX512

Added AVX-512 quicksort for float16. This was the tricky one and I have a feeling it can be improved further. Benchmarks: ``` before after ratio [db7414b7] [cf13cd63] - 196±0.4μs 140±0.1μs...

ENH: Vectorize quicksort for 16-bit and 64-bit dtype using AVX512

@mattip The ARM64 failure on TravisCI is unlikely to be related to this patch (this patch is x86 code only). Looks like other PRs have similar failures too: https://app.travis-ci.com/github/numpy/numpy/jobs/587139824...

BLD: Add compile and runtime checks for AVX512FP16 and AVX512_SPR

"This check was cancelled". Hmm, can't think of a reason why.

ENH: Vectorize FP16 umath functions using AVX512

Might be useful to add a new CI test to run this new content on Intel SDE.

ENH: Vectorize FP16 umath functions using AVX512

Reworked the patch to work on AVX512. Perf numbers for FP16 functions look great with a **33x - 65x** speed up (on SkylakeX) depending on the function: ``` before after...

ENH: Vectorize FP16 umath functions using AVX512

PR https://github.com/numpy/numpy/pull/21954 adds comprehensive test coverage for these math functions. I will rebase this PR once that is merged.

ENH: Vectorize FP16 umath functions using AVX512

ping ..

sin and cos of std::simd come out wrong with clang++

It seems to be coming from here: https://github.com/VcDevel/std-simd/blob/c69cb8f6a8627c186427b08662b05693176c73b2/experimental/bits/simd_builtin.h#L51. The `_andnot` returns 0 for `_S_absmask` with clang; the correct value should be 0x7FFFFFFF.
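To illustrate why 0x7FFFFFFF is the expected value, here is a minimal scalar sketch. It assumes the mask is built as an x86-style and-not of a float sign mask with all-ones bits; that is an assumption about the linked line, and the names `andnot`, `kSignMask`, `kAllBits`, `kAbsMask`, and `abs_via_mask` are hypothetical helpers, not std-simd identifiers.

```cpp
#include <cstdint>
#include <cstring>

// x86-style and-not: complement the first operand, then AND with the second.
// (Hypothetical scalar stand-in for the vector helper named in the comment.)
constexpr std::uint32_t andnot(std::uint32_t a, std::uint32_t b) {
    return ~a & b;
}

// Assumed inputs: the IEEE-754 binary32 sign bit and an all-ones pattern.
constexpr std::uint32_t kSignMask = 0x80000000u;
constexpr std::uint32_t kAllBits  = 0xFFFFFFFFu;

// Clearing the sign bit from all-ones leaves the absolute-value mask.
constexpr std::uint32_t kAbsMask = andnot(kSignMask, kAllBits);
static_assert(kAbsMask == 0x7FFFFFFFu, "abs mask should keep every bit except the sign");

// Apply a bit mask to a float's representation (memcpy avoids aliasing issues).
inline float abs_via_mask(float x, std::uint32_t mask) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    bits &= mask;
    float out;
    std::memcpy(&out, &bits, sizeof out);
    return out;
}
```

With the correct mask, `abs_via_mask(-1.5f, kAbsMask)` yields `1.5f`. With a mask of 0, every input collapses to `+0.0f`, which would be consistent with functions that rely on the mask producing wrong results.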

BLD: Add compile and runtime checks for AVX512FP16 and AVX512_SPR

ping :)

BLD: Add compile and runtime checks for AVX512FP16 and AVX512_SPR

The win32 failure in `test_no_dgemv` seems unrelated :/