Raghuveer Devulapalli

122 comments by Raghuveer Devulapalli

@seiko2plus done. Let me know if that looks okay.

>> This indicates the gather/scatter aren't as optimal as the NumPy ones; I wonder if we can blend the NumPy loads and stores with the Highway code here 🤔 That...

@seiko2plus I am seeing a slowdown for strided cases as well. I just meant this could be a result of my `GatherIndexN` and `ScatterIndexN` functions, which just perform a simple scalar...

Moving the `hn::StoreMaskBits` call inside the if condition helped perf a little; we are now only about 1.2x slower. ``` | + | 7.47±0.01μs | 9.11±0.05μs | 1.22...

> The latter at least I can help with. We are missing HWY_ATTR: Adding HWY_ATTR fixed the build errors on ppc64le. Why did it fail only for this platform though?...

> hm, strange. Neither the x86 implementation of TableLookupBytesOr0, nor the quoted line and the one before it, have a numeric constant. Which compiler is cygwin using? From the logs:...

pulling in latest highway to fix build failure on cygwin.

> @r-devulap you should be able to change this line to `abort.cc`: Done.

~The sin/cos file is now free of `#include "simd/simd.h"` and uses highway exclusively~. I have also updated to use highway gather and scatter. EDIT: Hah, not quite yet. `npyv_cleanup` and...