carll99
The pull request is for a patch to faiss/utils/simdlib_emulated.h. The patch improves the performance of the bench_ivf_fastscan.py workload by unrolling the for loop to eliminate the if (j < 16)...
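For readers following along, here is a rough sketch of the kind of change described above. It is not the actual patch: the names Bytes32, lookup_branchy, lookup_split and lookups_agree are made up for illustration, and the loop body is only an assumed example of a 32-byte operation that picks a 16-byte lane with an if (j < 16) test. Splitting the loop into two 16-iteration loops removes that test from the loop body.

```cpp
#include <array>
#include <cstdint>

// Hypothetical 32-byte vector type, standing in for an emulated SIMD register.
using Bytes32 = std::array<std::uint8_t, 32>;

// Branchy version: a single loop over 32 bytes that selects the
// lower or upper 16-byte lane with `if (j < 16)` on every iteration.
inline Bytes32 lookup_branchy(const Bytes32& table, const Bytes32& idx) {
    Bytes32 out{};
    for (int j = 0; j < 32; j++) {
        std::uint8_t i = idx[j] & 15;
        if (j < 16) {
            out[j] = table[i];
        } else {
            out[j] = table[16 + i];
        }
    }
    return out;
}

// Split version: two loops of 16 iterations each, so the lane is
// fixed per loop and no per-iteration test on j is needed.
inline Bytes32 lookup_split(const Bytes32& table, const Bytes32& idx) {
    Bytes32 out{};
    for (int j = 0; j < 16; j++) {
        out[j] = table[idx[j] & 15];
    }
    for (int j = 16; j < 32; j++) {
        out[j] = table[16 + (idx[j] & 15)];
    }
    return out;
}

// Sanity check: both versions must produce the same result.
inline bool lookups_agree() {
    Bytes32 table{};
    Bytes32 idx{};
    for (int j = 0; j < 32; j++) {
        table[j] = static_cast<std::uint8_t>(3 * j + 1);
        idx[j] = static_cast<std::uint8_t>(7 * j);
    }
    return lookup_branchy(table, idx) == lookup_split(table, idx);
}
```

Whether this helps on a given platform depends on the compiler and target, so any speedup should be confirmed with the benchmark rather than assumed from the sketch.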
OK, commenting out the original code and then adding the change is easy to do. The patch was tried on AIX and showed a similar performance improvement. I have not tested...
Thanks for the discussion and the example showing what you are thinking; that is very helpful. In the example, as you say, the entire file is copied into the arch...
OK, I will make a complete simdlib_ppc64.h file, not a problem. I was looking to see how __AVX2__ and __aarch64__ get defined... I see the various uses but I am...
I have updated the patch. I put the #if defined in simdlib_emulated.h to either include the new simdlib_emulated_ppc64.h file or use the original simdlib_emulated.h file. The new simdlib_emulated_ppc64.h file has...
Not sure how I managed to close the pull request when I was updating my copy of faiss. Re-opened the pull request. I updated my fork of faiss to the...
Ah, I see, I didn't put the #if defined at the right level of the include files. Updated the patch to put the #if defined into faiss/utils/simdlib.h, not faiss/utils/simdlib_emulated.h. Hopefully...
OK, I see where you are going with the #elif defined(__PPC64__). That is an easy fix. I should have figured out what you wanted earlier. Sorry. As for the comment...
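To make the #elif defined(__PPC64__) idea concrete, here is a minimal sketch of that kind of preprocessor dispatch. It is not the contents of faiss/utils/simdlib.h: the SKETCH_SIMD_BACKEND macro and selected_simd_backend() function are hypothetical, and each branch only records a label where the real header would presumably include the matching implementation file.

```cpp
#include <string>

// Each branch stands in for including one architecture-specific header.
// The labels are illustrative only; they are not names used by faiss.
#if defined(__AVX2__)
#define SKETCH_SIMD_BACKEND "avx2"
#elif defined(__aarch64__)
#define SKETCH_SIMD_BACKEND "aarch64"
#elif defined(__PPC64__)
// Assumed placement of the new branch, where simdlib_ppc64.h would be included.
#define SKETCH_SIMD_BACKEND "ppc64"
#else
// Fallback, where the generic emulated implementation would be included.
#define SKETCH_SIMD_BACKEND "emulated"
#endif

// Reports which branch the preprocessor selected for this build.
inline std::string selected_simd_backend() {
    return SKETCH_SIMD_BACKEND;
}
```

Putting the test in the top-level dispatch header means the architecture-specific files do not need their own guards, and the emulated file stays unchanged for every other platform.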
The pull request seems to get closed every time I resync with main and throw out my old patch before I push the updated patch. Reopening.
I updated the patch again per the comments about how I should be doing the #if defined. I also ran the clang-format commands:
clang-format -i faiss/utils/simdlib.h
clang-format -i faiss/utils/simdlib_ppc64.h...