Add sve targets
related: #2884
This PR contains the following changes:
- Add a new optlevel, sve
  - ARM SVE is an extension of ARMv8, so it should be treated similarly to AVX2, IMO
- Add targets for ARM SVE, faiss_sve and swigfaiss_sve
  - These targets will be built when you give -DFAISS_OPT_LEVEL=sve at build time
  - Design decision: Don't fix the SVE register length.
    - The python package of faiss is a "fat binary" (for example, the package for avx2 contains _swigfaiss_avx2.so and _swigfaiss.so)
    - SVE is a scalable instruction set (= it doesn't fix the vector length), but we can actually specify the vector length at compile time
      - with the -msve-vector-length= option
    - When this option is specified, the binary can't work correctly on a CPU whose vector length differs from the one specified at compile time
    - If we used fixed vector lengths, the SVE-supported faiss python package would contain 7 shared libraries: _swigfaiss.so, _swigfaiss_sve.so, _swigfaiss_sve128.so, _swigfaiss_sve256.so, _swigfaiss_sve512.so, _swigfaiss_sve1024.so, and _swigfaiss_sve2048.so. The package size would explode.
    - For these reasons, I don't specify the vector length at compile time, and faiss_sve detects the vector length at run time.
- Add a mechanism for detecting ARM SVE in the runtime environment and importing swigfaiss_sve dynamically
  - Currently it only supports Linux, but as far as I know there is no SVE environment on a non-Linux OS right now
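To illustrate the Linux-only runtime detection mentioned in the last item, here is a minimal sketch. It assumes the check is done by looking for the sve flag in the Features line of /proc/cpuinfo; this is only one possible approach and is not necessarily how this PR implements it. The function names cpuinfo_has_sve and host_supports_sve are hypothetical.

```python
from pathlib import Path


def cpuinfo_has_sve(cpuinfo_text: str) -> bool:
    """Return True if a 'Features' line in /proc/cpuinfo-style text lists 'sve'.

    Hypothetical helper for illustration; not part of faiss.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # aarch64 Linux reports CPU flags as "Features : fp asimd ... sve ..."
        if sep and key.strip() == "Features" and "sve" in value.split():
            return True
    return False


def host_supports_sve(path: str = "/proc/cpuinfo") -> bool:
    """Check the running host; treat an unreadable file (non-Linux) as 'no SVE'."""
    try:
        return cpuinfo_has_sve(Path(path).read_text())
    except OSError:
        return False
```

Matching whole tokens (rather than substrings) keeps unrelated flags from being mistaken for sve, and returning False when the file is missing gives a safe default on non-Linux systems.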
NOTE: I plan to open one more PR that adds some SVE implementations after this PR is merged. This PR only adds the sve target.
Please don't add a faiss/python/swigfaiss_sve.swig file.
Oh, sorry. I missed that the file is already copied at this line. I removed the file and added its path to .gitignore.
environment: line 9: /opt/conda/lib/jvm/languages/python/bin/conda: No such file or directory
🤨
Ah, #2917, OK.
@mdouze What is the current status of this PR?
So the diff only changes the compilation flags, it does not add SVE-specific SIMD implementations, right? Do you have hardware to try it on and maybe measure performance improvements?
In this PR, faiss uses SVE only in auto-vectorized functions like fvec_L2sqr.
So this PR brings only small performance improvements; my aim is to add the faiss_sve target first.
As I wrote before,
I plan to open one more PR that adds some SVE implementations after this PR is merged.
It will include SVE implementations of code_distance, exhaustive_L2sqr_blas_cmax, and so on.
@mdouze IMO the PRs should be separate, but I'm willing to include the performance-improvement commits in this PR if you want. Which would you prefer?
Sorry for being a bit slow to react. I think that it's fine to land this packaging PR first, let us check the implications in terms of library size.
@mdouze OK. If you need any action from me, such as:
- making a decision,
- changing some code, or
- giving my opinion,
please feel free to leave a comment. In the meantime, I will wait for the check. Thanks.
@mdouze and @vorj, is there any update on adding SVE support, and do you still plan to add it? I saw some discussion on the other PR, but there has been no activity for a while. Basically, we are looking for some optimization of Scalar Quantization (specifically SQfp16) on ARM, like AVX2 on x86.
Also, please let us know if you need any help to run tests for SVE support. We have bandwidth and resources to run tests. Thanks!
@naveentatikonda I am just a contributor, not employed by Meta, so I don't actually know the plans for this (official faiss) repository. However, as I said above, I have further patches that improve performance more, and I will open a PR once this one is merged.
@mdouze Did you get a chance to look into my question?
OK so I think a way to move forward is to accept this PR but not cover it with CI. Then optimized code for SVE can be contributed. At some point we will probably either:
- add SVE to the CI or
- remove SVE support if it turns out it is not used too much.
Is there a doc somewhere that shows which current and future ARM implementations support SVE?
Thanks
Would you mind rebasing on the latest Faiss so that I can import it to the internal Faiss version? Thanks
I can assist and review the code, if needed
@mdouze
Is there a doc somewhere that shows which current and future ARM implementations support SVE?
At the least, current and future CPUs that implement ARMv9 will support SVE, because SVE2 is part of the base ARMv9 instruction set. Cortex-A510, Cortex-X2, Neoverse N2, and Neoverse V2 support ARMv9. However, I don't know which concrete implementations (real CPUs) will have ARMv9 or SVE, as this is decided by the manufacturers.
@vorj Do you also have plans to add sve support to ScalarQuantization after this PR is merged?
@naveentatikonda
Currently I don't have an SVE version of ScalarQuantization, so you are welcome to contribute it. That said, I speed up whatever unoptimized code I find when I have spare time. If there is still no SVE ScalarQuantization code the next time I work on optimizing faiss, I will do it.
@mdouze
Would you mind rebasing on the latest Faiss so that I can import it to the internal Faiss version?
I did it. Would you review this?
Just want to add a note here that this change is also very important to Nvidia RAPIDS libraries, as we're gearing up to have more libraries optimized for the Grace architecture.
We are looking into compiling this in the CI @ramilbakhshyiev
@vorj Actively looking at this and as the next step, I would like to try doing a test CI build. Could you please rebase it again? I will try right away and report back. As a note, we could also add a sve2 target as a follow-up, should be similar to this one.
@ramilbakhshyiev Actually, I no longer have any permissions on https://github.com/fixstars/faiss, so please wait a while. I will resolve this in my spare time (for example by regaining access, or by asking someone with write permission to push the rebased branch).
@ramilbakhshyiev The rebase is done. I also added some changes: following up on 8 months of upstream changes, catching up with numpy 2.0, refactoring, and a tiny fix.
I don't know why the x86_64 RAFT CI failed, though...
@vorj Thanks! We will be trying this shortly. Meanwhile, I restarted the failed build, there was a transient error and it should be fixed now.
@mengdilin
The failure is not reproducible on the main branch when building on a aarch64 platform using aws's r6g.large instance.
That's not surprising: what you are describing is like saying that faiss built with -DFAISS_OPT_LEVEL=avx512 doesn't work on CPUs without AVX512 support, such as AMD Zen3.
As I said in the issue, Arm SVE is an extension of the ARMv8 ISA, and AWS Graviton2 doesn't support it.
If you can, please try AWS Graviton3, as I did when I chose a c7g.large instance (this is the simplest way).
If you cannot (I mean "when you must implement the CI on Graviton2, not Graviton3, for technical, economic, company, and/or other reasons"), you need to use QEMU or something similar for gtest.
The Python interface can detect the features of the running CPU dynamically, so faiss built with -DFAISS_OPT_LEVEL=sve should work on Graviton2 (it will just load _swigfaiss.so instead of _swigfaiss_sve.so).
However, the gtest binary is built according to the optimization flag, so it doesn't work on Graviton2 directly.
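The fallback described above (loading the generic module when SVE is unavailable) can be sketched as an ordered list of candidates tried one by one. This is only an illustration under stated assumptions: candidate_modules and load_first_available are hypothetical names, the bare module names swigfaiss_sve and swigfaiss stand in for the real package-relative imports, and the actual faiss loader may be organized differently.

```python
import importlib
from typing import Callable, List, Sequence, Tuple


def candidate_modules(has_sve: bool) -> List[str]:
    """Order the SWIG modules to try: the SVE build first (if usable), then generic."""
    names = []
    if has_sve:
        names.append("swigfaiss_sve")  # assumed name, taken from the PR description
    names.append("swigfaiss")  # generic build, always the last resort
    return names


def load_first_available(
    candidates: Sequence[str],
    importer: Callable[[str], object] = importlib.import_module,
) -> Tuple[str, object]:
    """Return (name, module) for the first candidate that imports successfully."""
    for name in candidates:
        try:
            return name, importer(name)
        except ImportError:
            # This build is missing or unusable here; fall through to the next one.
            continue
    raise ImportError(f"none of {list(candidates)} could be imported")
```

A compiled test binary has no such selection step, which is why the gtest binary built with the sve flag cannot fall back on a CPU without SVE.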
P.S. I'm not a Meta employee (unfortunately), so I can't see the internal URLs if you link it
@vorj Thanks! I'll let @mengdilin confirm but I believe this is resolved when it was retried with r8g.large (ARMv9 / Neoverse V2) which does support SVE and I believe SVE2 (something that might be of interest I guess).
Yes, Graviton >= 3 (including r7g.large and r8g.large) solves the issue above, so please give it a try. If you run into other problems, please let me know.
BTW, this PR enables only SVE, not SVE2, so if we want to use SVE2 we will need another PR (and eventually it will generate another binary named _swigfaiss_sve2.so, _swigfaiss_armv9.so, or something like that).