faiss Add sve targets

related: #2884

This PR contains below changes:

Add new optlevel sve
- ARM SVE is extension of ARMv8, so it should be treated similar to AVX2 IMO
Add targets for ARM SVE, faiss_sve and swigfaiss_sve
- These targets will be built when you give -DFAISS_OPT_LEVEL=sve at build time
- Design decision: Don't fix SVE register length.
  - The python package of faiss is "fat binary" (for example, the package for avx2 contains _swigfaiss_avx2.so and _swigfaiss.so)
  - SVE is scalable instruction set (= doesn't fix vector length), but actually we can specify the vector length at compile time.
    - with -msve-vector-length= option
    - When this option is specified, the binary can't work correctly on the CPU which has other vector length rather than specified at compile time
  - When we use fixed vector length, SVE-supported faiss python package will contain 7 shared libraries like _swigfaiss.so , _swigfaiss_sve.so , _swigfaiss_sve128.so , _swigfaiss_sve256.so , _swigfaiss_sve512.so , _swigfaiss_sve1024.so , and _swigfaiss_sve2048.so . The package size will be exploded.
  - For these reason, I don't specify the vector length at compile time and faiss_sve detects the vector length at run time.
Add a mechanism of detecting ARM SVE on runtime environment and importing swigfaiss_sve dynamically
- Currently it only supports Linux, but there is no SVE environment with non-Linux OS now, as far as I know

NOTE: I plan to make one more PR about add some SVE implementation after this PR merged. This PR only contains adding sve target.

May 31 '23 06:05 vorj

Please don't add a faiss/python/swigfaiss_sve.swig file.

May 31 '23 16:05 mdouze

Oh, sorry. I missed but that has been copied at this line. I removed the file and added the path on .gitignore .

May 31 '23 19:05 vorj

environment: line 9: /opt/conda/lib/jvm/languages/python/bin/conda: No such file or directory

🤨

Jun 20 '23 06:06 vorj

Ah, #2917, OK.

Jun 20 '23 06:06 vorj

@mdouze How about the current status of this PR?

Jun 20 '23 07:06 vorj

So the diff only changes the compilation flags, it does not add VSE specific SIMD implementations, right? Do you have hardware to try it on and maybe measure performance improvements?

Jun 21 '23 09:06 mdouze

So the diff only changes the compilation flags, it does not add VSE specific SIMD implementations, right? Do you have hardware to try it on and maybe measure performance improvements?

In this PR faiss uses SVE only with auto vectorized functions like fvec_L2sqr. This PR still has little performance improvements, but I aim this as to add faiss_sve target at first.

Jun 21 '23 18:06 vorj

As I wrote before,

I plan to make one more PR about add some SVE implementation after this PR merged.

It will include SVE implmemtations of code_distance , exhaustive_L2sqr_blas_cmax , and so on.

Jun 22 '23 07:06 vorj

@mdouze IMO the PRs should be separated, but I'm willing to include the commits of performance improvement in this PR if you want it. How would you like it?

Jun 28 '23 09:06 vorj

Sorry for being a bit slow to react. I think that it's fine to land this packaging PR first, let us check the implications in terms of library size.

Jun 29 '23 13:06 mdouze

@mdouze OK. When you will want my action like:

need me to make a decision,
need to change some codes, or
want to know my opinion,

please feel free to send me some comments. Anyway, I will wait the checking for a while. Thanks.

Jun 30 '23 02:06 vorj

@mdouze and @vorj is there any update on adding SVE support and do you guys still have plans to add it? I saw some discussion on the other PR and there was no activity since a while. Basically, we were looking for some optimization to Scalar Quantization(specifically SQfp16) on ARM like AVX2 on x86.

Also, please let us know if you need any help to run tests for SVE support. We have bandwidth and resources to run tests. Thanks!

Sep 21 '23 18:09 naveentatikonda

@naveentatikonda I am just a contributor not employed by Meta, so actually I don't know the plans on this (official faiss) repository. However, as I told above, I have further patches to improve performance more, and I will create PR when this merged.

Sep 22 '23 06:09 vorj

@mdouze and @vorj is there any update on adding SVE support and do you guys still have plans to add it? I saw some discussion on the other PR and there was no activity since a while. Basically, we were looking for some optimization to Scalar Quantization(specifically SQfp16) on ARM like AVX2 on x86.

Also, please let us know if you need any help to run tests for SVE support. We have bandwidth and resources to run tests. Thanks!

@mdouze Did you get a chance to look into my question?

Sep 25 '23 17:09 naveentatikonda

OK so I think a way to move forward is to accept this PR but not cover it with CI. Then optimized code for SVE can be contributed. At some point we will probably either:

add SVE to the CI or
remove SVE support if it turns out it is not used too much.

Is there a doc somewhere that shows what current and future ARM implementaitons support SVE ?

Thanks

Sep 26 '23 13:09 mdouze

Would you mind rebasing on the latest Faiss so that I can import it to the internal Faiss version? Thanks

Sep 26 '23 14:09 mdouze

I can assist and review the code, if needed

Sep 26 '23 19:09 alexanderguzhva

@mdouze

Is there a doc somewhere that shows what current and future ARM implementaitons support SVE ?

At least, current and future CPUs implemented ARMv9 will support SVE, because SVE2 is in the basic instruction set of ARMv9. Cortex-A510, Cortex-X2, Neoverse N2, Neoverse V2 are supporting ARMv9. However, I don't know that concrete implementations (real CPUs) will has ARMv9 or SVE, as this is decided by manufacturers.

Sep 27 '23 03:09 vorj

@naveentatikonda I am just a contributor not employed by Meta, so actually I don't know the plans on this (official faiss) repository. However, as I told above, I have further patches to improve performance more, and I will create PR when this merged.

@vorj Do you also have plans to add sve support to ScalarQuantization after this PR is merged?

Sep 27 '23 16:09 naveentatikonda

@naveentatikonda

Do you also have plans to add sve support to ScalarQuantization after this PR is merged?

Currently I don't have the SVE version of ScalarQuantization, so you will be able to contribute it. However, I will speed it up that the unoptimized codes I will find on some times to spare. If I will find no SVE ScalarQuantization codes at my faiss-optimizing time, I will do that.

Sep 28 '23 17:09 vorj

@mdouze

Would you mind rebasing on the latest Faiss so that I can import it to the internal Faiss version?

I did it. Would you review this?

Oct 06 '23 03:10 vorj

Just want to add a note here that this change is also very important to Nvidia RAPIDS libraries, as we're gearing up to have more libraries optimized for the Grace architecture.

Oct 10 '23 16:10 cjnolet

We are looking into compiling this in the CI @ramilbakhshyiev

May 28 '24 16:05 mdouze

@vorj Actively looking at this and as the next step, I would like to try doing a test CI build. Could you please rebase it again? I will try right away and report back. As a note, we could also add a sve2 target as a follow-up, should be similar to this one.

May 30 '24 18:05 ramilbakhshyiev

@ramilbakhshyiev Actually now I don't have any permission for https://github.com/fixstars/faiss, so please wait a while. I will resolve this in my spare time (like to take the access right again, to ask someone who has write permission to push the rebased branch, or something like that).

May 31 '24 10:05 vorj

@ramilbakhshyiev Rebasing was finished. I added some changes that following the changes in 8 months, catching up numpy 2.0, refactoring, and tiny fix.
I don't know why x86_64 RAFT CI was failed, but...

Jun 20 '24 04:06 vorj

@vorj Thanks! We will be trying this shortly. Meanwhile, I restarted the failed build, there was a transient error and it should be fixed now.

Jun 21 '24 22:06 ramilbakhshyiev

@mengdilin

The failure is not reproducible on the main branch when building on a aarch64 platform using aws's r6g.large instance.

That's not surprising because what you are saying is like that faiss built with -DFAISS_OPT_LEVEL=avx512 doesn't work on AVX512-unsupported CPUs like AMD Zen3 or somthing like that. As I had told that

SVE is an abbreviation of Scalable Vector Extension .

in the issue, Arm SVE is the extension of ARMv8 ISA, and AWS Graviton2 doesn't support it. If you can, please try to use AWS Graviton3 as I had chose c7g.large instance (This is the simplest way). If you cannot (I mean "when you must implement the CI on Graviton2, not Graviton3, for some (technically, economically, for company, and/or other) reasons"), you need to use QEMU or something like that for gtest. Python interface can detect the features of running CPU dynamically, so faiss built with -DFAISS_OPT_LEVEL=sve should work on Graviton2 (it will just load _swigfaiss.so instead of _swigfaiss_sve.so). However gtest binary is built based on the optimization flag, so it doesn't work on Graviton2 directly.

P.S. I'm not a Meta employee (unfortunately), so I can't see the internal URLs if you link it

Jul 10 '24 23:07 vorj

@vorj Thanks! I'll let @mengdilin confirm but I believe this is resolved when it was retried with r8g.large (ARMv9 / Neoverse V2) which does support SVE and I believe SVE2 (something that might be of interest I guess).

Jul 11 '24 01:07 ramilbakhshyiev

Yes, Graviton >= 3 (including r7g.large and r8g.large) can solve above issue, so please take a try. When you will meet other problems, please let me know.

BTW, this PR activates only SVE but not SVE2, so when we want to use SVE2 we need to another PR (and finally it will generate another binary named _swigfaiss_sve2.so , _swigfaiss_armv9.so or something like that).

Jul 11 '24 02:07 vorj