
Integrating GPU based Vector Search using cuVS

Open chatman opened this issue 11 months ago • 25 comments

Implementation of support for vector search on GPU with cuVS described in https://github.com/apache/lucene/issues/14243

This is an in-progress PR at the moment. Here is a way to test it out:

  • Clone the cuvs repository from the PR branch.
  • ./build.sh libcuvs && ./build.sh java
  • (The above will install the cuvs-java artifacts in local Maven repository)
  • Compile and use this branch in an IDE.

Contributors: @narangvivek10, @punAhuja, @chatman, @ChrisHegarty, along with help from @cjnolet.

chatman avatar Jan 10 '25 15:01 chatman

FYI @uschindler, @ChrisHegarty, @dsmiley, @msokolov

chatman avatar Jan 10 '25 15:01 chatman

@chatman thanks for creating the PR. This looks very interesting. Is the idea here that the Lucene library will run on a GPU machine and use cuVS?

navneet1v avatar Jan 10 '25 23:01 navneet1v

This seems pretty far from ready yet. I left some comments on some glaring issues. However, there are other things like:

* tests for queries

* tests for the format

* preventing bad behavior (e.g. using `byte[]`)

I haven't touched on the validity of having an NVIDIA-only, GPU-backed index directly in the Lucene sandbox. The new dependencies are huge. Does whoever downloads and builds Lucene now have to download and build these too? I am unsure how the sandbox module works.

Indeed, Ben. This is a WIP at the moment; more tests are on the way. As for loading the entire index in a byte[], we're working with the NVIDIA/cuVS team to see if streaming can be supported (right now it is not).

chatman avatar Jan 17 '25 18:01 chatman

@chatman thanks for creating the PR. This looks very interesting. Is the idea here that the Lucene library will run on a GPU machine and use cuVS?

Yes, exactly.

chatman avatar Jan 17 '25 18:01 chatman

I haven't touched on the validity of having an NVIDIA-only, GPU-backed index directly in the Lucene sandbox. The new dependencies are huge. Does whoever downloads and builds Lucene now have to download and build these too? I am unsure how the sandbox module works.

This is something we need to work out as we figure out how we want this to be shipped. Here are my thoughts at the moment, and things we need consensus on:

  • Right now, the cuvs-java dependency comes from the local Maven repository; it should come from Maven Central once the artifacts are published there.
  • TODO: If a system doesn't have CUDA or GPUs, these codepaths should fail gracefully and indicate that support is not available (see the sketch after this list).
  • Continuous testing can be enabled on GPU-enabled Jenkins instances (we can have a discussion around that).
  • To validate the integration at the API level, some mock tests (that simulate the same functionality using the CPU) can be added to the cuvs-java API.
  • We can discuss whether shipping this by default with the release artifacts is a problem or not.
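
As a rough illustration of the graceful-failure bullet above (a minimal sketch; the class and the "cuvs" native library name are hypothetical, not the cuvs-java API):

```java
public final class GpuSupport {

  private GpuSupport() {}

  /** Returns true if the native cuVS/CUDA stack could be loaded on this machine. */
  public static boolean available() {
    try {
      System.loadLibrary("cuvs"); // hypothetical native library name
      return true;
    } catch (UnsatisfiedLinkError e) {
      return false; // no CUDA/GPU/driver present: callers fall back to a CPU-based codec
    }
  }
}
```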

chatman avatar Jan 17 '25 18:01 chatman

Hi, cuvs-java 25.02 is currently compiled with JDK 22, so it has a minimum class file version of 66. Lucene compiles with a minimum of JDK 21, class file version 65. The reason cuvs-java requires a minimum of JDK 22 is that it uses the Panama foreign (FFM) APIs directly, which is fine.

We have two options here:

  1. Restructure cuvs-java so that it compiles to a minimum of JDK 21, with an MR-JAR / version-specific loading. Maybe it can even use class file version 65 and strip the preview bit, similar to what Lucene does. It depends on the nature of what the code is doing, which I've yet to fully look into. All said, this is quite involved, but it may make the cuvs-java API more generally useful (one can at least develop against it more easily, if not run tests, etc.).

  2. Restructure the code in Lucene so that it uses a toolchain and javac from JDK 22 and a java22 source set, and make its loading conditional on the JDK version and/or use an MR-JAR (a rough sketch of such conditional loading is at the end of this comment).

Example of how this is currently failing:

Error occurred during initialization of boot layer
java.lang.module.FindException: Error reading module: /Users/chegar/.m2/repository/com/nvidia/cuvs/cuvs-java/25.02/cuvs-java-25.02.jar
Caused by: java.lang.module.InvalidModuleDescriptorException: Unsupported major.minor version 66.0

FTR - there is no suggestion of intent to increase the required minimum JDK for Lucene - it will remain JDK 21.
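
For option 2, the conditional loading could look roughly like the sketch below (the package and class names are made up for illustration; this is not the actual PR code):

```java
// Only attempt to use the JDK 22 implementation when the runtime supports it.
// "CuvsJdk22Impl" would live under META-INF/versions/22 in an MR-JAR.
public final class CuvsHolder {

  private static final Object IMPL = load();

  private static Object load() {
    if (Runtime.version().feature() < 22) {
      return null; // FFM not usable here; callers must check and report "unsupported"
    }
    try {
      return Class.forName("org.apache.lucene.sandbox.vectorsearch.CuvsJdk22Impl")
          .getDeclaredConstructor()
          .newInstance();
    } catch (ReflectiveOperationException | LinkageError e) {
      return null; // class or native layer missing
    }
  }

  public static boolean supported() {
    return IMPL != null;
  }
}
```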

ChrisHegarty avatar Jan 27 '25 10:01 ChrisHegarty

I want to start cleaning some of the outstanding items in this PR, but I do not have push access to SearchScale:cuvs-integration-main. Can I get access, or is there a better way to proceed?

ChrisHegarty avatar Jan 27 '25 10:01 ChrisHegarty

I want to start cleaning some of the outstanding items in this PR, but I do not have push access to SearchScale:cuvs-integration-main. Can I get access, or is there a better way to proceed?

I've sent you a maintainer level access invitation. Thanks for your help!

chatman avatar Jan 27 '25 14:01 chatman

Restructure cuvs-java so that it compiles to a minimum of JDK 21, with an MR-JAR / version-specific loading. Maybe it can even use class file version 65 and strip the preview bit, similar to what Lucene does. It depends on the nature of what the code is doing, which I've yet to fully look into. All said, this is quite involved, but it may make the cuvs-java API more generally useful (one can at least develop against it more easily, if not run tests, etc.).

If this is possible, it would be very nice! I think @narangvivek10 has more details, but as I understand it, the FFM functionality from Project Panama is being used, and those APIs are only available since JDK 22.

chatman avatar Jan 27 '25 14:01 chatman

The third option is to bump the minimum Java requirement to Java 22 on main? I know it's an interim release, but maybe we should just do it, anticipating the next major LTS (due to be released in September)?

dweiss avatar Jan 27 '25 14:01 dweiss

Restructure cuvs-java so that it compiles to a minimum of JDK 21, with an MR-JAR / version-specific loading. Maybe it can even use class file version 65 and strip the preview bit, similar to what Lucene does. It depends on the nature of what the code is doing, which I've yet to fully look into. All said, this is quite involved, but it may make the cuvs-java API more generally useful (one can at least develop against it more easily, if not run tests, etc.).

If this is possible, it would be very nice! I think @narangvivek10 has more details, but as I understand it, the FFM functionality from Project Panama is being used, and those APIs are only available since JDK 22.

The APIs are available in Java 21, too (with minimal changes regarding some specific parts like string handling). If you omit those, you can compile against Java 21 and later strip the preview bit from all classes (Lucene did this a while ago). Nowadays Lucene extracts a stub JAR with all public class signatures (without code) from Java 21 and compiles against it with some compiler tricks.

I don't know whether the APIs that differ are used by cuVS, so I can't give a recommendation.

If cuVS is only available with Java 22 due to API incompatibilities, we need to either upgrade the minimum version of Lucene or use toolchain magic to compile this only for Java 22 (which I'd like to avoid).

P.S. I'd like to bite into the apple and make Java 22 minimum requirement. Then we can use Memory segments in our public API and clean up a lot. I know people will argue and complain, but we have to go that route soon. I was heavily arguing in the OpenJDK community to make Panama be non-preview in 21, but that's now too late.

The best option may be to delay this PR until the next Java LTS comes out and jump to that LTS as soon as possible.

uschindler avatar Jan 27 '25 14:01 uschindler

@dweiss While I think that might work for unreleased/snapshot Lucene builds, I think @chatman et al. are aiming for usage in the current Lucene main branch so that Lucene-based search engines (e.g. OpenSearch, Elasticsearch, etc.) can take advantage of the cuVS format if they want.

@chatman the only way to get this to work in current major releases (e.g. this year) for Apache Lucene-based engines is to handle the Java version differences and use the MR-JAR module stuff. Meaning, it cannot have a hard compile-time dependency on Java 22.

I'd like to bite into the apple and make Java 22 minimum requirement. Then we can use Memory segments in our public API and clean up a lot.

You mean bump the minimum Java for every Java release until the next LTS, assuming that Lucene 11 will require Java 25? I would be for this myself :). Releasing Lucene 11 with Java 25 will have many nice benefits.

But, I think there is a strong desire for @chatman and company to get this into production without waiting another year+.

benwtrent avatar Jan 27 '25 14:01 benwtrent

P.P.S. Elasticsearch has a Gradle plugin to strip preview flags. Basically it patches one short at a fixed position in all class files that are created by Javac.
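
For reference, the technique boils down to something like this (a minimal sketch of the idea, not the actual Elasticsearch/Lucene Gradle plugin):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public final class StripPreviewFlag {

  /**
   * Clears the preview marker in a compiled class file. The class file starts with a
   * u4 magic (0xCAFEBABE) followed by a u2 minor_version; preview classes set the
   * minor_version to 0xFFFF, so zeroing those two bytes removes the preview bit.
   */
  public static void strip(Path classFile) throws IOException {
    byte[] b = Files.readAllBytes(classFile);
    if (b.length >= 8 && (b[4] & 0xFF) == 0xFF && (b[5] & 0xFF) == 0xFF) {
      b[4] = 0;
      b[5] = 0;
      Files.write(classFile, b);
    }
  }
}
```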

uschindler avatar Jan 27 '25 14:01 uschindler

The APIs are available in Java 21, too (with minimal changes regarding some specific parts like string handling). If you omit those, you can compile against Java 21 and later strip the preview bit from all classes (Lucene did this a while ago). Nowadays Lucene extracts a stub JAR with all public class signatures (without code) from Java 21 and compiles against it with some compiler tricks.

I don't know if the APIs which are different are used by cuvs, so I can't give a recommendation.

From a quick skim, I think we can manage to make this work. For the small differences in the foreign API, we can provide static version-specific stubs. We do something similar in Elasticsearch.

If cuVS is only available with Java 22 due to API incompatibilities, we need to either upgrade the minimum version of Lucene or use toolchain magic to compile this only for Java 22 (which I'd like to avoid).

At a minimum, the cuvs-java API exposes MemorySegment, so all consumers - Lucene in this case - would have to deal with the preview-ness nonsense! But again, this may be doable with a little fiddling in Gradle and the class files.

P.S. I'd like to bite into the apple and make Java 22 minimum requirement. Then we can use Memory segments in our public API and clean up a lot.

As you know, I'm strongly in favour of moving to newer Java versions. However, this is quite a move. It will result in every project consuming Lucene having to keep upgrading constantly until the next Java LTS. Which I'm personally OK with, but I'm not sure anyone has really done this before.

ChrisHegarty avatar Jan 27 '25 14:01 ChrisHegarty

P.P.S. Elasticsearch has a Gradle plugin to strip preview flags. Basically it patches one short at a fixed position in all class files that are created by Javac.

Yes, we can do this, along with an MR-JAR plugin. In ES we use the toolchain to resolve newer JDKs, e.g. 22, etc.

ChrisHegarty avatar Jan 27 '25 14:01 ChrisHegarty

One additional important thing: the public API of this new Lucene module/codec must not expose any preview API, so it must all be private/package-private.

uschindler avatar Jan 27 '25 14:01 uschindler

FYI - I've made the cuvs-java API Java 21 friendly, with an SPI and a JDK 22-specific impl in the versioned section of an MR-JAR. MemorySegment and Arena have been removed from the API, so they only appear in the 22 impl code. https://github.com/rapidsai/cuvs/pull/628

The pattern is that the API entry points throw UnsupportedOperationException (UOE) on platforms where there is no support, e.g. < JDK 22, Mac, Windows, or if the native library is not installed.
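
The entry-point pattern is roughly this (a condensed sketch; the interface name is illustrative, see the linked PR for the real code):

```java
import java.util.ServiceLoader;

// The Java 21 API layer only declares the SPI; the JDK 22 implementation registers
// itself as a service from the versioned section of the MR-JAR.
public interface CuVSProviderSketch {

  static CuVSProviderSketch provider() {
    return ServiceLoader.load(CuVSProviderSketch.class)
        .findFirst()
        .orElseThrow(() -> new UnsupportedOperationException(
            "cuVS is not supported on this platform (needs JDK 22+, Linux, and the native library)"));
  }
}
```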

ChrisHegarty avatar Feb 04 '25 08:02 ChrisHegarty

P.S. I'd like to bite into the apple and make Java 22 minimum requirement.

At least for the main branch, what is the harm? Could we do 23 or 24? I'd really like https://openjdk.org/jeps/467 !

rmuir avatar Feb 07 '25 16:02 rmuir

I think bumping main only for each non-LTS release would be cool. Then we keep it at the next LTS (Java 25)?

Or, if it's just as long from Lucene 10 -> 11, more likely the next LTS for Lucene 11 would be Java 30 🙈. Maybe Project Valhalla and Panama will be done by then.

My only concerns are:

  • developer complications (requiring devs to bump Java versions all the time, which maybe they should be doing anyway...)
  • backport complications
  • friendliness to 3rd-party CI stuff (folks who perf test and test against main for some reason; maybe they shouldn't, and should just stick to 10.x).

benwtrent avatar Feb 07 '25 17:02 benwtrent

I think bumping main only for each non-LTS release would be cool. Then we keep it at the next LTS (Java 25)?

I filed the following issue to help facilitate the discussion related to bumping the minimum compile version (since it's no longer relevant in this PR - cuvs-java is now at Java 21): https://github.com/apache/lucene/issues/14229

ChrisHegarty avatar Feb 13 '25 09:02 ChrisHegarty

I just committed a rewrite for the cuVS format implementation.

After the rewrite all the BaseKnnVectorsFormatTestCase tests pass. There are still some lurking intermittent failures, but the tests pass successfully the majority of the time.

Summary of the most significant changes:

  1. Use the flat vectors reader/writer to store the raw float32 vectors and the ordinal-to-docId mapping. This is similar to how HNSW is supported in Lucene, and keeps the code aligned with how other formats are layered atop each other.
  2. The cuVS indices (CAGRA, brute force, and HNSW) are stored directly in the format, so they can be mmap'ed directly.
  3. Merges are physical: all raw vectors are retrieved and used to create new cuVS indices.
  4. A standard KnnCollector is used; no special collector is needed for cuVS unless one wants to customise some very specific parameters (see the search sketch below).
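
To make point 4 concrete: searching an index written with this format goes through the ordinary Lucene query API, with nothing cuVS-specific at query time. The field name, vector dimension, and index path below are placeholders; the cuVS format itself would be selected by the codec at index time (not shown).

```java
import java.nio.file.Path;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.KnnFloatVectorQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class SearchExample {
  public static void main(String[] args) throws Exception {
    try (var dir = FSDirectory.open(Path.of("/path/to/index"));
        var reader = DirectoryReader.open(dir)) {
      IndexSearcher searcher = new IndexSearcher(reader);
      // Standard kNN query; no cuVS-specific collector is required.
      TopDocs hits = searcher.search(new KnnFloatVectorQuery("vector", new float[768], 10), 10);
      System.out.println("found " + hits.totalHits);
    }
  }
}
```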

A number of workarounds have been put in place, which will eventually be lifted.

  1. Pre-filters and deleted docs oversample the topK, since cuvs-java does not yet support a pre-filter (see the sketch after this list).
  2. Ignore CAGRA failures when indexing small numbers of docs, and fall back to just brute force.
  3. We need to move to the cuvs-java merge API, to avoid bringing the vectors on-heap during merge.
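
Workaround 1 is essentially the following (a simplified illustration; `GpuSearch` stands in for the actual cuVS search call and is purely hypothetical, and the oversampling factor is a guess):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

final class OversampleSketch {

  interface GpuSearch {
    int[] topDocs(int k); // candidate doc ids, best first
  }

  static List<Integer> search(int topK, IntPredicate acceptDocs, GpuSearch gpu) {
    int oversampled = Math.max(topK * 4, topK + 16); // request more than topK from the GPU index
    List<Integer> out = new ArrayList<>(topK);
    for (int docId : gpu.topDocs(oversampled)) {
      if (acceptDocs.test(docId)) { // drop deleted / filtered-out docs after the GPU search
        out.add(docId);
        if (out.size() == topK) {
          break;
        }
      }
    }
    return out; // may contain fewer than topK hits if the oversampling was not enough
  }
}
```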

ChrisHegarty avatar Feb 14 '25 11:02 ChrisHegarty

Exciting change! Since this PR adds a new codec for vector search, I wanted to point to #14178 along similar lines -- adding a new Faiss-based KNN format to index and query vectors

Faiss (https://github.com/facebookresearch/faiss) is "a library for efficient similarity search and clustering of dense vectors". It supports various features like vector transforms (eg PCA), indexing algorithms (eg IVF, HNSW, etc), quantization techniques (eg PQ), search strategies (eg 2-step refinement), different hardware (including GPUs -- also has support for cuVS) -- and adding this codec would allow users to make use of (most of) these features!

Internally, the format calls the C API of Faiss using Panama (https://openjdk.org/projects/panama) FFI. The codec lives in the sandbox module and does not add Faiss as a dependency of Lucene -- it only relies on the shared library (along with all its dependencies) being present at runtime (on $LD_LIBRARY_PATH or -Djava.library.path).
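
Roughly, such a Panama binding looks like the sketch below (the library name, symbol, and signature are assumptions based on the Faiss c_api headers, not the actual code in #14178), assuming JDK 22:

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SymbolLookup;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

public class FaissFfiSketch {
  public static void main(String[] args) throws Throwable {
    // Resolved at runtime from -Djava.library.path / $LD_LIBRARY_PATH, as described above.
    SymbolLookup faiss =
        SymbolLookup.libraryLookup(System.mapLibraryName("faiss_c"), Arena.global());

    // int faiss_index_factory(FaissIndex** p_index, int d, const char* description, FaissMetricType metric)
    MethodHandle indexFactory = Linker.nativeLinker().downcallHandle(
        faiss.find("faiss_index_factory").orElseThrow(),
        FunctionDescriptor.of(ValueLayout.JAVA_INT,
            ValueLayout.ADDRESS, ValueLayout.JAVA_INT, ValueLayout.ADDRESS, ValueLayout.JAVA_INT));

    try (Arena arena = Arena.ofConfined()) {
      MemorySegment indexPtr = arena.allocate(ValueLayout.ADDRESS); // FaissIndex** out-param
      MemorySegment desc = arena.allocateFrom("HNSW32,Flat");       // Faiss index factory string
      int rc = (int) indexFactory.invokeExact(indexPtr, 128, desc, 1 /* METRIC_L2 */);
      System.out.println("faiss_index_factory returned " + rc);
    }
  }
}
```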

Would appreciate feedback on the PR!

kaivalnp avatar Mar 17 '25 18:03 kaivalnp

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

github-actions[bot] avatar Apr 05 '25 00:04 github-actions[bot]

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

github-actions[bot] avatar Apr 24 '25 00:04 github-actions[bot]

How is this coming along? I'm looking forward to this shipping.

zacksiri avatar Jun 14 '25 05:06 zacksiri

FYI, some important building blocks that were needed for a good first-class integration were missing in cuVS. As a result, this PR contains several inefficiencies and workarounds.

At the moment, we are working on the cuVS-side pieces needed to make this production ready. Once they are done and released, we can open a new PR. For now, this PR stands as a reasonable proof of concept.

Important pieces that were not available for use during this PR:

  • Prefiltering in CAGRA (done)
  • CAGRA Merge API (done)
  • Efficient dataset loading (avoiding float[][]) while creating the CAGRA index (done, also some improvements WIP)

In the future, in cuVS, we plan to work on:

  • Quantization support
  • Multi-GPU support
  • HNSW index build support using GPU

I'm closing this PR here, and shall open a new PR soon that incorporates some of these changes.

FYI @ChrisHegarty @benwtrent.

chatman avatar Jun 27 '25 15:06 chatman