[Feat] Add support for logical merge in CAGRA
- https://github.com/rapidsai/cuvs/issues/712
I may have missed some prior discussion, but the cagra-specific logical merge seems a little bit artificial to me. If I understand this correctly, you essentially implement a multi-index search. Then, why don't we go one step further and make it independent of a particular index implementation? Moreover, it looks like much of this functionality is already implemented in the multi-gpu index. Maybe we can generalize that one a bit to decouple the "multi-index" aspect from the "multi-gpu" aspect? Optionally, one could also combine different index types and erase the upstream index type like we do in the dynamic batching index.
I may have missed some prior discussion, but the cagra-specific logical merge seems a little bit artificial to me.
@achirkin you and I haven't discussed this yet, but this feature is critical for certain database architectures like Lucene, which are variants of the log-structured-merge pattern but build indexes immediately after segments (flat files containing vectors) are created, rather than merging the segments together before building the indexes. This causes Lucene to first merge many tiny CAGRA indexes together, but eventually it'll end up merging very large indexes together. It is this latter case where we care about doing a logical merge. It's more efficient to build a single CAGRA index when merging many (potentially hundreds of) tiny CAGRA indexes, but once the indexes reach a certain size it's actually more efficient to logically merge them, meaning we essentially wrap them as if they were shards and broadcast the query to all the shards during search.
I agree this is similar in theory to the multi-gpu sharding, and perhaps there is some code to be reused there. The next step for the merge() API is to be able to implement a SMART merge strategy, whereby we can more efficiently merge two CAGRA graphs together without having to rebuild from scratch, and that's ultimately the strategy we have discussed offline in more detail. The nice thing is that Lucene can make use of this today, so it gives us time to work towards the SMART option.
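To make the "logical merge" concrete, here is a rough host-side sketch of what the search path boils down to (illustrative names only, not the actual cuVS API): each sub-index is treated as a shard, the query is broadcast to every shard, and the per-shard top-k results are merged into a single global top-k with local ids remapped by each shard's offset.

```cpp
// Illustrative sketch only (not the cuVS API): merging per-shard top-k results
// after broadcasting a query to every wrapped sub-index.
#include <algorithm>
#include <cstdint>
#include <vector>

struct Neighbor {
  float distance;
  uint32_t id;  // global id after remapping
};

// per_shard[i] holds shard i's top-k (distance, local id) pairs;
// offsets[i] is the number of vectors in all shards preceding shard i.
std::vector<Neighbor> merge_topk(std::vector<std::vector<Neighbor>> per_shard,
                                 const std::vector<uint32_t>& offsets,
                                 std::size_t k)
{
  std::vector<Neighbor> all;
  for (std::size_t s = 0; s < per_shard.size(); ++s) {
    for (auto n : per_shard[s]) {
      n.id += offsets[s];  // remap local id -> global id
      all.push_back(n);
    }
  }
  // keep only the k best candidates across all shards
  std::size_t keep = std::min(k, all.size());
  std::partial_sort(all.begin(), all.begin() + keep, all.end(),
                    [](const Neighbor& a, const Neighbor& b) { return a.distance < b.distance; });
  all.resize(keep);
  return all;
}
```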
Then, why don't we go one step further and make it independent of a particular index implementation?
I do agree with this; it doesn't have to be (and ideally wouldn't be) specifically for CAGRA, though that just happens to be the index we care about today because it unblocks Lucene.
I totally agree and don't doubt the usefulness of the logical merge. I've just pointed out that I think we can achieve a more general solution (supporting any index type) with even less code by reusing/adjusting what we already have in the multi-index/dynamic batching code.
Hi @achirkin , sorry for the late response to your comments here. I studied the code of SNMG, which is a super useful feature. When I tried to reuse some of its code, I found the caller has to work with locking and NCCL, which is unnecessary for the logical merge of CAGRA. I'd like to keep the simple implementation if you agree. Many thanks! cc: @cjnolet
Hi @cjnolet @achirkin, could you please merge it if there are no more comments? Many thanks!
When I tried to reuse some of its code, I found the caller has to work with locking and NCCL, which is unnecessary for the logical merge of CAGRA. I'd like to keep the simple implementation if you agree. Many thanks! cc: @cjnolet
@rhdong I don't think the ask here is that you use the multi-gpu APIs, but rather that we have a more consistent and general means of supporting wrapper types that can work with any index type, and avoid the need to, for example, implement a separate composite_index type for each index type within cuVS. Dynamic batching and the multi-gpu APIs are two examples where we have a single index that can wrap any other index. This is ultimately where we should strive to be.
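As a strawman of what such a wrapper could look like (hypothetical names, not an existing cuVS interface): the upstream index type is erased behind a minimal search interface, so a single composite type can hold CAGRA, IVF-PQ, or any mix of indexes, much as dynamic batching and the multi-gpu wrappers do today.

```cpp
// Hypothetical sketch of a type-erased composite wrapper; not the cuVS API.
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Minimal type-erased view of "something that can be searched".
struct searchable {
  virtual ~searchable() = default;
  virtual void search(const float* queries, std::size_t n_queries, std::size_t k,
                      uint32_t* neighbors, float* distances) const = 0;
};

// One composite type works for any upstream index that implements `searchable`.
struct composite_index : searchable {
  std::vector<std::unique_ptr<searchable>> children;

  void search(const float* queries, std::size_t n_queries, std::size_t k,
              uint32_t* neighbors, float* distances) const override {
    // Broadcast the query batch to every child; a real implementation would
    // write each child's results into scratch buffers and merge the top-k
    // (merge step omitted here for brevity).
    for (const auto& child : children) {
      child->search(queries, n_queries, k, neighbors, distances);
    }
  }
};
```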
Hi @cjnolet @achirkin , just want to clarify before moving forward: what you're suggesting is lifting the composite_index up to the upper level, e.g. by moving it to common.hpp, and using cuvs::neighbors::index as the array item type, am I right? Many thanks!
Yes, though I think we could address this in a follow-up PR and merge this change in the meantime. We really need the CAGRA merge capability initially so that we can unblock our Lucene friends. @rhdong can you create a new github issue to follow up on the first-class, formal, generalized composite_index?
The other big reason why centralizing this composite index is important is that we want it to work out of the box with the other APIs that work with general indexes (such as the SNMG APIs).
@cjnolet @achirkin I created an issue for tracking our discussion with this in one place: https://github.com/rapidsai/cuvs/issues/882
This pull request requires additional validation before any workflows can run on NVIDIA's runners.
/ok to test 37bbfe52c18373e7d8e5e2159f537fd8ddd80fb1
Depends on the raft PR: https://github.com/rapidsai/raft/pull/2674
Hi @cjnolet , I've provided a multi-stream implementation, but it appears to show no performance improvement. (Sorry for the accidental force push during this.) As you suggested, I've submitted an issue (https://github.com/rapidsai/cuvs/issues/946) and will continue working on further optimizations.
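For reference, the multi-stream attempt is roughly along these lines (illustrative sketch using the plain CUDA runtime; `shard_search_async` is a hypothetical placeholder, and the actual code presumably goes through the raft resource handles): each shard's search is launched on its own stream so the searches can overlap, then all streams are synchronized before the top-k merge. If a single shard search already saturates the GPU, the overlap buys little, which would be consistent with the flat numbers.

```cpp
// Illustrative multi-stream fan-out sketch using the plain CUDA runtime;
// shard_search_async() is a hypothetical placeholder for a per-shard search.
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

void search_shards_multi_stream(std::size_t n_shards) {
  std::vector<cudaStream_t> streams(n_shards);
  for (auto& s : streams) cudaStreamCreate(&s);

  for (std::size_t i = 0; i < n_shards; ++i) {
    // launch shard i's search on its own stream so shards can overlap, e.g.:
    // shard_search_async(shards[i], queries, k, streams[i]);
  }

  for (auto& s : streams) {
    cudaStreamSynchronize(s);  // all shard searches must finish before the merge
    cudaStreamDestroy(s);
  }
}
```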
/merge