SNMG ANN

Open · viclafargue opened this pull request 1 year ago • 1 comment

This PR adds a distributed (single-node, multi-GPU) implementation of ANN indexes. It makes it possible to build, extend, and search an index across multiple GPUs.

Before building the index, the user has to choose between two modes:

- Sharding mode: the index dataset is split and each GPU trains its own index on its share of the dataset. This is intended both to increase search throughput and to raise the maximum size of the index (per-shard results are then combined into a single answer; see the merge sketch below).
- Index duplication mode: the index is built once on one GPU and then copied over to the others. Alternatively, the index dataset is sent to each GPU and the index is built there. This is intended to increase search throughput.
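In sharding mode, a query has to fan out to every shard and the per-shard neighbor lists have to be combined into one global top-k. The snippet below is only an illustrative host-side sketch of that merge step, not cuVS code; the `Candidate` type and `merge_topk` function are hypothetical names, and it assumes smaller distances are better (e.g. L2).

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// One (distance, id) candidate returned by a single shard.
using Candidate = std::pair<float, int64_t>;

// Combine the top-k candidates from each shard into a single global top-k.
std::vector<Candidate> merge_topk(const std::vector<std::vector<Candidate>>& per_shard,
                                  std::size_t k)
{
  // Gather all candidates from every shard.
  std::vector<Candidate> all;
  for (const auto& shard : per_shard) {
    all.insert(all.end(), shard.begin(), shard.end());
  }

  // Keep only the k best by distance (ascending).
  const std::size_t out_k = std::min(k, all.size());
  std::partial_sort(all.begin(), all.begin() + out_k, all.end(),
                    [](const Candidate& a, const Candidate& b) { return a.first < b.first; });
  all.resize(out_k);
  return all;
}
```

Because each shard only ever holds a fraction of the dataset, this is what lets the total index size exceed the memory of a single GPU while still returning one coherent top-k per query.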

SNMG indexes can be serialized and deserialized. Single-GPU (local) indexes can also be deserialized and deployed in index duplication mode.

Migrated from https://github.com/rapidsai/raft/pull/1993

viclafargue avatar Jul 18 '24 10:07 viclafargue

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

copy-pr-bot[bot] avatar Jul 18 '24 10:07 copy-pr-bot[bot]

By default, the replicated mode splits the query batch and gives an equal share to each GPU. However, as discussed, to serve "online" use cases we could instead detect single-row queries and have the index act as a load balancer in that case. I think most things should run fine for this as long as the user calls the search function from different threads. However, the search as currently implemented will systematically send the query to the first GPU in the clique. We could make this a random assignment instead. But then, what would happen if a local index is used twice in parallel? We could maybe have a thread-safe (mutex-protected) structure in the MG index to ensure safe access to the local indexes. Do you guys have insights on this?
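To make the "thread-safe (mutex-protected) structure" idea concrete, here is a minimal hypothetical sketch, not cuVS code: one mutex per local index, so two threads that happen to pick the same GPU serialize their calls instead of racing. The class and method names are illustrative only.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// One mutex per GPU; any thread that wants to use the local index on
// `gpu_id` must hold the corresponding lock for the duration of the call.
class local_index_locks {
 public:
  explicit local_index_locks(std::size_t n_gpus) : locks_(n_gpus) {}

  // Run `fn` (e.g. a lambda calling search() on the gpu_id-th local index)
  // while holding that index's lock.
  template <typename Fn>
  void run_locked(std::size_t gpu_id, Fn&& fn)
  {
    std::lock_guard<std::mutex> guard(locks_[gpu_id]);
    fn();
  }

 private:
  std::vector<std::mutex> locks_;
};
```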

cc @cjnolet @tfeher @achirkin

viclafargue avatar Aug 21 '24 16:08 viclafargue

> We could make this a random assignment instead. But then, what would happen if a local index is used twice in parallel? We could maybe have a thread-safe (mutex-protected) structure in the MG index to ensure safe access to the local indexes. Do you guys have insights on this?

Honestly, I think rather than making a special case for single-query batches, it would be fine to just use an atomic counter and always send the next query to the next device. You could do this, for example, by incrementing a number up to a particular max (e.g. the number of GPUs) and resetting it back to 0 each time it hits that max. This should be well documented, of course, but it would give users the ability to batch queries themselves, and the behavior would be consistent across all batch sizes.
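A minimal sketch of that counter, assuming nothing about the cuVS API (the class and names below are hypothetical): an unbounded atomic `fetch_add` followed by a modulo gives the same wrap-around cycle as an explicit reset, and is safe under concurrent callers.

```cpp
#include <atomic>
#include <cstddef>

// Round-robin device selection: each incoming search call takes the next
// GPU in the cycle, regardless of which thread it arrives on.
class round_robin_dispatcher {
 public:
  explicit round_robin_dispatcher(std::size_t n_gpus) : n_gpus_(n_gpus), counter_(0) {}

  // Returns the GPU that should serve the next query (batch).
  std::size_t next_gpu()
  {
    // fetch_add is atomic, so concurrent callers each get a distinct ticket;
    // the modulo maps tickets onto the available GPUs in a repeating cycle.
    return counter_.fetch_add(1, std::memory_order_relaxed) % n_gpus_;
  }

 private:
  std::size_t n_gpus_;
  std::atomic<std::size_t> counter_;
};
```

The useful property here is exactly the one described above: the dispatch behavior is the same for a batch of one query as for a batch of thousands, so users can do their own batching without hitting a special-cased code path.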

cjnolet avatar Aug 29 '24 16:08 cjnolet

/ok to test

tfeher avatar Oct 02 '24 16:10 tfeher

/ok to test

jameslamb avatar Oct 02 '24 20:10 jameslamb

/merge

cjnolet avatar Oct 02 '24 20:10 cjnolet

/ok to test

jameslamb avatar Oct 02 '24 21:10 jameslamb

/ok to test

jameslamb avatar Oct 02 '24 21:10 jameslamb

/ok to test

cjnolet avatar Oct 02 '24 21:10 cjnolet

/ok to test

jameslamb avatar Oct 02 '24 22:10 jameslamb

/ok to test

cjnolet avatar Oct 02 '24 22:10 cjnolet

/ok to test

jameslamb avatar Oct 02 '24 23:10 jameslamb

/ok to test

cjnolet avatar Oct 03 '24 00:10 cjnolet

/ok to test

cjnolet avatar Oct 03 '24 00:10 cjnolet

/ok to test

cjnolet avatar Oct 03 '24 00:10 cjnolet

/ok to test

cjnolet avatar Oct 03 '24 00:10 cjnolet

/ok to test

cjnolet avatar Oct 03 '24 00:10 cjnolet

/ok to test

cjnolet avatar Oct 03 '24 00:10 cjnolet

/ok to test

cjnolet avatar Oct 03 '24 00:10 cjnolet

/ok to test

cjnolet avatar Oct 03 '24 01:10 cjnolet

/ok to test

cjnolet avatar Oct 03 '24 01:10 cjnolet

The CI error here seems to have been a connection error:

[rapids-conda-retry] Exiting, no retryable mamba errors detected: 'ChecksumMismatchError:', 'ChunkedEncodingError:', 'CondaHTTPError:', 'CondaMultiError:', 'Connection broken:', 'ConnectionError:', 'DependencyNeedsBuildingError:', 'EOFError:', 'JSONDecodeError:', 'Multi-download failed', 'Timeout was reached', segfault exit code 139

dantegd avatar Oct 03 '24 02:10 dantegd

/ok to test

divyegala avatar Oct 03 '24 04:10 divyegala

/ok to test

tfeher avatar Oct 03 '24 10:10 tfeher

/ok to test

cjnolet avatar Oct 03 '24 13:10 cjnolet

/ok to test

cjnolet avatar Oct 03 '24 13:10 cjnolet

/merge

cjnolet avatar Oct 03 '24 15:10 cjnolet