cuvs icon indicating copy to clipboard operation
cuvs copied to clipboard

FP16 API for CAGRA and IVF-PQ

Open tfeher opened this issue 1 year ago • 4 comments

This PR adds public API to CAGRA and IVF-PQ ANN search using FP16 input data.

Note that the fp16 kernels are already compiled in libcuvs. This PR just adds the missing public API declarations into the C++ headers, and restores the instantiations of public API functions.

This PR partially fixes https://github.com/rapidsai/cuvs/issues/144 (the IVF-Flat API is not yet added here).

tfeher avatar Jul 30 '24 22:07 tfeher

We also need to measure the impact of binary size and compile time for adding these new types to the public API.

We can't keep increasing without figuring out ways we can consolidate what's there. These two thjngs are the number 1 complaint from users currently. (It's not just this PR. This is also holding up the half precision bfknn and RBC PRs).

cjnolet avatar Jul 30 '24 23:07 cjnolet

Linking https://github.com/rapidsai/cuvs/issues/110

cjnolet avatar Jul 30 '24 23:07 cjnolet

The kernels are already compiled and included in libcuvs.so. The additional instantiations of the API entry points shall have negligible size. I will confirm this once CI finishes.

It was a mistake during porting the code from RAFT, that the public API was not defined for fp16, therefore I labelled this PR as a bugfix.

tfeher avatar Jul 30 '24 23:07 tfeher

Sorry @tfeher, I understand that there were some bits which were not exposed in the prior port, but this still doesn't change the increase in binary size. We need to address this before we continue to merge changes that increase it.

I propose we look at things that can be consolidated and maybe try using JIT for some of these things. We have to take a step back and fix this at the source.

cjnolet avatar Jul 31 '24 21:07 cjnolet

Binary size: 679 -> 661 MB The size is actually reduced, because I removed a few unused template instances and most of FP16-related instances were already in the binary (unused until this PR).

A quick check with bench-ann shows the FP16 benchmarks seem to work. I've checked that CAGRA is a little bit faster on FP16 vs FP32 on the glove dataset.

achirkin avatar Sep 27 '24 11:09 achirkin

/merge

achirkin avatar Sep 27 '24 19:09 achirkin