
bfloat16 (bf16) support in faiss

mlomeli1 opened this issue 1 year ago • 4 comments

Many LLMs are trained with bf16. If we want to use the hidden states of LLMs for retrieval, those vectors will be in bf16 dtype. It would be helpful to support bf16 in Faiss so that we can use LLMs as retrievers or embedding models.

mlomeli1 commented Sep 16 '24

Note that we cannot pass through the numpy wrapper because numpy does not support bf16. Adapting the gpu_knn code for PyTorch should be easy: https://github.com/facebookresearch/faiss/blob/dc55e11874e4e32d0f04a9d62156ebe2ca25192d/contrib/torch_utils.py#L497
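
A minimal sketch of the constraint (the shapes here are arbitrary): torch refuses to export bf16 tensors to numpy, which is why the Python wrapper cannot pass them through without an upcast.

import torch

x = torch.randn(4, 8, dtype=torch.bfloat16)

try:
    x.numpy()  # fails: numpy has no native bf16 dtype
except TypeError as e:
    print(e)   # "Got unsupported ScalarType BFloat16"

# the only way across the numpy boundary today is to upcast first
x32 = x.to(torch.float32).numpy()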

mdouze commented Sep 16 '24

  • add bfloat16 support here (see the sketch after the link):

https://github.com/facebookresearch/faiss/blob/main/faiss/gpu/GpuDistance.h#L76
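
For context, a hypothetical sketch of what the user-facing call could look like once bf16 is plumbed through GpuDistance. faiss.knn_gpu and the contrib.torch_utils wrapper exist today for float32/float16 torch tensors; the bf16 dtype path is the part being requested here.

import faiss
import faiss.contrib.torch_utils  # patches faiss.knn_gpu to accept torch tensors
import torch

res = faiss.StandardGpuResources()
xb = torch.randn(100000, 768, device='cuda', dtype=torch.bfloat16)  # database
xq = torch.randn(32, 768, device='cuda', dtype=torch.bfloat16)      # queries

# hypothetical: would only work after bf16 is added to DistanceDataType
D, I = faiss.knn_gpu(res, xq, xb, k=10)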

mdouze commented Sep 17 '24

Example of how we would currently use a PQ codec to encode/decode PyTorch bf16 tensors:

torch.from_numpy(codec.sa_decode(codec.sa_encode(x.to(device='cpu', dtype=torch.float32).numpy())))

This piece of code showcases all the CPU moves, the upcasting, and the conversion to a numpy array; then, after decoding, we need to convert back to a tensor. Ideally, we could avoid some of these steps if this were supported for PyTorch tensors.
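
Expanded into a self-contained sketch of the same workaround; the PQ8 codec, the dimension, and the random data are illustrative stand-ins, not from the original comment.

import faiss
import torch

d = 64
codec = faiss.index_factory(d, "PQ8")       # illustrative PQ codec
codec.train(torch.randn(10000, d).numpy())  # training data must be float32 numpy

x = torch.randn(16, d, dtype=torch.bfloat16)  # stand-in for LLM hidden states

# current workaround: CPU move + upcast + numpy conversion on the way in,
# numpy -> tensor conversion (and optional downcast) on the way out
codes = codec.sa_encode(x.to(device='cpu', dtype=torch.float32).numpy())
x_rec = torch.from_numpy(codec.sa_decode(codes)).to(dtype=torch.bfloat16)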

mlomeli1 commented Oct 18 '24

Just in case: there is a ScalarQuantizer implementation for bf16; maybe portions of it can be reused.
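
That implementation is the QT_bf16 quantizer type; a minimal sketch of using it, assuming a Faiss build recent enough to expose faiss.ScalarQuantizer.QT_bf16 (1.8.0+):

import faiss
import numpy as np

d = 64
# CPU index that stores vectors bf16-encoded (bit truncation of float32)
index = faiss.IndexScalarQuantizer(d, faiss.ScalarQuantizer.QT_bf16)

xb = np.random.rand(1000, d).astype('float32')
index.train(xb)   # trivial for bf16, but marks the index as trained
index.add(xb)
D, I = index.search(xb[:5], 4)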

alexanderguzhva commented Oct 18 '24