annlite
Support for 16 bit quantization
Are there any plans to support 16-bit float (fp16 and bfloat16) quantization for embeddings? I would assume it is an easier option, since it doesn't need to separately train any codebooks, and it gives some headroom for scaling indexes without compromising recall quality.
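To illustrate the "no codebook training" point: a minimal C sketch of what bf16 quantization amounts to. bf16 keeps the top 16 bits of an IEEE fp32 value, so encoding is just a rounded bit truncation; the function names are my own and NaN handling is omitted for brevity:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Encode fp32 -> bf16 by keeping the top 16 bits, with
 * round-to-nearest-even. NaN handling omitted for brevity. */
static uint16_t fp32_to_bf16(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits += 0x7FFF + ((bits >> 16) & 1); /* rounding bias */
    return (uint16_t)(bits >> 16);
}

/* Decode bf16 -> fp32: shift the bits back into place. */
static float bf16_to_fp32(uint16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

int main(void) {
    float v = 0.123456789f;
    uint16_t q = fp32_to_bf16(v);  /* 2 bytes stored instead of 4 */
    printf("%.9f -> %.9f\n", v, bf16_to_fp32(q));
    return 0;
}
```

Unlike PQ, there is no training data or index-time fitting involved; every vector is converted independently, which is why it halves storage with essentially zero setup cost.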
That's a good idea. Actually, we are investigating whether (and how much) bf16 would benefit the HNSW search.
AFAIK, F16C allows converting between fp16 and fp32 and should be available on most AVX CPUs. AVX512-FP16 provides the ability to do math on fp16 directly, but it is only supported by a very small number of the latest CPUs. For bf16, I am not aware of any architecture that supports vectorized math yet. So I would assume we would have to convert back and forth, which might increase search latency (if not bottlenecked on memory bandwidth) but saves quite a bit of embedding space with little degradation in recall, if any.
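A rough sketch of that convert-then-compute idea, assuming embeddings stored as fp16 and queries kept in fp32; the function name is hypothetical, the dimension is assumed to be a multiple of 8, and it needs `-mf16c -mfma` to compile:

```c
#include <immintrin.h>
#include <stdint.h>
#include <stdio.h>

/* Dot product between a fp32 query and a fp16-stored embedding,
 * widening 8 halves at a time with F16C and doing the math in fp32. */
static float dot_f16_f32(const uint16_t *emb_f16, const float *query, int dim) {
    __m256 acc = _mm256_setzero_ps();
    for (int i = 0; i < dim; i += 8) {
        __m128i h = _mm_loadu_si128((const __m128i *)(emb_f16 + i));
        __m256 v = _mm256_cvtph_ps(h);        /* fp16 -> fp32 (F16C) */
        __m256 q = _mm256_loadu_ps(query + i);
        acc = _mm256_fmadd_ps(v, q, acc);     /* fused multiply-add in fp32 */
    }
    /* Horizontal sum of the 8 accumulator lanes. */
    __m128 s = _mm_add_ps(_mm256_castps256_ps128(acc),
                          _mm256_extractf128_ps(acc, 1));
    s = _mm_hadd_ps(s, s);
    s = _mm_hadd_ps(s, s);
    return _mm_cvtss_f32(s);
}

int main(void) {
    float query[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    /* Store the same vector as fp16 using the F16C narrowing conversion. */
    uint16_t emb[8];
    __m128i h = _mm256_cvtps_ph(_mm256_loadu_ps(query), _MM_FROUND_TO_NEAREST_INT);
    _mm_storeu_si128((__m128i *)emb, h);
    printf("dot = %f\n", dot_f16_f32(emb, query, 8)); /* expect 204 */
    return 0;
}
```

The extra `cvtph` per 8 elements is the latency cost mentioned above; if the distance computation is memory-bound, halving the bytes read per vector may well pay for it.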