Docs on disabling the vector disk cache in high-throughput scenarios

Opened by HankelBao (status: open, 0 comments)

Request

I am requesting a documentation section noting that disabling the vector disk cache may bring a significant performance boost when throughput is particularly high (assuming my diagnosis below is correct).

Context

Please forgive me for not following an issue template; this request doesn't seem to fit any of them.

I was trying to run a high-throughput embedding service (30k+ embeddings/sec) on a powerful GPU instance. However, even with a suitable batch size (1536 in my case), GPU utilization never exceeded 5%, and the server quickly started returning HTTP 429 (Too Many Requests). For some reason, the interval between consecutive batches could reach ~10 seconds.
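To put the gap in perspective, here is a back-of-the-envelope check using only the numbers above (30k embeddings/sec target, batch size 1536, ~10 s between batches). It is a rough sketch, assuming one batch in flight at a time; the helper names are hypothetical and not part of any library.

```python
# Rough arithmetic on the figures reported in this issue.
TARGET_RATE = 30_000   # embeddings/sec the service should sustain
BATCH_SIZE = 1_536     # batch size used in the report
OBSERVED_GAP_S = 10.0  # worst-case interval observed between batches


def batch_budget_s(rate: float, batch_size: int) -> float:
    """Seconds available per batch if `rate` embeddings/sec must be sustained."""
    return batch_size / rate


def effective_rate(batch_size: int, gap_s: float) -> float:
    """Embeddings/sec achieved if one batch completes every `gap_s` seconds."""
    return batch_size / gap_s


budget = batch_budget_s(TARGET_RATE, BATCH_SIZE)
actual = effective_rate(BATCH_SIZE, OBSERVED_GAP_S)

print(f"time budget per batch: {budget * 1000:.1f} ms")  # 51.2 ms
print(f"throughput at a 10 s gap: {actual:.1f} emb/s")   # 153.6 emb/s
print(f"shortfall factor: {TARGET_RATE / actual:.0f}x")  # 195x
```

In other words, each 1536-item batch has roughly 51 ms to complete at the target rate, so a 10 s stall per batch leaves throughput about two orders of magnitude short, which is consistent with requests piling up and the server answering 429.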

After a lot of troubleshooting, I realized the vector disk cache might be the bottleneck. In my case there was no benefit to keeping a cache of embeddings, and maintaining the cache still seemed too expensive even if it runs in a separate thread. After setting no-vector-disk-cache, I was able to reach 100% GPU utilization.

May 05 '25 08:05 HankelBao