Docs: disabling the vector disk cache in high-throughput scenarios
Request
I'd like to request a documentation section noting that disabling the vector disk cache may give a significant performance boost when throughput is particularly high (assuming my diagnosis below is correct).
Context
Please forgive me for not following the issue templates; this request doesn't seem to fit any of them.
I was trying to build a high-throughput embedding service (30k+ embeddings/sec) on a powerful GPU instance. Even with a suitable batch size (1536 in my case), GPU utilization never went above 5%, and the server quickly started returning HTTP 429 (Too Many Requests). For some reason, the interval between consecutive batches could be as long as ~10 seconds.
After a lot of troubleshooting, I concluded that the vector disk cache was likely the bottleneck. In my use case there was no benefit to keeping a cache of embeddings, and writing to the cache appeared to be too expensive even though it runs in a separate thread. After setting no-vector-disk-cache, I was able to reach 100% GPU utilization.