vector-db-benchmark
Support for MongoDB Vector Search
MongoDB vector search should reach the community edition later this year
Following up on the announcement (link) from MongoDB's CEO:
"So, I'm excited to share that we will be introducing full-text search and vector search in MongoDB Community Edition later this year."
We can already prepare this benchmark to support MongoDB vector search.
Reasoning behind the configuration file
Currently, MongoDB Atlas Vector Search does not offer a way to configure EF_CONSTRUCT and M during index creation. The only configurable option that can improve precision is exposed at query time via the numCandidates setting (i.e. the equivalent of EF_RUNTIME).
This design choice simplifies the user experience but limits customization and the quality of the results for use cases that require higher precision. Therefore, to be as fair as possible, we're following MongoDB’s official documentation:
"We recommend that you specify a number higher than the number of documents to return (limit) to increase accuracy although this might impact latency. For example, we recommend a ratio of ten to twenty nearest neighbors for a limit of only one document."
and using a numCandidates value of up to 20x the expected result limit (meaning our max EF_RUNTIME is 2000).
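To make the 20x rule concrete, here is a minimal sketch of how the query stage could be built. The `$vectorSearch` stage and its `index`, `path`, `queryVector`, `numCandidates` and `limit` fields come from the Atlas documentation; the helper name, the index name `vector_index`, the field name `embedding`, and the limit of 100 (inferred from 2000 / 20) are assumptions for illustration, not necessarily what this benchmark uses.

```python
from typing import Any, Dict, List, Sequence

# Ratio taken from the upper end of the "ten to twenty" recommendation quoted above.
NUM_CANDIDATES_RATIO = 20


def build_vector_search_pipeline(
    query_vector: Sequence[float],
    limit: int,
    index_name: str = "vector_index",  # hypothetical index name
    path: str = "embedding",           # hypothetical vector field name
) -> List[Dict[str, Any]]:
    """Build an aggregation pipeline whose numCandidates is 20x the limit."""
    num_candidates = NUM_CANDIDATES_RATIO * limit
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": list(query_vector),
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        # Expose the normalized Atlas score next to each document id.
        {"$project": {"_id": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]


# Assumed limit of 100, which yields the max numCandidates of 2000 mentioned above.
pipeline = build_vector_search_pipeline([0.1, 0.2, 0.3], limit=100)
```

The resulting list could be passed to a collection's `aggregate` call. Keeping the ratio in a single constant makes it easy to rerun the benchmark with a different trade-off between precision and latency.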
Reasoning behind cosine and dotProduct similarities de-normalization
In MongoDB Atlas, for cosine and dotProduct similarities [check link], the normalization of the score is done using the following formula:
score = (1 + similarity(v1, v2)) / 2, where similarity is the cosine similarity or the dot product of v1 and v2
To ensure we're able to calculate the expected precision, we invert this normalization (similarity = 2 * score - 1) before checking the scores.
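The de-normalization amounts to solving the formula above for the raw similarity. A minimal sketch follows; the function names are hypothetical and not necessarily those used in this change.

```python
def normalize_score(similarity: float) -> float:
    """Atlas-style normalization for cosine and dotProduct, per the formula above."""
    return (1.0 + similarity) / 2.0


def denormalize_score(score: float) -> float:
    """Invert the normalization to recover the raw cosine or dot-product value."""
    return 2.0 * score - 1.0


# Round trip: a raw similarity of 0.8 survives normalization and de-normalization.
raw = denormalize_score(normalize_score(0.8))
assert abs(raw - 0.8) < 1e-12
```

With the raw value recovered, the scores can be compared directly against the ground-truth similarities used for the precision calculation.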