vector-db-benchmark issues

Need a faster way to visualize the data

5

We have https://github.com/qdrant/vector-db-benchmark/blob/master/scripts/process-benchmarks.ipynb, but it only prepares the data, so web-based interactive graphs would be nice. One can use the Plotly or Dash framework. Please use [benchmarks.js](https://github.com/qdrant/landing_page/blob/master/qdrant-landing%2Fthemes%2Fqdrant%2Fstatic%2Fjs%2Fbenchmarks.js) as a reference....

KShivendu

enhancement

good first issue

Standardize all `-default` configs and add `-debug` with parallel = 1 for easy debugging.

This issue covers two tasks: - Many users try `*-default` as their starting point, but the default config differs somewhat across the engines. So we should make it the same...
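As a rough illustration of the `-debug` idea from the title, a debug variant could be derived from an existing `-default` entry by forcing every `parallel` value to 1. The config shape used here (a dict with `name`, `upload_params`, and a list of `search_params`) and the helper name `make_debug_config` are assumptions made for this sketch, not something confirmed by the issue text.

```python
import copy


def make_debug_config(default_config: dict) -> dict:
    """Return a copy of a `*-default` config renamed to `*-debug` with parallel = 1.

    The keys read here (`name`, `upload_params`, `search_params`) are an
    assumed config layout, used only for illustration.
    """
    debug = copy.deepcopy(default_config)  # never mutate the original entry

    suffix = "-default"
    name = debug["name"]
    base = name[: -len(suffix)] if name.endswith(suffix) else name
    debug["name"] = base + "-debug"

    # Upload with a single worker so tracebacks are easy to follow.
    debug.setdefault("upload_params", {})["parallel"] = 1

    # Every search run also uses a single worker.
    for params in debug.get("search_params", []):
        params["parallel"] = 1

    return debug


example = {
    "name": "engine-default",
    "upload_params": {"parallel": 16},
    "search_params": [{"parallel": 8}, {"parallel": 100}],
}
debug_example = make_debug_config(example)
```

Deriving the debug entry from the default one, instead of maintaining it by hand, would keep the two from drifting apart once the defaults are standardized.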

KShivendu

good first issue

Add support to wait for server to start in all the engines

Some engines, like ElasticSearch and OpenSearch, take relatively long to boot. It would be nice to have the wait feature built into the benchmarking script. Note: It's a low priority...
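One possible shape for such a wait step is to poll the engine's TCP port until it accepts a connection or a deadline passes. This is only a sketch: the function name `wait_for_port` and its defaults are hypothetical, and an open port does not guarantee the engine is ready to serve queries, so a real implementation would probably follow up with an engine-specific health check.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll host:port until a TCP connection succeeds or `timeout` seconds pass.

    Returns True if the port became reachable, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (e.g. connection refused) on failure.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A benchmark runner could call something like `wait_for_port("localhost", 9200, timeout=120)` before configuring the engine and abort with a clear message if it returns `False`.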

KShivendu

enhancement

nice-to-have

Support pulling embedding from any Huggingface dataset

It would be nice if we could support pulling embeddings from any Huggingface dataset. This would make the project even more useful for external users :) The spec for this could...

KShivendu

enhancement

General improvements: extended exposed latency metrics and allowed for filtering benchmark runs by client count.

This PR adds small changes that we're already using: - Adds progress details for the downloaded files - Allows filtering by a specific client count on the search...

filipecosta90

Elastic vector limit should be 4096 instead of 2048

1

From v8.10 to v8.12 the dense vector limit moved from 2048 to 4096. The benchmark should adjust it accordingly in https://github.com/qdrant/vector-db-benchmark/blob/5b9bffbe7fecff24b8885650049b9e1fdc798f00/engine/clients/elasticsearch/configure.py#L53 Further docs: https://www.elastic.co/guide/en/elasticsearch/reference/current/dense-vector.html#index-vectors-knn-search old v8.10 reference: https://www.elastic.co/guide/en/elasticsearch/reference/8.10/dense-vector.html#dense-vector-params new v8.12...

filipecosta90

Elastic client timeout should be configurable.

Here's a sample traceback for 504 Gateway Timeout server errors on the Elastic client when the config/vector size leads to longer merge operations. #103 adds a way of fixing/avoiding this issue....
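To make the idea concrete, here is a minimal sketch of resolving the timeout from configuration instead of hard-coding it. It does not describe what #103 actually does. The environment variable `ELASTIC_TIMEOUT`, the default of 300 seconds, and the helper `build_client_kwargs` are all hypothetical; the resulting dict is assumed to be passed on to the client constructor, which in the 8.x Python client accepts a `request_timeout` keyword.

```python
import os

DEFAULT_TIMEOUT_S = 300  # hypothetical fallback, not a value from the project


def build_client_kwargs(connection_params: dict, env=os.environ) -> dict:
    """Merge a configurable request timeout into the client keyword arguments.

    Precedence: ELASTIC_TIMEOUT env var (hypothetical name) > value in
    connection_params > DEFAULT_TIMEOUT_S.
    """
    kwargs = dict(connection_params)  # copy so the caller's dict is untouched
    configured = kwargs.get("request_timeout", DEFAULT_TIMEOUT_S)
    kwargs["request_timeout"] = int(env.get("ELASTIC_TIMEOUT", configured))
    return kwargs
```

With this shape, long merge operations could be accommodated per run, for example by exporting `ELASTIC_TIMEOUT=900`, without editing the engine client code.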

filipecosta90

`max_optimization_threads: 0` doesn't disable indexing

2

While running `vector-db-benchmarks` I've noticed that - we are updating the collection with `max_optimization_threads: 0` [before uploading points][max-optimization-threads-0] - and then once again with `max_optimization_threads: 1` [after upload is finished][max-optimization-threads-1]...

ffuugoo

backoff strategy should be used for rate-limited errors on milvus or reducing batch_size config

4

The following type of error shows up repeatedly on non-local setups: ``` pymilvus.exceptions.MilvusException: ``` Full traceback: ``` Traceback (most recent call last): File "/usr/lib/python3.10/multiprocessing/pool.py", line 125, in worker result...
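The backoff strategy named in the title could look roughly like the following: retry the failing call with exponentially growing, jittered delays, and give up after a fixed number of attempts. This is a generic sketch, not the project's code. The helper `retry_with_backoff` and its parameters are hypothetical, and deciding which exceptions count as rate limiting is left to a caller-supplied `is_retryable` predicate because the exact error details are not shown above.

```python
import random
import time


def retry_with_backoff(fn, is_retryable, max_attempts=5, base_delay=0.5,
                       max_delay=30.0, sleep=time.sleep):
    """Call fn(), retrying retryable failures with exponential backoff and jitter.

    `is_retryable(exc)` decides whether an exception is worth retrying
    (e.g. a rate-limit error); anything else is re-raised immediately.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            # Give up on the last attempt or on non-retryable errors.
            if attempt == max_attempts - 1 or not is_retryable(exc):
                raise
            # Delay doubles on each attempt: base, 2*base, 4*base, ... capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads out retries from parallel upload workers.
            sleep(delay + random.uniform(0, delay / 2))
```

An upload step could wrap each batch insert in `retry_with_backoff`, which addresses transient rate limiting without forcing a smaller `batch_size` for every run.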

filipecosta90

Use `delete_client` wherever required

1

We recently introduced `delete_client` in the base client classes for adding pgvector in #91. We need to check if there are other places where this can help. e.g. replace closable...

KShivendu
