
Bench improvements

Opened by marianotepper · 0 comments

This PR introduces a few ease-of-use tools in jvector-examples.

  1. Bench now loads the list of available datasets from a YAML file, provided in jvector-examples/yaml-examples/datasets.yml.
  2. It adds BenchYAML, which reads config files that specify JVector hyperparameters in YAML format.
  3. It adds HelloVectorWorld, a single, clean, and simple example.

Here's an example YAML file showing what can be specified and how:

configVersion: 4 # do not change this number unless you know what you are doing

dataset: ada002-100k

construction:
  outDegree: [32, 48, 64, 96, 128]
  efConstruction: [60, 80, 100, 120, 160, 200, 400, 600, 800]
  neighborOverflow: [1.2f, 2.0f]
  addHierarchy: [No, Yes]
  compression:
    - type: None
    - type: PQ
      parameters:
        # m: 192 # specify either the integer m or the integer mFactor; with mFactor, m is set to the data dimensionality divided by mFactor
        # mFactor: 8
        # k: 256 # optional parameter. By default, k=256
        centerData: No
        anisotropicThreshold: -1.0 # optional parameter. By default, anisotropicThreshold=-1 (i.e., no anisotropy)
    - type: PQ
      parameters:
        mFactor: 2
        centerData: No
  reranking:
    - FP
    - NVQ
  useSavedIndexIfExists: Yes

search:
  topKOverquery:
    # each key is a topK value, mapped to the list of overquery rates we want to cover
    10: [1.0, 2.0, 5.0, 10.0]
    100: [1.0, 2.0]
  useSearchPruning: [No, Yes]
  compression:
    - type: None
    - type: PQ
      parameters:
        m: 192
        k: 256 # optional parameter. By default, k=256
        centerData: No
        anisotropicThreshold: -1.0 # optional parameter. By default, anisotropicThreshold=-1 (i.e., no anisotropy)
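Each `construction` key above takes a list of values. The minimal Java sketch below assumes that Bench builds one graph per combination of those lists (a Cartesian-product grid sweep) and shows how quickly the example config multiplies. `ConstructionGrid` and `BuildConfig` are hypothetical names, not part of jvector-examples, and the sketch deliberately ignores how the compression, reranking, and search options combine with the builds.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not the jvector-examples code): enumerate every
// combination of the list-valued `construction` hyperparameters, ASSUMING
// Bench runs one graph build per combination.
class ConstructionGrid {
    // One candidate build configuration; field names mirror the YAML keys.
    record BuildConfig(int outDegree, int efConstruction,
                       float neighborOverflow, boolean addHierarchy) {}

    // Cartesian product of the four lists, in YAML key order.
    static List<BuildConfig> expand(List<Integer> outDegrees,
                                    List<Integer> efConstructions,
                                    List<Float> neighborOverflows,
                                    List<Boolean> addHierarchies) {
        List<BuildConfig> configs = new ArrayList<>();
        for (int degree : outDegrees)
            for (int ef : efConstructions)
                for (float overflow : neighborOverflows)
                    for (boolean hierarchy : addHierarchies)
                        configs.add(new BuildConfig(degree, ef, overflow, hierarchy));
        return configs;
    }

    public static void main(String[] args) {
        // Values copied from the `construction` section of the example config
        // (No/Yes mapped to false/true).
        List<BuildConfig> grid = expand(
                List.of(32, 48, 64, 96, 128),
                List.of(60, 80, 100, 120, 160, 200, 400, 600, 800),
                List.of(1.2f, 2.0f),
                List.of(false, true));
        System.out.println(grid.size() + " graph builds"); // 5 * 9 * 2 * 2 = 180
        System.out.println(grid.get(0));
    }
}
```

Under that assumption, the construction section alone describes 5 × 9 × 2 × 2 = 180 graph builds, which is one reason a flag like `useSavedIndexIfExists` is useful when rerunning a sweep.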

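The PQ comments in the config state two rules: `m` is either given directly or derived as the data dimensionality divided by `mFactor`, and `k` defaults to 256. The Java sketch below encodes those rules, plus one assumption the config does not state: that an overquery rate multiplies `topK` to give the number of candidates retrieved before reranking. `PqAndOverquery` and its methods are hypothetical helpers, not the BenchYAML API, and the example assumes `ada002-100k` holds 1536-dimensional ada-002 embeddings.

```java
// Hypothetical helpers (not the BenchYAML API) illustrating rules from the
// config comments, plus an ASSUMED overquery rule (topK * overquery).
class PqAndOverquery {
    static final int DEFAULT_K = 256; // "By default, k=256"

    // Exactly one of m / mFactor should be set; null means "not specified".
    static int resolveM(int dimension, Integer m, Integer mFactor) {
        if ((m == null) == (mFactor == null)) {
            throw new IllegalArgumentException("specify exactly one of m or mFactor");
        }
        if (m != null) return m;
        if (mFactor <= 0) {
            throw new IllegalArgumentException("mFactor must be positive");
        }
        return dimension / mFactor; // integer division in this sketch
    }

    static int resolveK(Integer k) {
        return k == null ? DEFAULT_K : k;
    }

    // ASSUMPTION: candidates fetched before reranking = topK * overquery.
    static int rerankK(int topK, double overquery) {
        return (int) (topK * overquery);
    }

    public static void main(String[] args) {
        int dim = 1536; // assumed dimensionality of ada-002 embeddings
        System.out.println(resolveM(dim, null, 8)); // 192, same as `m: 192` in search
        System.out.println(resolveM(dim, null, 2)); // 768
        System.out.println(resolveK(null));         // 256
        for (double rate : new double[] {1.0, 2.0, 5.0, 10.0}) {
            System.out.println("topK=10, overquery=" + rate + " -> " + rerankK(10, rate));
        }
    }
}
```

With 1536 dimensions, `mFactor: 8` yields the same `m: 192` that the search section sets explicitly, while the second construction PQ entry (`mFactor: 2`) yields 768 subspaces.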
marianotepper, Apr 30 '25 23:04