VectorSimilarity

[MOD-6738] IndexComputer

Open • meiravgri opened this issue 1 year ago • 1 comment

TODO: disable flow temp!!!

The computer

This PR introduces a new component of VecSimIndexAbstract: the computer. The computer is responsible for:

  1. Processing blobs before storing them.
  2. Processing blobs for graph searches (queries).
  3. Calculating the distance between a query vector and a stored vector.
  4. Storing all the data required to perform the above operations.
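The four responsibilities can be pictured as one interface. The sketch below is hypothetical: the class and method names (`ComputerSketch`, `preprocessForStorage`, `preprocessQuery`, `calcDistance`) are illustrative and are not the library's API. It assumes float vectors and uses squared L2 as the example metric.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical interface mirroring the four responsibilities of the computer.
// Names are illustrative only, not VecSim's actual API.
class ComputerSketch {
public:
    virtual ~ComputerSketch() = default;
    // (1) Produce the blob that will be stored in the index.
    virtual std::vector<float> preprocessForStorage(const float *blob) const = 0;
    // (2) Produce the blob used as the query in a graph search.
    virtual std::vector<float> preprocessQuery(const float *blob) const = 0;
    // (3) Distance between a (processed) query and a stored vector.
    virtual float calcDistance(const float *query, const float *stored) const = 0;
};

// (4) The computer keeps whatever state it needs; here, only the dimension.
class L2ComputerSketch : public ComputerSketch {
public:
    explicit L2ComputerSketch(std::size_t dim) : dim_(dim) {}
    std::vector<float> preprocessForStorage(const float *blob) const override {
        return std::vector<float>(blob, blob + dim_);  // plain copy, no transformation
    }
    std::vector<float> preprocessQuery(const float *blob) const override {
        return std::vector<float>(blob, blob + dim_);  // plain copy, no transformation
    }
    float calcDistance(const float *a, const float *b) const override {
        float sum = 0.0f;
        for (std::size_t i = 0; i < dim_; ++i) {
            float d = a[i] - b[i];
            sum += d * d;  // squared L2 distance
        }
        return sum;
    }

private:
    std::size_t dim_;
};
```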

The computer has two subcomponents:

  1. preprocessor(s)
  2. distance calculator

The computer is initialized in the index factory, and subcomponents can be added ad hoc. IndexComputerBasic has no preprocessors and only aligns the query blob if needed.
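A minimal sketch of "align the query blob only if needed", assuming C++17 aligned `::operator new`. The helper names (`alignQueryIfNeeded`, `AlignedBlob`, `AlignedDeleter`) are hypothetical and do not come from the PR. An already-aligned blob is used in place; otherwise it is copied into an aligned buffer that the caller's scope owns and frees.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <new>

// Hypothetical deleter for buffers obtained from aligned ::operator new (C++17).
struct AlignedDeleter {
    std::size_t alignment;
    void operator()(float *p) const {
        ::operator delete(p, std::align_val_t(alignment));
    }
};
using AlignedBlob = std::unique_ptr<float, AlignedDeleter>;

bool isAligned(const void *p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Basic-computer behaviour (no preprocessors): return the blob itself if it is
// already aligned; otherwise copy it into an aligned buffer owned by `holder`,
// which releases the buffer when it goes out of scope.
const float *alignQueryIfNeeded(const float *blob, std::size_t dim,
                                std::size_t alignment, AlignedBlob &holder) {
    if (isAligned(blob, alignment)) return blob;
    void *raw = ::operator new(dim * sizeof(float), std::align_val_t(alignment));
    holder = AlignedBlob(static_cast<float *>(raw), AlignedDeleter{alignment});
    std::memcpy(holder.get(), blob, dim * sizeof(float));
    return holder.get();
}
```

The `alignment` value must be a power of two; a real index would derive it from the SIMD distance function in use, which this sketch does not model.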

Preprocessor

The preprocessor API supports processing blobs for storage, for queries (graph search), or both. IndexComputerExtended holds an array of preprocessors, each responsible for a different kind of processing. Currently, the only one is CosinePreprocessor, which normalizes vectors if the index metric is Cosine. NOTE: in a tiered index, we assume that vectors are processed before being inserted into the backend index, so the frontend index will be of type VecSimMetric_Cosine but internally doesn't hold a cosine preprocessor.
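A hypothetical sketch of an extended computer that holds an array of preprocessors, each declaring whether it applies to storage blobs, query blobs, or both, with a cosine-style preprocessor that normalizes to unit L2 norm. All names (`PreprocessorSketch`, `CosinePreprocessorSketch`, `ExtendedComputerSketch`) are illustrative assumptions, not the PR's classes.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical preprocessor interface: each preprocessor states whether it
// applies to stored blobs, query blobs, or both.
struct PreprocessorSketch {
    virtual ~PreprocessorSketch() = default;
    virtual bool forStorage() const = 0;
    virtual bool forQuery() const = 0;
    virtual void process(float *blob, std::size_t dim) const = 0;
};

// Normalizes a vector in place to unit L2 norm, as a cosine preprocessor would.
struct CosinePreprocessorSketch : PreprocessorSketch {
    bool forStorage() const override { return true; }
    bool forQuery() const override { return true; }
    void process(float *blob, std::size_t dim) const override {
        float norm = 0.0f;
        for (std::size_t i = 0; i < dim; ++i) norm += blob[i] * blob[i];
        norm = std::sqrt(norm);
        if (norm == 0.0f) return;  // leave the zero vector unchanged
        for (std::size_t i = 0; i < dim; ++i) blob[i] /= norm;
    }
};

// Runs every applicable preprocessor on a copy, so the caller's blob is untouched.
class ExtendedComputerSketch {
public:
    explicit ExtendedComputerSketch(std::size_t dim) : dim_(dim) {}
    void add(std::unique_ptr<PreprocessorSketch> p) { pps_.push_back(std::move(p)); }
    std::vector<float> preprocessForStorage(const float *blob) const {
        std::vector<float> out(blob, blob + dim_);
        for (const auto &p : pps_)
            if (p->forStorage()) p->process(out.data(), dim_);
        return out;
    }
    std::vector<float> preprocessQuery(const float *blob) const {
        std::vector<float> out(blob, blob + dim_);
        for (const auto &p : pps_)
            if (p->forQuery()) p->process(out.data(), dim_);
        return out;
    }

private:
    std::size_t dim_;
    std::vector<std::unique_ptr<PreprocessorSketch>> pps_;
};
```

With no preprocessors added, both methods return an unmodified copy of the input.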

Processed blobs have a scoped lifetime and are released automatically. If their lifetime needs to be extended (for storage purposes, for example), they are assumed to be copied.
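To illustrate the scoped-lifetime rule, here is a sketch assuming RAII ownership via `std::unique_ptr`. `ProcessedBlob`, `processBlob`, and `StorageSketch` are hypothetical names, and the counter exists only to make the release observable. The processed blob is freed when its scope ends, and storage keeps its own copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <vector>

// Hypothetical: counts how many processed blobs have been released.
static int g_released = 0;

struct CountingDeleter {
    void operator()(float *p) const {
        ++g_released;
        delete[] p;
    }
};
using ProcessedBlob = std::unique_ptr<float[], CountingDeleter>;

// Returns a processed blob (here simply a copy) owned by the caller's scope.
ProcessedBlob processBlob(const float *blob, std::size_t dim) {
    ProcessedBlob out(new float[dim]);
    std::memcpy(out.get(), blob, dim * sizeof(float));
    return out;
}

// Storage copies the data, so it never depends on the processed blob's lifetime.
struct StorageSketch {
    std::vector<std::vector<float>> vectors;
    void add(const float *processed, std::size_t dim) {
        vectors.emplace_back(processed, processed + dim);
    }
};
```

Usage: call `processBlob` inside a block, pass `p.get()` to `StorageSketch::add`, and the buffer is released at the closing brace while the stored copy survives.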

Distance Calculator

The distance calculator is defined according to the distance function signature and holds the distance function of the abstract index. All Distance Calculator classes expose the same API, calc_dist(v1, v2, dim); internally, each one calls the distance function according to its template signature.
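A sketch of a calculator templated on the distance type that wraps a function pointer behind a uniform `calc_dist(v1, v2, dim)` call. The `DistFunc` alias, the class name, and the example `InnerProductDist` (computing 1 - <a, b>) are assumptions for illustration, not the library's definitions.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical distance-function signature, parameterized by the distance type.
template <typename DistType>
using DistFunc = DistType (*)(const void *, const void *, std::size_t);

// Uniform calc_dist API; the stored function pointer does the actual work.
template <typename DistType>
class DistanceCalculatorSketch {
public:
    explicit DistanceCalculatorSketch(DistFunc<DistType> f) : f_(f) {}
    DistType calc_dist(const void *v1, const void *v2, std::size_t dim) const {
        return f_(v1, v2, dim);
    }

private:
    DistFunc<DistType> f_;
};

// Example distance function on float vectors: 1 - <a, b>.
float InnerProductDist(const void *a, const void *b, std::size_t dim) {
    const float *x = static_cast<const float *>(a);
    const float *y = static_cast<const float *>(b);
    float dot = 0.0f;
    for (std::size_t i = 0; i < dim; ++i) dot += x[i] * y[i];
    return 1.0f - dot;
}
```

A calculator for a different distance type (for example `double`) would be instantiated as `DistanceCalculatorSketch<double>` with a matching function, while callers still see only `calc_dist`.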

Index API changes

An index of type VecSimIndexAbstract is responsible for preprocessing a blob before performing any operation, including adding a new vector and processing a query before searching the index. All *Wrapper functions were removed. For the tiered index, the backend index is assumed to receive a blob that the frontend index has already preprocessed; the backend index can perform additional preprocessing if needed.
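A simplified, synchronous sketch of this flow, assuming cosine normalization as the only preprocessing step. `IndexSketch`, `TieredSketch`, `normalize`, and `identity` are hypothetical names. The index preprocesses blobs itself on insert and on query, and in the tiered case the frontend normalizes while the backend stores the already-processed blob without normalizing it again.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <limits>
#include <utility>
#include <vector>

using Vec = std::vector<float>;
using Process = std::function<Vec(const Vec &)>;

// Normalizes to unit L2 norm; the zero vector is returned unchanged.
Vec normalize(const Vec &v) {
    float norm = 0.0f;
    for (float x : v) norm += x * x;
    norm = std::sqrt(norm);
    Vec out(v);
    if (norm > 0.0f)
        for (float &x : out) x /= norm;
    return out;
}

Vec identity(const Vec &v) { return v; }

class IndexSketch {
public:
    explicit IndexSketch(Process pp) : pp_(std::move(pp)) {}
    // The index preprocesses the blob itself before storing it.
    void addVector(const Vec &blob) { data_.push_back(pp_(blob)); }
    // The query goes through the same preprocessing before a brute-force search.
    std::size_t nearest(const Vec &query) const {
        Vec q = pp_(query);
        std::size_t best = 0;
        float bestDist = std::numeric_limits<float>::max();
        for (std::size_t i = 0; i < data_.size(); ++i) {
            float d = 0.0f;
            for (std::size_t j = 0; j < q.size(); ++j) {
                float diff = q[j] - data_[i][j];
                d += diff * diff;  // squared L2 distance
            }
            if (d < bestDist) { bestDist = d; best = i; }
        }
        return best;
    }
    const std::vector<Vec> &data() const { return data_; }

private:
    Process pp_;
    std::vector<Vec> data_;
};

// Tiered: the frontend normalizes; the backend has no cosine step of its own
// and stores the processed blob it receives as-is.
struct TieredSketch {
    IndexSketch frontend{normalize};
    IndexSketch backend{identity};
    void addVector(const Vec &blob) {
        frontend.addVector(blob);
        backend.addVector(frontend.data().back());
    }
};
```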

meiravgri • Aug 29 '24 10:08

Codecov Report

Attention: Patch coverage is 97.56757% with 9 lines in your changes missing coverage. Please review.

Project coverage is 97.02%. Comparing base (f08c051) to head (04ca716). Report is 6 commits behind head on main.

File with missing lines                                 Patch %   Missing lines
src/VecSim/spaces/computer/preprocessors.h              89.18%    4
...rc/VecSim/spaces/computer/preprocessor_container.h   96.05%    3
src/VecSim/index_factories/brute_force_factory.cpp      96.55%    1
src/VecSim/index_factories/hnsw_factory.cpp             97.61%    1
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #535      +/-   ##
==========================================
- Coverage   97.16%   97.02%   -0.14%     
==========================================
  Files          94      100       +6     
  Lines        4862     5307     +445     
==========================================
+ Hits         4724     5149     +425     
- Misses        138      158      +20     
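The report's percentages are hit lines divided by tracked lines. Here is a quick consistency check of the figures above. The patch total of 370 lines is inferred from the stated 97.56757% and 9 missing lines; the report does not list it directly.

```latex
\begin{aligned}
\text{patch: }  & 1 - \tfrac{9}{N} = 0.9756757 \;\Rightarrow\; N = \tfrac{9}{0.0243243} \approx 370, \quad \tfrac{361}{370} \approx 97.5676\% \\
\text{main: }   & \tfrac{4724}{4862} \approx 97.16\%, \quad 4724 + 138 = 4862 \\
\text{head: }   & \tfrac{5149}{5307} \approx 97.02\%, \quad 5149 + 158 = 5307 \\
\text{change: } & 97.02\% - 97.16\% = -0.14\%
\end{aligned}
```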


codecov[bot] • Aug 29 '24 12:08