
Support Big-ANN Ground truth data as param in vector search

Open VijayanB opened this issue 1 year ago • 2 comments

Is your feature request related to a problem? Please describe.

BIGANN is one of the popular vector search datasets for measuring performance. However, it uses different formats for base data, query data, and ground truth. Currently, the vector search neighbors param does not support the ground truth format. Because that format differs from the base data format, BigANNVectorDataSet should be extended to read and parse ground truth files with the "bin" extension and convert them into neighbors, as is already possible with hdf5.

Describe the solution you'd like

Extend the BigANN dataset to support a new "bin" extension, so that it can parse ground truth files and be used as input for the neighbors data set.
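To make the request concrete, here is a minimal sketch of what parsing could look like. It assumes the ground truth layout used by the public big-ann-benchmarks tooling: two little-endian uint32 values (number of queries, then k), followed by queries × k uint32 neighbor ids, followed by queries × k float32 distances. The function name `read_bigann_ground_truth` is hypothetical and is not part of OSB or BigANNVectorDataSet; the sample bytes are built in memory purely for illustration.

```python
import io
import struct
from typing import BinaryIO, List, Tuple


def read_bigann_ground_truth(
    stream: BinaryIO,
) -> Tuple[List[List[int]], List[List[float]]]:
    """Parse a ground truth stream into per-query neighbor ids and distances.

    Assumed layout (not confirmed by this issue): uint32 num_queries,
    uint32 k, then num_queries * k uint32 ids, then num_queries * k
    float32 distances, all little-endian.
    """
    header = stream.read(8)
    if len(header) != 8:
        raise ValueError("truncated header: expected 8 bytes")
    num_queries, k = struct.unpack("<II", header)

    count = num_queries * k
    id_bytes = stream.read(4 * count)
    dist_bytes = stream.read(4 * count)
    if len(id_bytes) != 4 * count or len(dist_bytes) != 4 * count:
        raise ValueError("truncated body: file is smaller than the header implies")

    ids = struct.unpack(f"<{count}I", id_bytes)
    dists = struct.unpack(f"<{count}f", dist_bytes)

    # Split the flat arrays into one row per query.
    neighbors = [list(ids[i * k:(i + 1) * k]) for i in range(num_queries)]
    distances = [list(dists[i * k:(i + 1) * k]) for i in range(num_queries)]
    return neighbors, distances


# Illustrative in-memory sample: 2 queries, k = 2.
sample = (
    struct.pack("<II", 2, 2)
    + struct.pack("<4I", 7, 3, 9, 1)
    + struct.pack("<4f", 0.5, 1.0, 2.0, 4.0)
)
neighbors, distances = read_bigann_ground_truth(io.BytesIO(sample))
```

Each row of `neighbors` holds the true nearest neighbor ids for one query, which is the shape a recall calculation needs. The distances are parsed but could be ignored if only ids are required.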

Describe alternatives you've considered

Manually convert the ground truth into hdf5 or another previously supported format such as fbin/u8bin.

Additional context

N/A

VijayanB avatar May 01 '24 17:05 VijayanB

This was previously supported only in perf-tool. It was not added to OSB because recall was not supported. Now that recall is supported, adding this format will help users gradually move off perf-tool and use OSB for all use cases.

VijayanB avatar May 01 '24 17:05 VijayanB

Can we add the 1.6 release tag to this issue? Thank you.

VijayanB avatar May 01 '24 17:05 VijayanB