vector-db-benchmark
Support pulling embeddings from any Hugging Face dataset
It would be nice if we could support pulling embeddings from any Hugging Face dataset. This would make the project even more useful for external users :)
The spec for this could look like this:
{
  "name": "SciPhi/AgentSearch-V1",
  "vector_size": 100,
  "distance": "cosine",
  "type": "huggingface",
  "path": "glove-100-angular/glove-100-angular.hdf5",
  "link": "https://huggingface.co/datasets/SciPhi/AgentSearch-V1",
  "schema": {
    "vector_field": "openai",
    "payload": {
      "url": "text"
    }
  }
}
This needs some discussion before implementing.
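
To make the discussion more concrete, here is a rough sketch of how a reader for the proposed spec might work. It is only an illustration under several assumptions: the function names `extract_records` and `stream_huggingface_rows` are hypothetical and not part of this project, `schema.payload` is read as a mapping from column name to payload type, and rows are assumed to arrive as plain dicts (which is what the Hugging Face `datasets` library yields in streaming mode).

```python
from typing import Any, Dict, Iterable, Iterator, List, Tuple

# Hypothetical schema, mirroring the "schema" block of the proposed spec.
SCHEMA = {
    "vector_field": "openai",
    "payload": {"url": "text"},
}


def extract_records(
    rows: Iterable[Dict[str, Any]], schema: Dict[str, Any]
) -> Iterator[Tuple[List[float], Dict[str, Any]]]:
    """Yield (vector, payload) pairs from dataset rows.

    Assumption: schema["payload"] maps column names to payload types,
    and only those columns are kept in the payload.
    """
    vector_field = schema["vector_field"]
    payload_fields = list(schema.get("payload", {}))
    for row in rows:
        vector = [float(x) for x in row[vector_field]]
        payload = {name: row.get(name) for name in payload_fields}
        yield vector, payload


def stream_huggingface_rows(spec: Dict[str, Any], split: str = "train"):
    """Stream rows from the Hub without downloading the full dataset.

    Assumption: spec["name"] is the dataset repo id and a "train" split exists.
    """
    from datasets import load_dataset  # imported lazily; optional dependency

    return load_dataset(spec["name"], split=split, streaming=True)
```

With something like this, the benchmark could feed `extract_records(stream_huggingface_rows(spec), spec["schema"])` into the existing upload path. Whether `vector_size` should be validated against the actual embedding length, and how splits and queries are selected, are among the open questions.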