
Support pulling embeddings from any Hugging Face dataset

Open KShivendu opened this issue 10 months ago • 0 comments

It would be nice to support pulling embeddings from any Hugging Face dataset. This would make the project even more useful for external users :)

The dataset spec could look like this:

{
    "name": "SciPhi/AgentSearch-V1",
    "vector_size": 100,
    "distance": "cosine",
    "type": "huggingface",
    "path": "glove-100-angular/glove-100-angular.hdf5",
    "link": "https://huggingface.co/datasets/SciPhi/AgentSearch-V1",
    "schema": {
        "vector_field": "openai",
        "payload": {
            "url": "text"
        }
    }
}
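
To make the discussion concrete, here is a minimal sketch of how a reader for this spec might work. It is not code from this repository: `iter_records` and `stream_huggingface` are hypothetical names, and it assumes the `datasets` library is installed and that the dataset exposes a `train` split. Only `name`, `vector_size`, and `schema` from the proposed spec are used.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List, Tuple

# Subset of the proposed spec that this sketch actually reads.
SPEC = json.loads("""
{
    "name": "SciPhi/AgentSearch-V1",
    "vector_size": 100,
    "type": "huggingface",
    "schema": {
        "vector_field": "openai",
        "payload": {"url": "text"}
    }
}
""")

Record = Tuple[List[float], Dict[str, Any]]


def iter_records(rows: Iterable[Dict[str, Any]], spec: Dict[str, Any]) -> Iterator[Record]:
    """Map raw dataset rows to (vector, payload) pairs using spec["schema"]."""
    schema = spec["schema"]
    vector_field = schema["vector_field"]
    payload_fields = list(schema.get("payload", {}))
    expected_size = spec["vector_size"]
    for row in rows:
        vector = [float(x) for x in row[vector_field]]
        # Fail early if the spec and the data disagree on dimensionality.
        if len(vector) != expected_size:
            raise ValueError(
                f"expected {expected_size} dimensions, got {len(vector)}"
            )
        # Keep only the payload fields named in the spec.
        payload = {field: row.get(field) for field in payload_fields}
        yield vector, payload


def stream_huggingface(spec: Dict[str, Any], split: str = "train") -> Iterator[Record]:
    """Hypothetical loader: stream rows from the Hub instead of downloading everything."""
    from datasets import load_dataset  # assumption: `pip install datasets`

    rows = load_dataset(spec["name"], split=split, streaming=True)
    return iter_records(rows, spec)
```

Streaming avoids downloading a large dataset up front, and the size check would surface a mismatch between `vector_size` and the actual embedding length on the first row. How `path`, `distance`, and the payload types should be handled is left open here, since that is part of what needs discussing.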

This needs some discussion before implementing.

KShivendu, Apr 03 '24 15:04