Jimmy Lin
Can you make the safetensors collection go into `collections/beir-v1.0.0/bge-base-en-v1.5.safetensors/`, alongside the original? So all files should go into `collections/beir-v1.0.0/bge-base-en-v1.5.safetensors/nfcorpus/`. We also shouldn't need a new indexer. The indexing command should...
Looking at this command: ``` bin/run.sh io.anserini.index.SafeTensorsIndexCollection \ -collection JsonDenseVectorCollection \ -input collections/beir-v1.0.0/bge-base-en-v1.5/nfcorpus \ -index indexes/beir-v1.0.0/bge-base-en-v1.5/nfcorpus/ \ -generator HnswJsonWithSafeTensorsDenseVectorDocumentGenerator \ -threads 9 -storePositions -storeDocvectors -storeRaw \ -vectorsDirectory target\safetesnors\vectors \ -docidsDirectory...
> I think you are looking at the older command this is the updated one
>
> ```
> bin/run.sh io.anserini.index.IndexHnswDenseVectors \
>   -collection JsonDenseVectorCollection \
> ...
Sorry, I'm confused again:
```
bin/run.sh io.anserini.index.IndexHnswDenseVectors -collection JsonDenseVectorCollection -input collections/beir-v1.0.0/bge-base-en-v1.5/nfcorpus -generator HnswJsonWithSafeTensorsDenseVectorDocumentGenerator -index indexes/beir-v1.0.0/bge-base-en-v1.5/nfcorpus/ -threads 16 -M 16 -efC 100 -memoryBuffer 65536 -noMerge >& logs/log.beir-v1.0.0-nq.bge-base-en-v1.5 &
```
Why would...
I'm not getting your logic, but I think you need to implement two classes:

+ `SafeTensorsDenseVectorCollection`
+ `SafeTensorsDenseVectorDocumentGenerator`

And your command would be something like `-collection SafeTensorsDenseVectorCollection ... -generator SafeTensorsDenseVectorDocumentGenerator`....
@Panizghi if I'm reading your code correctly, you're assuming that there's only one vector file per directory, right? This is not necessarily the case. For example, for `robust04`: ``` $...
@Panizghi on your branch, running:
```
$ python src/main/python/safetensors/json_to_bin.py \
  --input collections/beir-v1.0.0/bge-base-en-v1.5/nfcorpus/vectors.part00.jsonl.gz \
  --output collections/beir-v1.0.0/bge-base-en-v1.5.safetensors/nfcorpus
```
Works fine. However, I would like some progress indication... e.g., using tqdm? Also, what...
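To make the two requests concrete (progress indication, and not assuming a single vector file per directory), here is a rough sketch of how the reading side of a converter could look. It is not the code in `json_to_bin.py`. The function names are hypothetical, the `vectors.part*.jsonl.gz` glob is an assumption inferred from the one file name above, and the sketch only reads records; it does not write safetensors output. If `tqdm` is not installed it falls back to no progress bar.

```python
import glob
import gzip
import json
import os

try:
    from tqdm import tqdm  # optional dependency, used only for progress bars
except ImportError:
    def tqdm(iterable, **kwargs):
        # Fallback: no progress display, just pass the iterable through.
        return iterable


def find_vector_parts(input_dir, pattern="vectors.part*.jsonl.gz"):
    """Return every part file in input_dir, sorted by name.

    The glob pattern is an assumption; adjust it to the real layout.
    """
    return sorted(glob.glob(os.path.join(input_dir, pattern)))


def iter_records(input_dir):
    """Yield one parsed JSON object per non-empty line, across all parts."""
    for part in tqdm(find_vector_parts(input_dir), desc="parts", unit="file"):
        with gzip.open(part, "rt", encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:
                    yield json.loads(line)


def count_records(input_dir):
    """Count records in a directory; a stand-in for the real conversion loop."""
    return sum(1 for _ in iter_records(input_dir))
```

A converter built this way would take the directory as `--input` rather than a single file, so a collection split into several parts would not need one invocation per part.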
Running indexing command:
```
bin/run.sh io.anserini.index.IndexHnswDenseVectors \
  -collection SafeTensorsDenseVectorCollection \
  -input collections/beir-v1.0.0/bge-base-en-v1.5.safetensors/nfcorpus \
  -generator SafeTensorsDenseVectorDocumentGenerator \
  -index indexes/lucene-hnsw.beir-v1.0.0-nfcorpus.bge-base-en-v1.5/ \
  -threads 16 -M 16 -efC 100 -memoryBuffer 65536 -noMerge
```
Something's...
Okay, I can now run these commands: ``` python src/main/python/safetensors/json_to_bin.py \ --input collections/beir-v1.0.0/bge-base-en-v1.5/nfcorpus/vectors.part00.jsonl.gz \ --output collections/beir-v1.0.0/bge-base-en-v1.5.safetensors/nfcorpus --overwrite bin/run.sh io.anserini.index.IndexHnswDenseVectors \ -collection SafeTensorsDenseVectorCollection \ -input collections/beir-v1.0.0/bge-base-en-v1.5.safetensors/nfcorpus \ -generator SafeTensorsDenseVectorDocumentGenerator \ -index...
Superseded by #2582