Kefeng Ning

Results 4 issues of Kefeng Ning

The retrieval result is returned as doc ids. How do I get the doc contents? Thanks.
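
One possible way to map ids back to text, as a minimal sketch: it assumes the collection is stored as a TSV file with one `pid<TAB>passage` pair per line and that the retrieved ids are those same pids. The helper names `load_collection` and `contents_for` are hypothetical and are not part of ColBERT.

```python
import os
import tempfile


def load_collection(path):
    """Read a `pid<TAB>passage` TSV file into a dict keyed by integer pid.

    Assumption: the first column is the integer pid and the rest of the
    line is the passage text.
    """
    collection = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            pid, passage = line.split("\t", 1)
            collection[int(pid)] = passage
    return collection


def contents_for(pids, collection):
    """Return (pid, passage) pairs in the same order as the retrieved pids."""
    return [(pid, collection[pid]) for pid in pids]


if __name__ == "__main__":
    # Tiny self-contained demo with a throwaway collection file.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "collection.tsv")
        with open(path, "w", encoding="utf-8") as f:
            f.write("0\tfirst passage\n1\tsecond passage\n")
        collection = load_collection(path)
        for pid, passage in contents_for([1, 0], collection):
            print(pid, passage)
```

Holding the whole collection in a dict is only reasonable when it fits in memory; for a very large collection an on-disk lookup would be a better fit.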

`def get_chunksize(self): return min(25_000, 1 + len(self) // Run().nranks)` It seems the chunk size is capped at 25_000 when the dataset is very large. How can I set the chunk_size parameter? Thanks!
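
To make the cap concrete, here is a standalone re-statement of the formula quoted above. The function name `chunksize` and its `cap` argument are hypothetical, introduced only for illustration; `num_passages` stands in for `len(self)` and `nranks` for `Run().nranks`.

```python
def chunksize(num_passages, nranks, cap=25_000):
    """Same arithmetic as the quoted get_chunksize, with the cap as a parameter."""
    return min(cap, 1 + num_passages // nranks)


if __name__ == "__main__":
    # Small collection on 4 ranks: the per-rank share is below the cap.
    print(chunksize(40_000, 4))      # 1 + 10_000 = 10_001
    # Large collection on 4 ranks: the cap takes over.
    print(chunksize(1_000_000, 4))   # min(25_000, 250_001) = 25_000
```

With 4 ranks the cap is reached once the collection holds 99_996 or more items, so any larger dataset gets chunks of exactly 25_000 under this formula.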

Using ColBERT to index about 20 GB of data on 4x A800 GPUs, the following error is raised: Clustering 111099511 points in 128D to 524288 clusters, redo 1 times, 4 iterations Preprocessing in 8.98 s...

It seems that triples="/path/to/MSMARCO/triples.train.small.tsv" (qid, pid+, pid-) is no longer supported for training. Instead, triples = '/path/to/examples.64.json' should look like this: ![image](https://github.com/stanford-futuredata/ColBERT/assets/9105946/805b7aba-a688-4bda-bee4-18da9ad157fb)
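
A rough sketch of converting old-style triples, under a loudly stated assumption: the exact layout is only shown in the screenshot, so this assumes the JSON file holds one list per line of the form `[qid, pid+, pid-]`. The function name `triples_tsv_to_jsonl` is hypothetical, and the output should be checked against the screenshot before it is used for training.

```python
import json
import os
import tempfile


def triples_tsv_to_jsonl(tsv_path, jsonl_path):
    """Convert `qid<TAB>pid+<TAB>pid-` lines into one JSON list per line.

    Assumption: the target format is a JSON-lines file whose rows are
    [qid, positive_pid, negative_pid]. Returns the number of rows written.
    """
    count = 0
    with open(tsv_path, encoding="utf-8") as src, \
            open(jsonl_path, "w", encoding="utf-8") as dst:
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            qid, pos_pid, neg_pid = (int(x) for x in line.split("\t"))
            dst.write(json.dumps([qid, pos_pid, neg_pid]) + "\n")
            count += 1
    return count


if __name__ == "__main__":
    # Tiny self-contained demo with made-up ids.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "triples.tsv")
        dst = os.path.join(tmp, "examples.json")
        with open(src, "w", encoding="utf-8") as f:
            f.write("1\t10\t20\n2\t11\t21\n")
        print(triples_tsv_to_jsonl(src, dst))
        with open(dst, encoding="utf-8") as f:
            print(f.read())
```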