Kefeng Ning
The retrieval results are returned as doc IDs. How can I get the document contents? Thanks.
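One way to recover the text is to look each returned ID up in the collection file that was indexed. If you search through ColBERT's Searcher, its README appears to index searcher.collection by passage ID, which may be the simplest route. The sketch below does the same lookup by hand. It assumes the collection is a TSV with one passage per line in the form pid<TAB>passage (the MS MARCO collection.tsv layout); load_collection and ids_to_passages are hypothetical helper names, not library functions.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO


def load_collection(f: TextIO) -> Dict[int, str]:
    """Read a collection TSV (assumed layout: pid<TAB>passage) into a dict."""
    collection: Dict[int, str] = {}
    # QUOTE_NONE keeps quote characters inside passages as literal text.
    reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        pid, passage = row[0], row[1]
        collection[int(pid)] = passage
    return collection


def ids_to_passages(pids: Iterable[int], collection: Dict[int, str]) -> List[str]:
    """Map retrieved doc IDs back to their passage text, preserving rank order."""
    return [collection[pid] for pid in pids]


# Usage with an in-memory stand-in for collection.tsv.
sample = io.StringIO("0\tfirst passage\n1\tsecond passage\n2\tthird passage\n")
coll = load_collection(sample)
print(ids_to_passages([2, 0], coll))  # ['third passage', 'first passage']
```

For a large collection, loading everything into a dict costs memory proportional to the corpus; a memory-mapped or on-disk lookup would be the alternative if that becomes a problem.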
get_chunksize is defined as:

    def get_chunksize(self):
        return min(25_000, 1 + len(self) // Run().nranks)

It seems the chunk size is capped at 25_000 when the dataset is very large. How can I set the chunk_size parameter? Thanks!
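The quoted formula can be evaluated directly to see when the cap takes effect: for small collections the chunk size is roughly len(self) / nranks, and once that exceeds 25_000 the cap wins. The sketch below reproduces the arithmetic as a standalone function; the cap parameter is a hypothetical knob added for illustration (the quoted method hard-codes 25_000), and num_chunks is likewise a hypothetical helper.

```python
def get_chunksize(num_passages: int, nranks: int, cap: int = 25_000) -> int:
    """Same arithmetic as the quoted method, with the cap lifted into a
    hypothetical parameter (the original hard-codes 25_000)."""
    return min(cap, 1 + num_passages // nranks)


def num_chunks(num_passages: int, chunksize: int) -> int:
    """Number of chunks the collection is split into (integer ceiling division)."""
    return (num_passages + chunksize - 1) // chunksize


for n, ranks in [(40_000, 4), (8_841_823, 4)]:
    size = get_chunksize(n, ranks)
    print(n, ranks, size, num_chunks(n, size))
# 40000 4 10001 4
# 8841823 4 25000 354
```

With a collection of millions of passages the cap always dominates, so changing the chunk size would likely mean editing or overriding this method; whether the library exposes a configuration option for it is not confirmed here.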
[rank1]:[E ProcessGroupNCCL.cpp:523] [Rank 1] Watchdog caught collective operation timeout: WorkNCCL
I used ColBERT to index about 20 GB of data on 4x A800 GPUs, and the following error was raised:
Clustering 111099511 points in 128D to 524288 clusters, redo 1 times, 4 iterations
Preprocessing in 8.98 s...
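The log does not show the final error message, but the figures it does print allow a rough size estimate for the k-means step, which is one plausible factor to check. The sketch assumes the clustering input is held as float32 (4 bytes per value), and that the cluster count comes from a heuristic of the form 2^floor(log2(16 * sqrt(N))), where N is the estimated number of token embeddings. Both are assumptions to verify against the indexer source; num_partitions is a hypothetical name.

```python
import math

# Figures copied from the log line above.
num_points = 111_099_511
dim = 128
num_clusters = 524_288
BYTES_PER_FLOAT32 = 4  # assumption: k-means input is float32

sample_bytes = num_points * dim * BYTES_PER_FLOAT32
centroid_bytes = num_clusters * dim * BYTES_PER_FLOAT32
points_per_centroid = num_points / num_clusters

print(f"sample:    {sample_bytes / 2**30:.1f} GiB")
print(f"centroids: {centroid_bytes / 2**20:.0f} MiB")
print(f"points per centroid: {points_per_centroid:.1f}")


def num_partitions(num_embeddings_est: float) -> int:
    """Assumed heuristic: largest power of two <= 16 * sqrt(N)."""
    return int(2 ** math.floor(math.log2(16 * math.sqrt(num_embeddings_est))))
```

Under these assumptions the sample alone is about 53 GiB, and 524288 (2^19) clusters would correspond to an estimated N between roughly 1.07e9 and 4.29e9 embeddings.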
It seems that triples="/path/to/MSMARCO/triples.train.small.tsv" with (qid, pid+, pid-) rows is no longer supported for training; instead, triples should be set like this: triples = '/path/to/examples.64.json'.
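If existing (qid, pid+, pid-) ID triples need to be moved to a JSON-based file, a conversion could look like the sketch below. The exact schema of examples.64.json is not shown here, so this assumes the simplest layout: one JSON list [qid, pid+, pid-] per line. Check the training data loader before relying on it; tsv_triples_to_jsonl is a hypothetical helper.

```python
import io
import json
from typing import TextIO


def tsv_triples_to_jsonl(src: TextIO, dst: TextIO) -> int:
    """Convert rows of 'qid<TAB>pid+<TAB>pid-' into JSON lines '[qid, pid+, pid-]'.

    Assumption: one JSON list per line is the layout the trainer expects.
    Returns the number of triples written.
    """
    count = 0
    for line in src:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Raises ValueError if a row does not have exactly three integer fields.
        qid, pos, neg = (int(x) for x in line.split("\t"))
        dst.write(json.dumps([qid, pos, neg]) + "\n")
        count += 1
    return count


# Usage with in-memory files standing in for the real paths.
src = io.StringIO("1\t10\t20\n2\t30\t40\n")
dst = io.StringIO()
n = tsv_triples_to_jsonl(src, dst)
print(n)                        # 2
print(dst.getvalue(), end="")   # [1, 10, 20] then [2, 30, 40]
```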