Kefeng Ning issues

Results: 4 issues by Kefeng Ning

How to get the mapping information about doc_id with doc_content.

The retrieval results are returned as doc ids. How do I get the corresponding doc contents? Thanks.
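One way to recover the contents is to go back to the collection file that was indexed. The sketch below is only an illustration, assuming the collection is a headerless TSV file with one `pid<TAB>passage` pair per line and that the retrieved doc ids are those integer pids. The function names `load_collection` and `lookup_passages` are hypothetical helpers, not part of ColBERT.

```python
def load_collection(path):
    """Read a headerless `pid<TAB>passage` TSV file into a dict of pid -> text.

    Assumption: every non-empty line starts with an integer pid followed by a tab.
    """
    collection = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            # Split only on the first tab so passages containing tabs stay intact.
            pid, _, text = line.partition("\t")
            collection[int(pid)] = text
    return collection


def lookup_passages(collection, pids):
    """Pair each retrieved pid with its passage text (None if the pid is unknown)."""
    return [(pid, collection.get(pid)) for pid in pids]
```

With the mapping loaded once, the list of pids returned by a search can be passed to `lookup_passages` to get `(pid, text)` pairs in ranking order.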

How to set chunk_size

The chunk size is computed as follows: def get_chunksize(self): return min(25_000, 1 + len(self) // Run().nranks). It seems to be capped at 25_000 when the dataset is very large. How can I set the chunk_size parameter? Thanks!
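To make the arithmetic in the quoted formula concrete, here is a standalone sketch that reproduces it outside the class. The `cap` argument is a hypothetical parameter added for illustration. In the quoted code the value 25_000 is hardcoded, so this does not show an existing ColBERT option.

```python
def get_chunksize(num_passages, nranks, cap=25_000):
    """Standalone version of the quoted formula.

    `cap` is a hypothetical knob for illustration; the quoted code hardcodes 25_000.
    """
    return min(cap, 1 + num_passages // nranks)


# Small collection: the per-rank share is below the cap, so it is used as-is.
small = get_chunksize(1_000, nranks=4)        # 1 + 1000 // 4 = 251

# Large collection: the per-rank share exceeds the cap, so the cap wins.
large = get_chunksize(1_000_000, nranks=4)    # min(25_000, 250_001) = 25_000
```

Under this formula, any collection with at least 25_000 passages per rank gets a chunk size of 25_000, which matches the behavior described above.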

[rank1]:[E ProcessGroupNCCL.cpp:523] [Rank 1] Watchdog caught collective operation timeout: WorkNCCL

I am using ColBERT to index about 20GB of data on 4xA800 GPUs, and the following errors were raised:
Clustering 111099511 points in 128D to 524288 clusters, redo 1 times, 4 iterations
Preprocessing in 8.98 s...

Basic Training (ColBERTv1-style) -> ujson.JSONDecodeError: Expected object or value

It seems that triples="/path/to/MSMARCO/triples.train.small.tsv" (qid, pid+, pid-) is no longer supported for training. The triples argument should instead point to a file like triples = '/path/to/examples.64.json', in the format shown in this screenshot: ![image](https://github.com/stanford-futuredata/ColBERT/assets/9105946/805b7aba-a688-4bda-bee4-18da9ad157fb)
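If the old TSV triples need to be reused, they can be converted to a JSON-lines file. The sketch below is a guess at a workable conversion, assuming each TSV line holds three integer ids (qid, pid+, pid-) separated by tabs, and assuming the trainer accepts one JSON list `[qid, pid+, pid-]` per line. The function name `tsv_triples_to_jsonl` is hypothetical, and the target layout should be checked against the screenshot above before training.

```python
import json


def tsv_triples_to_jsonl(tsv_path, jsonl_path):
    """Convert `qid<TAB>pid+<TAB>pid-` lines into one JSON list per line.

    Assumption: all three fields are integer ids. Returns the number of
    triples written.
    """
    count = 0
    with open(tsv_path, encoding="utf-8") as src, \
         open(jsonl_path, "w", encoding="utf-8") as dst:
        for line_no, line in enumerate(src, start=1):
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            fields = line.split("\t")
            if len(fields) != 3:
                raise ValueError(f"line {line_no}: expected 3 fields, got {len(fields)}")
            qid, pos_pid, neg_pid = (int(x) for x in fields)
            dst.write(json.dumps([qid, pos_pid, neg_pid]) + "\n")
            count += 1
    return count
```

Each output line is then valid JSON on its own, which avoids the "Expected object or value" error that a JSON parser raises when it is handed a raw TSV line.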