
Evaluation codes for ColBERT

Open rainatam opened this issue 2 years ago • 11 comments

Hi,

I want to evaluate ColBERT on BEIR but I couldn't find any example related to it.

You mentioned in the paper that it involves dense retrieval and re-ranking with ColBERT. Could you please share the code about that?

Thank you very much!

rainatam avatar Feb 21 '22 01:02 rainatam

@NThakur20 Hi,

I am also looking for this evaluation code. There seems to be some complicated logic involved. Could you please give a hint on when you will release it?

Many thanks!

Xiao9905 avatar Mar 01 '22 07:03 Xiao9905

Hi @rainatam and @Xiao9905, I worked on the evaluation scripts and released the ColBERT evaluation code here today: https://github.com/NThakur20/beir-ColBERT.

Please read the README in the repository for evaluation instructions; it contains a single script, evaluate_beir.sh, which handles all preprocessing, encoding, indexing, retrieval, and evaluation.

Hope it helps! Let me know in case something is broken or not working.

Kind Regards, Nandan Thakur

thakur-nandan avatar Mar 03 '22 16:03 thakur-nandan

Hi @NThakur20 ,

Thanks for your sharing!

The README and script are very clear and I can run the code successfully. Now I can reproduce your ColBERT results.

Btw, I noticed that the number of partitions for the IVFPQ index varies across datasets. Could you please share those parameters as well (if you kept them)?
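For what it's worth, a common rule of thumb for FAISS IVF indexes is to pick a power-of-two partition count near 8 * sqrt(num_embeddings). The helper below is a hypothetical sketch of that heuristic, not code taken from beir-ColBERT:

```python
import math

def suggest_partitions(num_embeddings: int) -> int:
    """Pick a power-of-two IVFPQ partition count near 8 * sqrt(N).

    This is a common heuristic for sizing FAISS IVF indexes, not the
    exact formula used by beir-ColBERT.
    """
    target = 8 * math.sqrt(num_embeddings)
    return 1 << max(0, round(math.log2(target)))

# e.g. a corpus with ~1M token embeddings -> 8192 partitions
print(suggest_partitions(1_000_000))
```

Since ColBERT indexes every token embedding (not one vector per document), num_embeddings here would be the total token count of the corpus, which is why the partition count differs so much between datasets.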

Again, thank you for your work.

rainatam avatar Mar 04 '22 13:03 rainatam

Hi @NThakur20, thanks for sharing the code, it is really easy to test.

However, there seems to be a problem when evaluating ArguAna with ColBERT (something I had been having trouble with as well, because I could never reproduce the results). The provided code does not remove the query from the document list, which causes ColBERT to score NDCG@1 = 0. The fix is easy: just update the results-reading part to:

    #### Results ####
    # Skip rows where the retrieved document is the query itself
    # (ArguAna includes the queries in the corpus).
    for _, row in tsv_reader(rankings):
        qid, doc_id, rank = row[0], row[1], int(row[2])
        if qid != inv_map[str(doc_id)]:
            if qid not in results:
                results[qid] = {inv_map[str(doc_id)]: 1 / (rank + 1)}
            else:
                results[qid][inv_map[str(doc_id)]] = 1 / (rank + 1)

This improves the ArguAna NDCG@10 from 0.2985 to 0.4042.

I don't remember whether there are other datasets where this could be a problem.
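The same filter, as a self-contained toy example (the function and data below are made up for illustration; in the real script, inv_map maps ColBERT's internal document ids back to BEIR ids and rankings comes from a TSV file):

```python
def load_results(rankings, inv_map):
    """Convert (qid, internal_doc_id, rank) rows into a BEIR-style results
    dict, skipping rows where the retrieved document is the query itself
    (ArguAna and Quora include the queries in the corpus)."""
    results = {}
    for qid, doc_id, rank in rankings:
        beir_doc_id = inv_map[str(doc_id)]
        if qid == beir_doc_id:
            continue  # drop the query retrieving itself
        results.setdefault(qid, {})[beir_doc_id] = 1 / (rank + 1)
    return results

# Toy example: internal doc id 0 maps back to the query's own id,
# so the rank-0 hit is discarded and the remaining ranks keep their scores.
inv_map = {"0": "q1", "1": "d7", "2": "d9"}
rankings = [("q1", 0, 0), ("q1", 1, 1), ("q1", 2, 2)]
print(load_results(rankings, inv_map))
```

Without the filter, every ArguAna query's top hit is itself, which is why NDCG@1 collapses to 0 after the qrels exclude self-matches.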

cadurosar avatar Mar 04 '22 13:03 cadurosar

Thanks, @cadurosar for this! Yes, this problem would be seen for Arguana and Quora. Will update the necessary code!

thakur-nandan avatar Mar 04 '22 14:03 thakur-nandan

@thakur-nandan, do you have a CPU version of the yml by any chance? Or a Google Colab notebook?

yakkanti avatar Mar 27 '23 18:03 yakkanti

@thakur-nandan I cannot load the model weights, do you have any solution?

RuntimeError: NCCL error in: /opt/conda/conda-bld/pytorch_1595629403081/work/torch/lib/c10d/ProcessGroupNCCL.cpp:32, unhandled cuda error, NCCL version 2.4.8

zt991211 avatar Jun 10 '23 09:06 zt991211

@rainatam @Xiao9905 @cadurosar Have you encountered problems like this?

zt991211 avatar Jun 10 '23 09:06 zt991211

Not at the time, but it has been a while... Do you have more than one GPU on the machine? Maybe try single-GPU in that case? (CUDA_VISIBLE_DEVICES=0 ...)
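In case it helps, the equivalent from inside Python is just setting the environment variable before torch initializes CUDA. The variable name is standard CUDA; the rest is a minimal sketch:

```python
import os

# Hide all but one GPU from this process. This must happen before
# torch (or any CUDA library) initializes, otherwise the visible
# device list is already fixed.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

print(os.environ["CUDA_VISIBLE_DEVICES"])
```

With torch.distributed.launch, passing --nproc_per_node=1 should likewise give you a single-process run.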

cadurosar avatar Jun 10 '23 18:06 cadurosar

Not at the time, but it has been a while... Do you have more than one GPU on the machine? Maybe try single-GPU in that case? (CUDA_VISIBLE_DEVICES=0 ...)

Thank you! The script uses torch.distributed.launch. If I use a single GPU, will it still work? Should I drop the distributed launch?

zt991211 avatar Jun 11 '23 01:06 zt991211

    ubuntu:107847:107847 [0] NCCL INFO Bootstrap : Using [0]ens31f0:192.168.2.107<0>
    ubuntu:107847:107847 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
    ubuntu:107847:107847 [0] NCCL INFO NET/IB : No device found.
    ubuntu:107847:107847 [0] NCCL INFO NET/Socket : Using [0]ens31f0:192.168.2.107<0>
    NCCL version 2.4.8+cuda10.1
    ubuntu:107848:107848 [0] NCCL INFO Bootstrap : Using [0]ens31f0:192.168.2.107<0>
    ubuntu:107848:107848 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
    ubuntu:107848:107848 [0] NCCL INFO NET/IB : No device found.
    ubuntu:107848:107848 [0] NCCL INFO NET/Socket : Using [0]ens31f0:192.168.2.107<0>
    ubuntu:107847:107878 [0] NCCL INFO Setting affinity for GPU 0 to 0fffff,ff000000,0fffffff
    ubuntu:107848:107879 [0] NCCL INFO Setting affinity for GPU 0 to 0fffff,ff000000,0fffffff
    ubuntu:107847:107878 [0] NCCL INFO Channel 00 : 0 1
    ubuntu:107847:107878 [0] NCCL INFO Ring 00 : 0[0] -> 1[0] via P2P/IPC
    ubuntu:107848:107879 [0] NCCL INFO Ring 00 : 1[0] -> 0[0] via P2P/IPC
    ubuntu:107847:107878 [0] NCCL INFO Using 256 threads, Min Comp Cap 8, Trees disabled
    ubuntu:107848:107879 [0] NCCL INFO comm 0x7fab54002730 rank 1 nranks 2 cudaDev 0 nvmlDev 0 - Init COMPLETE

    ubuntu:107848:107848 [0] enqueue.cc:197 NCCL WARN Cuda failure 'invalid device function'
    ubuntu:107848:107848 [0] NCCL INFO misc/group.cc:148 -> 1
    ubuntu:107847:107878 [0] NCCL INFO comm 0x7f2b18002730 rank 0 nranks 2 cudaDev 0 nvmlDev 0 - Init COMPLETE
    ubuntu:107847:107847 [0] NCCL INFO Launch mode Parallel

    ubuntu:107847:107847 [0] enqueue.cc:197 NCCL WARN Cuda failure 'invalid device function'
    ubuntu:107847:107847 [0] NCCL INFO misc/group.cc:148 -> 1
    Traceback (most recent call last):
      File "/data/huangchen/anaconda3/envs/colbert-v0.2/lib/python3.7/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/data/huangchen/anaconda3/envs/colbert-v0.2/lib/python3.7/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/home/huangchen/beir-ColBERT/colbert/index.py", line 58, in <module>
        main()
      File "/home/huangchen/beir-ColBERT/colbert/index.py", line 25, in main
        args = parser.parse()
      File "/home/huangchen/beir-ColBERT/colbert/utils/parser.py", line 110, in parse
        Run.init(args.rank, args.root, args.experiment, args.run)
      File "/home/huangchen/beir-ColBERT/colbert/utils/runs.py", line 51, in init
        distributed.barrier(rank)
      File "/home/huangchen/beir-ColBERT/colbert/utils/distributed.py", line 25, in barrier
        torch.distributed.barrier()
      File "/data/huangchen/anaconda3/envs/colbert-v0.2/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 1710, in barrier
        work = _default_pg.barrier()
    RuntimeError: NCCL error in: /opt/conda/conda-bld/pytorch_1595629403081/work/torch/lib/c10d/ProcessGroupNCCL.cpp:32, unhandled cuda error, NCCL version 2.4.8

The second worker process fails with the same traceback, and the launcher then exits:

    Traceback (most recent call last):
      File "/data/huangchen/anaconda3/envs/colbert-v0.2/lib/python3.7/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/data/huangchen/anaconda3/envs/colbert-v0.2/lib/python3.7/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/data/huangchen/anaconda3/envs/colbert-v0.2/lib/python3.7/site-packages/torch/distributed/launch.py", line 261, in <module>
        main()
      File "/data/huangchen/anaconda3/envs/colbert-v0.2/lib/python3.7/site-packages/torch/distributed/launch.py", line 257, in main
        cmd=cmd)

zt991211 avatar Jun 11 '23 01:06 zt991211