COIL
NAACL2021 - COIL Contextualized Lexical Retriever
ColBERT removes punctuation from documents because its authors consider it useless. Did you also remove punctuation when computing overlapping tokens between query and document?
How is [corpus-d2q](https://github.com/luyug/COIL/tree/main/uniCOIL#resource) prepared? With what p_max_len was castorini/unicoil-d2q-msmarco-passage trained? Can I set p_max_len to 512 and encode with it?
I ran these commands in Google Colab with a GPU:
```
!wget http://boston.lti.cs.cmu.edu/luyug/coil/msmarco-psg/psg-train.tar.gz
!tar xfz psg-train.tar.gz
!git clone https://github.com/luyug/COIL
!pip install transformers datasets
! cd COIL && python run_marco.py --output_dir model...
```
Just wondering if you could add somewhere a description of your MS MARCO submission "C-COIL + RoBERTa" from 14/07/2021. Which modifications to COIL have you made and what was the...
Awesome idea and exciting experimental results. Still, I am confused about the implementation of COIL-full: when doing dense retrieval, can we speed it up with ANN search using FAISS,...
I notice that C-COIL is at the top of the "MS MARCO Passage Ranking Leaderboard", with results of "0.427 on eval" and "0.443 on dev". But the result in https://github.com/luyug/COIL/tree/main/examples/c-coil...
Hi, thank you for sharing this code. I tested the latency of COIL using `retriever-fast.py` with one thread and one shard. The batch size is set to one. The cpu...
I find that some of the encoding output directories are empty because an error occurs, while the others are normal and contain the cls and token files. Traceback (most recent call last): File "run_marco.py",...
Thank you for sharing the code. COIL achieves very impressive retrieval performance. How can I use a GPU for retrieval?