COIL issues

Did you remove punctuation before computing the document score?

3

ColBERT removes punctuation from documents because its authors consider those tokens useless. I wonder whether you also removed punctuation when computing the overlapping tokens between query and document?
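To make the question concrete, here is a minimal sketch of a COIL-style exact-match score: each query token is compared only against document tokens with the same surface form, the best dot product is kept, and the maxima are summed. The `drop_punct` switch shows where a punctuation filter would apply. This is an illustration under stated assumptions, not COIL's code: tokens are plain strings rather than vocabulary ids, vectors are small Python lists, the CLS (dense) part of the score is left out, and `exact_match_score`, `is_punct`, and `drop_punct` are hypothetical names. Whether COIL actually filters punctuation is exactly what the issue asks.

```python
import string

# Hypothetical filter: a token counts as punctuation if every character is ASCII punctuation.
PUNCT = set(string.punctuation)


def is_punct(token: str) -> bool:
    return len(token) > 0 and all(ch in PUNCT for ch in token)


def dot(u, v):
    # Plain inner product of two equal-length vectors.
    return sum(a * b for a, b in zip(u, v))


def exact_match_score(q_tokens, q_vecs, d_tokens, d_vecs, drop_punct=False):
    """Sketch of a lexical exact-match score: for each query token, take the max
    dot product over document tokens with the same surface form, then sum.
    Query tokens with no match in the document contribute nothing."""
    score = 0.0
    for qt, qv in zip(q_tokens, q_vecs):
        if drop_punct and is_punct(qt):
            continue  # skip punctuation tokens entirely when filtering is on
        matches = [dot(qv, dv) for dt, dv in zip(d_tokens, d_vecs) if dt == qt]
        if matches:
            score += max(matches)
    return score
```

With toy vectors, the `?` token can change the score only when filtering is off:

```python
q = ["what", "is", "coil", "?"]
qv = [[1, 0], [0, 1], [1, 1], [1, 0]]
d = ["coil", "is", "a", "model", "?"]
dv = [[2, 0], [0, 3], [1, 1], [0, 1], [4, 0]]
exact_match_score(q, qv, d, dv)                   # "is" + "coil" + "?" = 3 + 2 + 4
exact_match_score(q, qv, d, dv, drop_punct=True)  # "is" + "coil" = 3 + 2
```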

namespace-Pt

How do I load a model saved by the uniCOIL training script with Pyserini's UnicoilDocumentEncoder?

nirmal2k

How is document expansion helpful if p_max_len=192 in the uniCOIL training and encoding commands? Most MS MARCO passages are over 192 tokens.

1

How is [corpus-d2q](https://github.com/luyug/COIL/tree/main/uniCOIL#resource) prepared? What p_max_len was castorini/unicoil-d2q-msmarco-passage trained with? Can I set p_max_len to 512 and encode with it?

nirmal2k

pyarrow.lib.ArrowNotImplementedError during training phase

1

I ran these commands in Google Colab with a GPU:
```
!wget http://boston.lti.cs.cmu.edu/luyug/coil/msmarco-psg/psg-train.tar.gz
!tar xfz psg-train.tar.gz
!git clone https://github.com/luyug/COIL
!pip install transformers datasets
!cd COIL && python run_marco.py --output_dir model...
```

udaygoyat45

Describe the C-COIL approach

5

Just wondering whether you could add a description somewhere of your MS MARCO submission "C-COIL + RoBERTa" from 14/07/2021. Which modifications to COIL have you made and what was the...

joshdevins

Question about COIL-full

1

Awesome idea and exciting experimental results. Still, I am confused about the implementation of COIL-full: when doing dense retrieval, can we speed it up with ANN search using FAISS,...

kinglai