Luyu Gao
The current code does not apply any special treatment to punctuation. In the current evaluation query sets, the queries typically do not include punctuation, and therefore having punctuation will...
The current public retriever implementation uses PyTorch API calls, so technically it takes as little as adding a few `.cuda()` calls to make it run on GPU. Optimizing it...
As I said, optimizing it could take some effort. Some considerations include keeping memory aligned and contiguous. GPU topk efficiency is also tricky. It is also likely to be hardware...
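To make the points about device placement, contiguous memory, and top-k concrete, here is a minimal sketch. It is not the retriever's actual scoring code: it assumes plain dense inner-product scoring, and the function name `topk_search` and all tensor sizes are made up for illustration.

```python
import torch

# Use the GPU when one is visible; otherwise fall back to CPU so the sketch still runs.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


def topk_search(query_vecs, doc_vecs, k):
    """Score every query against every document and keep the k best per query.

    Hypothetical helper: dense inner products stand in for the real scoring logic.
    """
    # .to(device) is the device-agnostic form of .cuda();
    # .contiguous() makes sure each tensor is laid out densely in memory.
    q = query_vecs.to(device).contiguous()
    d = doc_vecs.to(device).contiguous()
    scores = q @ d.T  # shape: (num_queries, num_docs)
    # torch.topk returns values sorted in descending order by default.
    top_scores, top_ids = torch.topk(scores, k, dim=1)
    return top_scores.cpu(), top_ids.cpu()


# Toy data: 4 queries, 1000 documents, 32-dimensional vectors (arbitrary sizes).
torch.manual_seed(0)
queries = torch.randn(4, 32)
docs = torch.randn(1000, 32)
top_scores, top_ids = topk_search(queries, docs, k=10)
```

Getting this to run on GPU is the easy part; whether the top-k step is fast for a given number of documents and value of `k` has to be measured on the target hardware.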
I have planned to write a tutorial on that submission but have been distracted by other projects. I hope to get it done as soon as possible, at least by...
Hello, thanks for reminding me about this! C-COIL is actually Condenser + COIL, and the Condenser paper is no longer under review: https://arxiv.org/abs/2104.08253. We have not done experiments with coCondenser...
The error message seems to suggest that the datasets package fails to access the json processing script. There is not enough information on why this happens. I suspect that something's...
Please confirm that you are using the specified version of `datasets`.
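One quick way to check is to read the installed version from the package metadata. The sketch below uses only the standard library; the helper names `installed_version` and `check_pin` are hypothetical, and the expected version has to come from the repository's own requirements.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_pin(package, expected):
    """Compare the installed version against the expected one and describe the result."""
    found = installed_version(package)
    if found is None:
        return f"{package} is not installed"
    if found != expected:
        return f"{package} {found} is installed, expected {expected}"
    return f"{package} {found} matches the expected version"
```

Calling `check_pin("datasets", expected)` with the version string pinned in the repository's requirements reports whether the environment matches.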
You should be able to pass your checkpoint directly through the argument `--model_name_or_path`.
I used random non-overlapping sequences. Technically what is desired here is a good passage tokenizer; you may get better performance if you can do a better job separating out...
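For illustration, random non-overlapping sequences can be cut from a tokenized document roughly as follows. This is a sketch, not the code I actually ran: the function name `random_spans` and the length bounds are assumptions chosen for the example.

```python
import random


def random_spans(tokens, min_len, max_len, seed=None):
    """Cut a token list into consecutive, non-overlapping spans of random length.

    Hypothetical helper: the length bounds are illustrative, not the values used.
    The final span may be shorter than min_len if the document runs out.
    """
    rng = random.Random(seed)
    spans = []
    start = 0
    while start < len(tokens):
        length = rng.randint(min_len, max_len)  # inclusive on both ends
        spans.append(tokens[start:start + length])
        start += length
    return spans


# Toy document of 50 placeholder tokens.
doc = [f"tok{i}" for i in range(50)]
spans = random_spans(doc, min_len=4, max_len=12, seed=0)
```

Because each span starts where the previous one ended, every token lands in exactly one span. A better passage tokenizer would replace the random lengths with boundaries that follow the text's structure.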
The length should be selected to align roughly with the text lengths in your actual search task (rounded according to your accelerator's requirements). For passage retrieval, we used 128.
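As a rough sketch of that selection, you can take a length that covers most of your texts and round it up. The multiple of 8 and the 95% coverage below are assumptions for the example, not values from our setup, and `pick_max_length` is a hypothetical helper.

```python
def round_up(length, multiple):
    """Round a length up to the nearest multiple, using integer arithmetic only."""
    return ((length + multiple - 1) // multiple) * multiple


def pick_max_length(token_lengths, multiple=8, coverage_pct=95):
    """Pick a max length covering roughly coverage_pct percent of texts, rounded up.

    Both defaults are illustrative assumptions; check your accelerator's guidance.
    """
    ordered = sorted(token_lengths)
    idx = min(len(ordered) - 1, (coverage_pct * len(ordered)) // 100)
    return round_up(ordered[idx], multiple)
```

For example, `round_up(121, 8)` gives 128, so texts of around 121 tokens would be padded or truncated to 128.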