Luyu Gao comments

Results: 70 comments of Luyu Gao

Did you remove punctuation before computing the document score?

The current code does not treat punctuation specially. The queries in the current evaluation sets typically do not include punctuation, and therefore having punctuation will...

How to use GPU to retrieve?

The current public retriever implementation uses PyTorch API calls, so technically adding a few `.cuda()` calls is enough to make it run on GPU. Optimizing it...

How to use GPU to retrieve?

As I said, optimizing it could take some effort. Some considerations include keeping memory aligned and contiguous. GPU topk efficiency is also tricky. It is also likely to be hardware...
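
To make the points about `.cuda()` calls, contiguous memory, and GPU top-k concrete, here is a minimal sketch. It assumes a plain dense dot-product scorer, which is not the repository's retriever; the function name `retrieve_topk` and the tensor shapes are hypothetical. It falls back to CPU when no GPU is present.

```python
import torch


def retrieve_topk(query_vec, doc_matrix, k, device=None):
    """Score documents by dot product and return the top-k (scores, ids).

    Hypothetical helper for illustration; not part of the released code.
    """
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    # .to(device) is the device-agnostic equivalent of calling .cuda().
    q = query_vec.to(device)
    # Keep the document matrix contiguous in memory before scoring.
    docs = doc_matrix.to(device).contiguous()
    scores = docs @ q  # shape: (num_docs,)
    top_scores, top_ids = torch.topk(scores, k)
    return top_scores.cpu(), top_ids.cpu()


torch.manual_seed(0)
docs = torch.randn(1000, 64)   # made-up document vectors
query = torch.randn(64)        # made-up query vector
top_scores, top_ids = retrieve_topk(query, docs, k=5)
```

This only moves the computation to the GPU; it does not address the tuning effort mentioned above, such as top-k efficiency on specific hardware.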

Describe C-COIL approach

I have planned to write a tutorial on that submission but have been distracted by other projects. I hope to get it done as soon as possible, at least by...

Describe C-COIL approach

Hello, thanks for reminding me about this! C-COIL is actually Condenser + COIL, and the Condenser paper is no longer under review: https://arxiv.org/abs/2104.08253. We have not done experiments with coCondenser...

Dataset error when encoding document

The error message seems to suggest that the datasets package fails to access the json processing script. There is not enough information on why this happens. I suspect that something's...

pyarrow.lib.ArrowNotImplementedError during training phase

Please confirm that you are using the specified version of `datasets`.

How to resume_from_checkpoint

You should be able to pass your checkpoint directly through the argument `--model_name_or_path`.

Regarding the spans in the contrastive loss calculation

I used random non-overlapping sequences. Technically what is desired here is a good passage tokenizer; you may get better performance if you can do a better job separating out...
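
One way to draw random non-overlapping sequences from a tokenized document is sketched below. This is an illustration under stated assumptions, not the code used for the experiments: the function name `sample_nonoverlapping_spans`, the fixed span length, and the integer "tokens" are all hypothetical.

```python
import random


def sample_nonoverlapping_spans(tokens, num_spans, span_len, rng=None):
    """Return `num_spans` random spans of `span_len` tokens that never overlap."""
    rng = rng or random.Random()
    slack = len(tokens) - num_spans * span_len
    if slack < 0:
        raise ValueError("document too short for the requested spans")
    # Draw sorted random offsets into the leftover slack. Shifting the i-th
    # offset by i * span_len keeps consecutive spans at least span_len apart.
    offsets = sorted(rng.randint(0, slack) for _ in range(num_spans))
    starts = [off + i * span_len for i, off in enumerate(offsets)]
    return [tokens[s:s + span_len] for s in starts]


doc_tokens = list(range(1000))  # stand-in for real token ids
spans = sample_nonoverlapping_spans(
    doc_tokens, num_spans=2, span_len=128, rng=random.Random(0)
)
```

A passage-aware splitter could replace the uniform offsets here if span boundaries should follow the document's structure.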

Regarding the spans in the contrastive loss calculation

The length should be selected to align roughly with the text lengths in your actual search task (rounded according to your accelerator's requirements). For passage retrieval, we used 128.
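
As a rough sketch of that selection rule: take a typical text length from your task and round it up to a multiple your accelerator prefers. Using the median as the "typical" length and 8 as the multiple are assumptions made for illustration, and the sample lengths below are invented.

```python
def round_up_to_multiple(length, multiple=8):
    """Round `length` up to the nearest multiple of `multiple`."""
    if length <= 0 or multiple <= 0:
        raise ValueError("length and multiple must be positive")
    return ((length + multiple - 1) // multiple) * multiple


def pick_span_length(sample_lengths, multiple=8):
    """Pick a span length from the median of observed text lengths."""
    ordered = sorted(sample_lengths)
    median = ordered[len(ordered) // 2]
    return round_up_to_multiple(median, multiple)


# Invented token counts for a handful of passages in a search task.
passage_lengths = [97, 110, 121, 125, 140]
span_length = pick_span_length(passage_lengths)  # 128
```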