Manveer Tamber
While building a regression for QA with the DPR Wikipedia 100-word splits corpus, I found that Top-K accuracy might differ in the fourth decimal place depending on the format of...
For https://github.com/castorini/pyserini/issues/370: controlling for everything else, it seems that using searcher.batch_doc is slower than using searcher.doc. That is to say, I have found using more than one thread leads to this...
After WikiExtractor and https://github.com/facebookresearch/DrQA/tree/main/scripts/retriever pre-processing is done on the Wikipedia XML dump, the final pre-processing is done in these scripts to generate .tsv files for each corpus variant. Corpus Variants:...