Sean MacAvaney comments

Results: 229 comments of Sean MacAvaney

The masks are all equal to 1

This affects cases in which the batch is larger than 1. With the default settings, this is the case for training (gradient accumulation batch size of 2) and evaluation (batch...

Train cedrpacrr using 5 folds BERT checkpoint, train_pairs and valid_run

Hi @Pourbahman, I recommend using a package like [OpenNIR](https://github.com/Georgetown-IR-Lab/OpenNIR) or [Capreolus](https://github.com/capreolus-ir/capreolus). This repository was meant as a simplification/demonstration of the main idea, rather than a comprehensive system for...

CEDR for MARCO document ranking

I don't recall trying it, but in [PARADE](https://arxiv.org/pdf/2008.09093.pdf) we identified some weirdness about the document ranking task that may explain what you're seeing. The dataset has a strong bias towards...

MAP results missing

Thanks for the feedback and interest in this work. I am familiar with both of the papers you cited. For WebTrack, we measure using ERR@20 and nDCG@20. For Robust04, we...

How to get WebTrack 2012-2014 datasets.

Information on obtaining the two ClueWeb collections is found here:
- https://lemurproject.org/clueweb09.php/
- https://lemurproject.org/clueweb12/

They are purchased from CMU and sent on hard drives. Unfortunately, they cannot be distributed by...

Question about running extract_docs_from_index.py

We cannot release the dataset directly due to the data usage agreement. However, I could provide a script that builds the file from the [ir-datasets](https://github.com/allenai/ir_datasets/) package, if that would help?...

Question about running extract_docs_from_index.py

The error says that the index was created with a newer Lucene version than the current software supports. I think you should be able to add a codecs JAR to...

Difficulties to reproduce results on Robust 04

Hi Martin! Thanks for reporting. I'm looking into these issues (as well as related #21).

Difficulties to reproduce results on Robust 04

Hi @krasserm -- sorry for the delays. I'm trying to balance a variety of priorities right now, and I have not had much time to dig into this.

Run files of Vanilla BERT checkpoints do not match test folds in data/robust

Hi Martin, thanks for pointing out this inconsistency! I suspect that it can be explained by a mismatch between the original code used for running the experiments (which reflect the...