Sean MacAvaney

Results: 224 comments by Sean MacAvaney

`wapo` collection added for #51

Related: #90

2021 uses the C4 corpus. Details here:

> AIMS AND SCOPE
> --------------------------
> Misinformation represents a key problem when using search engines to guide any decision-making task: Are users...

Added 2021. The [2020 corpus](https://trec-health-misinfo.github.io/2020.html) (CC News) will be a bit of a pain to add. I wonder if there's overlap with `CC-News-En` (#63)?

Hi @isspek, You'll want to use [`c4/en-noclean-tr`](https://ir-datasets.com/master/c4.html#c4/en-noclean-tr) for the document corpus; it's the particular split of C4 used for this track. You could also use [`c4/en-noclean-tr/trec-misinfo-2021`](https://ir-datasets.com/master/c4.html#c4/en-noclean-tr/trec-misinfo-2021), which bundles together both...
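As a rough sketch of how one might check what a given dataset ID provides before iterating over it: the helper below uses `ir_datasets.load` and the `has_docs()` / `has_queries()` / `has_qrels()` methods. The function name `available_parts` and the injectable `load` parameter are illustrative assumptions, not part of the library.

```python
def available_parts(dataset_id, load=None):
    """Report which record types a dataset ID exposes.

    `load` is a hypothetical hook for substituting a loader in tests;
    by default it falls back to ir_datasets.load.
    """
    if load is None:
        import ir_datasets  # imported lazily so the helper can be tested without it
        load = ir_datasets.load
    dataset = load(dataset_id)
    return {
        "docs": dataset.has_docs(),
        "queries": dataset.has_queries(),
        "qrels": dataset.has_qrels(),
    }
```

For example, `available_parts("c4/en-noclean-tr")` and `available_parts("c4/en-noclean-tr/trec-misinfo-2021")` could be compared to see what the second ID adds on top of the document corpus.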

Note that it takes some time to download the source files. If you already have them, you can link the directory to `~/.ir_datasets/c4/en.noclean/`.
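One way to do the linking, sketched in Python under the assumption that a plain symlink is what is wanted. The helper name `link_existing_download` and the example source path are hypothetical; only the target path comes from the comment above.

```python
from pathlib import Path


def link_existing_download(existing_dir, target_dir):
    """Symlink an already-downloaded directory into the expected location."""
    existing = Path(existing_dir).expanduser().resolve()
    target = Path(target_dir).expanduser()
    if not existing.is_dir():
        raise FileNotFoundError(f"source directory not found: {existing}")
    # Refuse to overwrite anything that is already at the target path.
    if target.exists() or target.is_symlink():
        raise FileExistsError(f"target already exists: {target}")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.symlink_to(existing, target_is_directory=True)
    return target


# Example call (the source path is a placeholder):
# link_existing_download("/data/c4/en.noclean", "~/.ir_datasets/c4/en.noclean/")
```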

Yup- you can just run this:

```
!pip install git+https://github.com/allenai/ir_datasets.git
```

I can add this to the readme. But you'll probably have issues with the c4 datasets on colab --...

Should this be merged with car-v1? Such as car/v1/... car/v2/...?

CLEF eHealth 2016-17 is handled now (it used the CW12b13 collection). 2018 onwards has a new document collection that provides some challenges.

- It's hosted on Dropbox, which has caused...

Correction: Box is the annoying one. Dropbox is easy: you just need to set `dl=1`. So this is a direct download link for the collection: https://www.dropbox.com/s/ixnqt33u5xeelth/clef2018collection.tar.gz?dl=1
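The `dl=1` trick can be applied programmatically. This is a minimal sketch using only the standard library; the function name `dropbox_direct_url` is made up for illustration and is not part of ir_datasets.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def dropbox_direct_url(share_url):
    """Return the share URL with its `dl` query parameter forced to 1."""
    parts = urlsplit(share_url)
    # Keep every query parameter except any existing `dl`, then append dl=1.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "dl"]
    query.append(("dl", "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Calling it on a share link that ends in `?dl=0`, or that has no query string at all, yields the same link ending in `?dl=1`.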