Sean MacAvaney

224 comments by Sean MacAvaney

Partially addressing this in #103. Download/stream objects will have a `hard` parameter, which allows you to do a "soft" request for the path that doesn't trigger downloads, etc (defaults to...

Would also be nice to provide a guide for using it on Google Colab -- especially in settings where you want the data to persist. Should be as simple as...

Maybe we could also provide a drive that contains the public datasets that people could mount themselves, to avoid downloading?

Partially addressing this in #103. Will give a message like this one: ``` [INFO] If you have a local copy of https://ai2-s2-research-public.s3-us-west-2.amazonaws.com/ir-datasets/c4/en.noclean.checkpoints.tar.gz, you can symlink it to avoid downloading it...
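The symlink suggestion in that message can be sketched in Python as below. This is only an illustration: the function name `link_local_copy` and both paths are hypothetical, since the quoted message is cut off before it says where the link should be placed.

```python
import os
from pathlib import Path


def link_local_copy(local_copy: Path, expected_path: Path) -> Path:
    """Symlink an already-downloaded file to the place a tool expects it.

    Both arguments are assumptions for illustration; the real target
    location is whatever the [INFO] message prints.
    """
    # Make sure the directory that will hold the link exists.
    expected_path.parent.mkdir(parents=True, exist_ok=True)
    # lexists() is also True for a dangling link, so we never try to
    # create a link on top of an existing entry.
    if not os.path.lexists(expected_path):
        os.symlink(local_copy.resolve(), expected_path)
    return expected_path
```

Calling it a second time with the same arguments is a no-op, so it should be safe to run before every download attempt. On Windows, creating symlinks may require elevated privileges or developer mode.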

I'm not very experienced in Windows. But I think you can make the missing directory by: ``` mkdir C:\Users\USER\.ir_datasets\downloads ``` And for the download, you can use curl, I think: ```...
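For anyone who would rather not depend on shell syntax, a minimal Python sketch of the same directory-creation step follows. The helper name `ensure_downloads_dir` is hypothetical; the `.ir_datasets\downloads` layout is taken from the `mkdir` command above.

```python
from pathlib import Path


def ensure_downloads_dir(home: Path) -> Path:
    """Create <home>/.ir_datasets/downloads if it is missing and return it."""
    downloads = home / ".ir_datasets" / "downloads"
    # parents=True creates .ir_datasets as well; exist_ok=True makes
    # repeated calls harmless.
    downloads.mkdir(parents=True, exist_ok=True)
    return downloads
```

Passing `Path.home()` as `home` should give the equivalent of the `mkdir` command on Windows, and the same call works unchanged on Linux and macOS.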

Very cool Joel! I don't have the time to add this straight away, but I'll record some of my thoughts for future reference and/or discussion. **Dataset IDs** I was tempted...

> What do the checkpoints look like, and how large are they?

Documentation on the checkpoint files can be found here: https://github.com/allenai/ir_datasets/blob/master/docs/clueweb_warc_checkpoints.md For ClueWeb, they ended up being around 0.1% the...

Great- thanks Nandan!

Thanks! My 2 cents: If you're planning to eventually go with the new qrels, I'd suggest adding them as a separate benchmark (`webis-touche2020-v2`?), rather than doing a wholesale replacement. Although the...