ir_datasets

Provides a common interface to many IR ranking datasets.

Results 99 ir_datasets issues

This is something I was expecting to be quite straightforward (or at least better documented in the API) but it doesn't seem to be. Say I want to gather all...

Hi! Lately, I tried using the HTML extractor wrapper for clueweb documents. When directly wrapping the corpus docstore, everything works fine, but when composing wrappers (dataset -> html extractor...

**Dataset Information:** To appear at SIGIR 2022. Is there a more succinct name for this dataset? **Links to Resources:** - [Paper](https://arxiv.org/pdf/2205.11685.pdf) - [Repo](https://github.com/SIGIR-2022/A-Dataset-for-Sentence-Retrieval-for-Open-Ended-Dialogues) **Dataset ID(s) & supported entities:** TBD **Checklist**...

add-dataset

**Describe the bug** I've stumbled on this before, and it seems like the same issue happens here. `.z` and `.Z` files are not always equivalent, but `TrecDocs` treats them like...

bug

Regarding #189: took the opportunity to improve the tests here for the variety of formats, etc. that `TrecDocs` may encounter. @ArthurCamara -- mind running `python -m tests.integration.disks45` when using your...

**Dataset Information:** A Chinese question answering dataset. **Links to Resources:** - Repo: https://github.com/baidu/DuReader - Paper: https://arxiv.org/abs/2203.10232 **Dataset ID(s) & supported entities:** - TBD **Checklist** Mark each task once completed. All...

add-dataset

**Dataset Information:** "WANDS is a Wayfair product search relevance dataset." **Links to Resources:** - https://github.com/wayfair/WANDS - https://easychair.org/publications/preprint_download/j2D4 **Dataset ID(s) & supported entities:** - `wands` (docs, queries, qrels) **Checklist** Mark each...

add-dataset

Right now, the `ir-datasets.bib` file is a bit messy, with inconsistencies in the ids/fields/formatting/etc. across records. It's probably best to go with an established source, such as DBLP, the ACL...

documentation

**Dataset Information:** An Urdu test collection. **Links to Resources:** - https://arxiv.org/pdf/2011.00565.pdf **Dataset ID(s) & supported entities:** - `cure` **Checklist** Mark each task once completed. All should be checked prior to...

add-dataset

**Dataset Information:** "The main task for the proposed track is ad-hoc cross-language retrieval. Documents will be drawn from Common Crawl newswire, and will be written in Chinese, Russian, and Persian....

add-dataset