ir_datasets

Provides a common interface to many IR ranking datasets.

Results 99 ir_datasets issues

(i.e., shouldn't rely on default encoding anywhere) wip RE: #151
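To illustrate the point of the issue above, here is a minimal sketch of file I/O that never falls back to the platform's default encoding. It is not code from the ir_datasets repository; the helper names `write_text`, `read_text`, and `roundtrip` are hypothetical, and UTF-8 is assumed as the intended encoding.

```python
import os
import tempfile


def write_text(path, text):
    # Pass encoding explicitly so the result does not depend on the
    # platform's locale default (which differs across systems).
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


def read_text(path):
    # Read back with the same explicit encoding.
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


def roundtrip(text):
    # Write the text to a temporary file and read it back.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt")
        write_text(path, text)
        return read_text(path)
```

With the encoding stated at every `open` call, non-ASCII text such as Czech queries should round-trip identically regardless of the system locale.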

**Dataset Information:** A rather large dataset in Czech.

**Links to Resources:**
- Repo: https://github.com/Seznam/DaReCzech
- Paper: https://arxiv.org/pdf/2112.01810.pdf

**Dataset ID(s) & supported entities:**
- `dareczech` (docs)
- `dareczech/train` (docs, queries, qrels)...

add-dataset

Are we safe bumping the minimum Python version from 3.6 to 3.7? Python 3.6 reaches end of life in just a few weeks. Related: #139

**Is your feature request related to a problem? Please describe.** When query document pairs have multiple labels associated with them in their qrels, e.g., relevance and quality, only the relevance...

enhancement

As reported by @searchivarius **Describe the bug** Right now, a user can end up with a faulty b13 subset if they only have the b13 disk and follow the instructions...

bug

Per discussion with @diegoceccarelli, it would be nice if the license information for each dataset was included in the documentation.

documentation

**Describe the proposed change** There's a growing number of integrations. Most recently Datamaestro (see #99)! We should document them, give a little promotion for each one, and provide instructions and/or...

documentation

**Dataset Information:** [Beir](https://github.com/UKPLab/beir/blob/main/README.md) is a suite of benchmarks, intended to be used for testing zero-shot transfer. These would help extend the tool beyond primarily ad-hoc tasks. Their benchmarks perform their...

add-dataset