ir_datasets
Provides a common interface to many IR ranking datasets.
(i.e., shouldn't rely on default encoding anywhere) wip RE: #151
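A minimal sketch of what "not relying on the default encoding" can look like in Python, assuming UTF-8 is the intended encoding. The helper names `write_text` and `read_text` are hypothetical and are not part of ir_datasets.

```python
import os
import tempfile


def write_text(path, text):
    # Pass the encoding explicitly so the bytes written do not depend on
    # the platform's locale settings.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


def read_text(path):
    # Read back with the same explicit encoding.
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


# Czech sample text with non-ASCII characters.
sample = "Příliš žluťoučký kůň"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "doc.txt")
    write_text(path, sample)
    round_trip = read_text(path)
```

Without the `encoding` argument, `open` falls back to a locale-dependent default, so the same code can behave differently across machines.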
**Dataset Information:** A rather large dataset in Czech.

**Links to Resources:**
- Repo: https://github.com/Seznam/DaReCzech
- Paper: https://arxiv.org/pdf/2112.01810.pdf

**Dataset ID(s) & supported entities:**
- `dareczech` (docs)
- `dareczech/train` (docs, queries, qrels)...
Are we safe bumping the minimum Python version from 3.6 to 3.7? Python 3.6 reaches end of life in just a few weeks. Related: #139
fixes #140
**Is your feature request related to a problem? Please describe.** When query document pairs have multiple labels associated with them in their qrels, e.g., relevance and quality, only the relevance...
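To illustrate the situation described, here is a rough sketch of a qrel record that carries more than one label per query-document pair. The `MultiLabelQrel` type and its `quality` field are hypothetical and only meant as an illustration; this is not the ir_datasets API or the solution adopted in the issue.

```python
from typing import NamedTuple


class MultiLabelQrel(NamedTuple):
    # Hypothetical record type, for illustration only.
    query_id: str
    doc_id: str
    relevance: int
    quality: int  # assumed second label, following the example in the request


qrels = [
    MultiLabelQrel("q1", "d1", relevance=2, quality=1),
    MultiLabelQrel("q1", "d2", relevance=0, quality=3),
]

# A consumer that keeps only (query_id, doc_id, relevance) drops the
# second label entirely.
relevance_only = [(q.query_id, q.doc_id, q.relevance) for q in qrels]
```

Keeping every label on the record lets downstream code choose which one to evaluate against.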
As reported by @searchivarius **Describe the bug** Right now, a user can end up with a faulty b13 subset if they only have the b13 disk and follow the instructions...
Per discussion with @diegoceccarelli, it would be nice if the license information for each dataset were included in the documentation.
**Describe the proposed change** There's a growing number of integrations. Most recently Datamaestro (see #99)! We should document them, give a little promotion for each one, and provide instructions and/or...
**Dataset Information:** [Beir](https://github.com/UKPLab/beir/blob/main/README.md) is a suite of benchmarks, intended to be used for testing zero-shot transfer. These would help extend the tool beyond primarily ad-hoc tasks. Their benchmarks perform their...