ir_datasets
TREC 2024 Tip-of-the-Tongue
Dataset Information:
The training and dev data of the TREC 2023 Tip-of-the-Tongue track are now available: https://trec-tot.github.io/guidelines
Description from the website:
Tip of the tongue: The phenomenon of failing to recall something from memory, combined with partial recall and the feeling that recall is imminent.
Links to Resources:
- Test queries: https://zenodo.org/records/13370657/files/test-2024.zip?download=1
- Corpus: https://zenodo.org/records/11185090/files/corpus.jsonl.zip?download=1
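The corpus is distributed as a zipped JSON Lines file. Below is a minimal sketch for streaming records out of such an archive with only the Python standard library. It assumes the archive holds one or more `.jsonl` members with one JSON object per line. The function name `iter_jsonl_zip` is hypothetical and is not part of ir_datasets, and no document field names are assumed.

```python
import json
import zipfile
from typing import IO, Dict, Iterator, Union


def iter_jsonl_zip(source: Union[str, IO[bytes]]) -> Iterator[Dict]:
    """Yield one parsed JSON object per non-empty line of each .jsonl member.

    Hypothetical helper (not an ir_datasets API). `source` may be a path or a
    binary file-like object, both of which zipfile.ZipFile accepts.
    """
    with zipfile.ZipFile(source) as archive:
        for name in archive.namelist():
            # Assumption: records live in members ending in ".jsonl".
            # Skip macOS resource-fork entries, which also end in ".jsonl"
            # but are not valid JSON.
            if not name.endswith(".jsonl") or name.startswith("__MACOSX/"):
                continue
            with archive.open(name) as fh:
                for raw in fh:  # iterating a ZipExtFile yields byte lines
                    line = raw.decode("utf-8").strip()
                    if line:  # tolerate blank lines
                        yield json.loads(line)
```

For example, after downloading `corpus.jsonl.zip` you could inspect the first record's keys with `next(iter_jsonl_zip("corpus.jsonl.zip")).keys()`. Streaming line by line avoids extracting or loading the whole corpus into memory.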
Dataset ID(s) & supported entities:
- tip-of-the-tongue/2024: corpus
- tip-of-the-tongue/2024/test: test queries
Checklist
Mark each task once completed. All should be checked prior to merging a new dataset.
- [ ] Dataset definition (in ir_datasets/datasets/[topid].py)
- [ ] Tests (in tests/integration/[topid].py)
- [ ] Metadata generated (using the ir_datasets generate_metadata command, should appear in ir_datasets/etc/metadata.json)
- [ ] Documentation (in ir_datasets/etc/[topid].yaml)
  - [ ] Documentation generated in https://github.com/seanmacavaney/ir-datasets.com/
- [ ] Downloadable content (in ir_datasets/etc/downloads.json)
  - [ ] Download verification action (in .github/workflows/verify_downloads.yml). Only one needed per topid.
  - [ ] Any small public files from NIST (or other potentially troublesome files) mirrored in https://github.com/seanmacavaney/irds-mirror/. Mirrored status properly reflected in downloads.json.
Additional comments/concerns/ideas/etc.
I think this should be quick to do: it should be easy to integrate into the code from the previous year: https://github.com/allenai/ir_datasets/blob/master/ir_datasets/datasets/trec_tot.py
I will try to make a pull request :)
I have created a pull request with some tests here: https://github.com/allenai/ir_datasets/pull/272
As soon as this is merged, we could close the issue :)
Fixed with #272, sorry for the delay!