ir_datasets

Datamaestro integration

Open bpiwowar opened this issue 4 years ago • 2 comments

Is your feature request related to a problem? Please describe.

Integration within datamaestro

Describe the solution you'd like

I am working on integrating ir-datasets into datamaestro so that querying available datasets is more standardized (JSON generation, etc.), which in turn provides a way to automate indexing and retrieval (see e.g. retrieval with experimaestro-ir).

It would also make it possible to handle dataset management within ir-datasets (cleanup, documentation, and maybe more once datamaestro matures).

Describe alternatives you've considered

None

Additional context

At the moment, I am coding within experimaestro-ir, but I would be glad to move the code to irds and modify datamaestro so that it is more generic (abstracting away dataset access). If the code moves to ir_datasets, it will be isolated so that it is only triggered when datamaestro is installed and used.
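To illustrate the isolation I have in mind, here is a minimal sketch of an optional-dependency guard. The function names is_installed and maybe_enable_bridge are hypothetical and exist in neither ir_datasets nor datamaestro; only the standard-library importlib calls are real. The actual bridge code would go where the lazy import happens.

```python
import importlib
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def maybe_enable_bridge(module_name: str = "datamaestro") -> bool:
    """Hypothetical hook: activate the bridge only if the optional dependency exists.

    Returns True when the module was found and imported, False otherwise.
    """
    if not is_installed(module_name):
        # Optional dependency missing: do nothing, so ir_datasets is unaffected.
        return False
    # Import lazily so the dependency is only loaded when the bridge is used.
    importlib.import_module(module_name)
    return True
```

With this pattern, users who never install datamaestro would not pay any import cost or see any change in behavior.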

bpiwowar avatar Jul 16 '21 08:07 bpiwowar

Awesome! Let me know if there are changes in ir_datasets that could help facilitate this.

You can access the documentation for a given dataset via dataset.documentation(), which returns a dict. Every dataset has a "desc" entry as HTML. There is other structured information too (e.g., "bibtex_ids", which points to records in ir_datasets.bib, and "official_measures", which points to measure names from ir_measures), but these fields are not always present.
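As a rough sketch of how a bridge could consume that dict, the helper below normalizes it into a JSON-friendly record. The name summarize_documentation is made up for illustration, and it assumes only the three keys mentioned above; optional fields are read with .get() because they may be absent.

```python
import json
from typing import Any, Dict


def summarize_documentation(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Normalize a documentation dict, filling in defaults for optional fields."""
    return {
        "desc": doc.get("desc", ""),
        # These fields are not always present, so default to empty lists.
        "bibtex_ids": list(doc.get("bibtex_ids", [])),
        "official_measures": list(doc.get("official_measures", [])),
    }


def documentation_to_json(doc: Dict[str, Any]) -> str:
    """Serialize the normalized record with a stable key order."""
    return json.dumps(summarize_documentation(doc), sort_keys=True)
```

With ir_datasets installed, this could be called as summarize_documentation(ir_datasets.load("vaswani").documentation()); the dataset ID here is just an example.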

seanmacavaney avatar Jul 16 '21 09:07 seanmacavaney

I will submit patch requests when needed.

I am already integrating ir_measures into experimaestro-ir; I still have to think about how to make a full bridge.

bpiwowar avatar Jul 16 '21 09:07 bpiwowar