dkpro-core
Enhance support for loading/caching corpora/datasets
- [ ] drop `id` and base all actions on `groupId/datasetId/version/language/mediaType`
- [ ] migrate UD dataset to `DatasetFactory`. The problem here is that the dataset is very large and it is tedious to manually create all the description files. In the old approach, we could get the dataset information after downloading. But in order to integrate the info into the documentation, we now need it statically.
- [ ] automatically augment `Known corpora` in documentation with integrated datasets during documentation generation
- [ ] show list of readers for each corpus / link media type to readers supporting that media type
- [ ] add API to query for datasets based on their properties, e.g. get all German CoNLL 2006 datasets
- [ ] add information about annotation types that can be obtained from the datasets (e.g. Token, Sentence, POS, etc.)
- [ ] add tagset information
- [ ] consider adding a `roles` section inside the artifacts
- [ ] ...?
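
To make the first and fifth items more concrete, here is a minimal sketch of what a composite key and a property-based query might look like. It is not the existing `DatasetFactory` API: the class `DatasetQuerySketch`, the type `DatasetKey`, the methods `query` and `sampleRegistry`, and all sample values (group ids, dataset ids, media type strings) are hypothetical and only serve to illustrate the idea.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not part of dkpro-core.
public class DatasetQuerySketch {

    /** Composite key that would replace the single opaque id. */
    public static final class DatasetKey {
        public final String groupId;
        public final String datasetId;
        public final String version;
        public final String language;
        public final String mediaType;

        public DatasetKey(String groupId, String datasetId, String version,
                String language, String mediaType) {
            this.groupId = groupId;
            this.datasetId = datasetId;
            this.version = version;
            this.language = language;
            this.mediaType = mediaType;
        }

        @Override
        public String toString() {
            return String.join("/", groupId, datasetId, version, language, mediaType);
        }
    }

    /**
     * Returns all datasets matching the given properties.
     * A null criterion acts as a wildcard.
     */
    public static List<DatasetKey> query(List<DatasetKey> all, String language,
            String mediaType) {
        List<DatasetKey> result = new ArrayList<>();
        for (DatasetKey key : all) {
            boolean languageOk = language == null || language.equals(key.language);
            boolean mediaTypeOk = mediaType == null || mediaType.equals(key.mediaType);
            if (languageOk && mediaTypeOk) {
                result.add(key);
            }
        }
        return result;
    }

    /** Placeholder data; a real registry would be built from the description files. */
    public static List<DatasetKey> sampleRegistry() {
        return Arrays.asList(
                new DatasetKey("org.example.corpora", "treebank-a", "1.0", "de", "text/x-conll-2006"),
                new DatasetKey("org.example.corpora", "treebank-b", "1.0", "en", "text/x-conll-2006"),
                new DatasetKey("org.example.corpora", "treebank-c", "2.0", "de", "text/x-conll-u"));
    }

    public static void main(String[] args) {
        // "Get all German CoNLL 2006 datasets"
        for (DatasetKey key : query(sampleRegistry(), "de", "text/x-conll-2006")) {
            System.out.println(key);
        }
    }
}
```

Treating `null` as a wildcard keeps the sketch short; a real API would probably want a builder or predicate-based query so that further properties (annotation types, tagsets, roles) can be added without changing the method signature.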
See also
- Issue #911