Recommendations for label-less annotations and feature-less layers
Currently, the recommenders only learn from annotations that have features and these features must have values.
Sometimes, I just want to quickly make some annotations, e.g. mark named entities, and assign the feature values later. However, while marking entities, I would already like to see recommendations for them. For example, if I mark "Macron" as an entity (without assigning an identifier or value), I would still like to see the next mention of "Macron" highlighted as a named entity.
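To make the "Macron" example concrete, here is a minimal sketch of label-less string matching. The class `LabelLessStringMatcher` and its methods are hypothetical names for illustration only and are not the actual StringMatchingRecommender code; the sketch assumes that suggestions are plain character offsets.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; not the actual StringMatchingRecommender implementation.
class LabelLessStringMatcher {
    // Surface forms the user has marked, whether or not a feature value was set.
    private final Set<String> knownMentions = new LinkedHashSet<>();

    // Learn from an annotation. The label may be null and is deliberately
    // ignored: the mere presence of the annotation is the training signal.
    void learn(String coveredText, String labelOrNull) {
        if (coveredText == null || coveredText.isEmpty()) {
            return; // nothing to match on
        }
        knownMentions.add(coveredText);
    }

    // Return {begin, end} offsets of every occurrence of a known mention.
    List<int[]> suggest(String text) {
        List<int[]> spans = new ArrayList<>();
        for (String mention : knownMentions) {
            int from = 0;
            int begin;
            while ((begin = text.indexOf(mention, from)) >= 0) {
                spans.add(new int[] { begin, begin + mention.length() });
                from = begin + mention.length();
            }
        }
        return spans;
    }
}
```

Calling `learn("Macron", null)` and then `suggest(...)` on a text yields one span per occurrence of "Macron". A real recommender would additionally skip spans that are already annotated.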
- [x] General architectural changes
- [x] StringMatchingRecommender
- [ ] OpenNLP Doccat Recommender (sentence-level)
- [ ] OpenNLP POS Recommender
- [ ] OpenNLP NER Recommender
- [ ] External recommender
- [ ] Add info about label-less recommendations to user documentation
- [ ] Ability to create recommender for layers without any features at all (consider moving this to a separate issue since it may incur systemic changes)
I have two ideas on this:
- add an additional default label to be learned when no label is given for an annotation -> Cons: will affect every classifier which might not be desirable and is not transparent to users and devs of classifiers
- add an option for a "Quickstart Recommender" which only learns two classes, "annotated" or "not annotated", and automatically disables itself once e.g. enough labels have been annotated -> Cons: the user will have to learn about this
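The second idea could look roughly like the sketch below. Everything here is an assumption made for illustration: the class name `QuickstartRecommenderSketch`, the threshold parameter, and the choice to count labelled annotations as the signal for disabling.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a binary "annotated vs. not annotated" recommender.
class QuickstartRecommenderSketch {
    private final int labelThreshold; // assumed cut-off, chosen by the user
    private int labelledCount = 0;
    private final Set<String> annotatedTexts = new HashSet<>();

    QuickstartRecommenderSketch(int labelThreshold) {
        this.labelThreshold = labelThreshold;
    }

    // Record an annotation; only annotations with a label count towards
    // the threshold that switches this recommender off.
    void observe(String coveredText, String labelOrNull) {
        annotatedTexts.add(coveredText);
        if (labelOrNull != null) {
            labelledCount++;
        }
    }

    // The recommender disables itself once enough labels exist.
    boolean isEnabled() {
        return labelledCount < labelThreshold;
    }

    // Binary decision: true means "annotated", false means "not annotated".
    boolean predictAnnotated(String candidateText) {
        return isEnabled() && annotatedTexts.contains(candidateText);
    }
}
```

Once the threshold is reached, `predictAnnotated` always returns `false`, so the regular label-aware recommenders would take over.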
What do you think, @reckart, @jcklie?
I believe an additional label for the case annotated-with-no-label can be introduced without affecting all classifiers - i.e. it can be local to a specific classifier. E.g. the OpenNlpPosRecommender uses an internal label <PAD> for tokens which do not have an annotation and this label does not "leak" out of the recommender implementation.
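A rough sketch of how such a recommender-local label could be kept from leaking: the placeholder is substituted on the way into the model and stripped again on the way out. The class `PlaceholderLabelSketch`, the `<NO_LABEL>` constant, and the lookup-table "model" are all made up for this example and do not reflect how the OpenNlpPosRecommender is implemented.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch; the lookup table stands in for a real classifier.
class PlaceholderLabelSketch {
    // Internal-only marker for "annotated, but no label assigned".
    private static final String NO_LABEL = "<NO_LABEL>";

    static final class Suggestion {
        final String coveredText;
        final String label; // null means "suggest an annotation without a label"

        Suggestion(String coveredText, String label) {
            this.coveredText = coveredText;
            this.label = label;
        }
    }

    private final Map<String, String> model = new HashMap<>();

    // Inbound: a missing label is replaced by the internal placeholder.
    void train(String coveredText, String labelOrNull) {
        model.put(coveredText, labelOrNull == null ? NO_LABEL : labelOrNull);
    }

    // Outbound: the placeholder is mapped back to null, so callers never see it.
    Optional<Suggestion> predict(String coveredText) {
        String internal = model.get(coveredText);
        if (internal == null) {
            return Optional.empty(); // never seen: no suggestion at all
        }
        String external = NO_LABEL.equals(internal) ? null : internal;
        return Optional.of(new Suggestion(coveredText, external));
    }
}
```

Because the constant is private and translated at both boundaries, other classifiers and the UI are unaffected by its existence.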
I believe adding a recommender which just learns annotated vs. not annotated would also work.
It may also be possible to combine the ideas: we change all classifiers so that each can be configured to learn either from the value of a specific feature or just from the presence of an annotation on a given layer, basically by making the feature field in the recommender optional.
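As a sketch of what an optional feature field might mean for training, consider the following. `OptionalFeatureConfig`, the `PRESENT` marker, and the map-based annotation representation are assumptions for illustration and are not part of the existing recommender API.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of a recommender configuration with an optional feature.
class OptionalFeatureConfig {
    // Training target used when only the presence of an annotation is learned.
    static final String PRESENT = "<PRESENT>";

    final String layer;
    final String featureOrNull; // null: learn presence on the layer only

    OptionalFeatureConfig(String layer, String featureOrNull) {
        this.layer = layer;
        this.featureOrNull = featureOrNull;
    }

    // Returns the training target for an annotation, or empty if the
    // annotation should be skipped during training.
    Optional<String> trainingTarget(String annotationLayer,
            Map<String, String> featureValues) {
        if (!layer.equals(annotationLayer)) {
            return Optional.empty(); // annotation is on a different layer
        }
        if (featureOrNull == null) {
            return Optional.of(PRESENT); // feature-less mode: presence is enough
        }
        // Feature mode: annotations without a value are skipped, as they are today.
        return Optional.ofNullable(featureValues.get(featureOrNull));
    }
}
```

With `featureOrNull` set to `null`, every annotation on the layer becomes a training example; with a feature configured, the behaviour matches the current value-based learning.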
@jcklie What do you think, in particular with respect to EL, which consists of three parts: 1) identifying mentions (annotation vs. no annotation), 2) classifying mentions (e.g. PER, LOC), and 3) linking mentions (to some KB)?
For me, mention candidates can be seen either as NER or as annotations without a linked entity. We do not need the actual NER tag for EL right now, so one could separate the NER and EL layers and use this kind of recommender for mention detection. I agree that we can change the code of some recommenders to include an internal UNKN label; annotations which have no label would then use this label instead of being filtered out.
With #1019, I have this working for the String matching recommender. I added a checklist to the issue description to track progress for the other recommenders.