Stephen Mayhew
I was just looking at the NER scores for English OntoNotes. The top score is 89.71 for Flair, but I can't find where this number is reported. The original paper doesn't evaluate...
It would be great to see a token embedder for [Flair embeddings](https://github.com/zalandoresearch/flair). They have released an extensive toolkit, including pretrained models, so in theory it could be straightforward to incorporate...
https://github.com/CogComp/cogcomp-nlp/blob/ce0f3a03264d293cd751d7a416bbadd858066496/core-utilities/src/main/java/edu/illinois/cs/cogcomp/core/utilities/StringUtils.java#L60 There is a `.trim()` call here, but it only trims whitespace. If the separator is not whitespace (e.g. underscore, dash, period, bar), then `.trim()` leaves the extra separator in place. Also, Apache Commons has...
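To make the trimming problem concrete, here is a minimal sketch in Python rather than Java. It assumes the linked method appends the separator after every item and then trims the result; that is a guess about code not shown here, and `join_then_trim` and `join_clean` are hypothetical names. Python's `str.strip()` stands in for Java's `.trim()`.

```python
def join_then_trim(tokens, sep):
    # Assumed pattern: append the separator after every token,
    # then rely on a whitespace trim to drop the trailing one.
    out = ""
    for tok in tokens:
        out += tok + sep
    return out.strip()  # removes whitespace only, like .trim()


def join_clean(tokens, sep):
    # str.join never emits a trailing separator, so no trimming is needed.
    return sep.join(tokens)


print(join_then_trim(["a", "b", "c"], " "))  # a b c
print(join_then_trim(["a", "b", "c"], "_"))  # a_b_c_
print(join_clean(["a", "b", "c"], "_"))      # a_b_c
```

With a space separator the trim hides the problem; with an underscore the trailing separator survives.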
Moses scripts included a useful lowercasing script. Are there any plans to add this?
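For reference, the kind of script I have in mind is roughly the following. This is only a sketch of a line-by-line lowercasing filter, not a port of the Moses script, and `lowercase_lines` is a hypothetical name.

```python
import io


def lowercase_lines(src, dst):
    # Read text line by line and write the lowercased line,
    # so large files never need to fit in memory.
    for line in src:
        dst.write(line.lower())


# Small in-memory demonstration.
demo_in = io.StringIO("The NER Scores for English OntoNotes\n")
demo_out = io.StringIO()
lowercase_lines(demo_in, demo_out)
print(demo_out.getvalue(), end="")  # the ner scores for english ontonotes
```

Passing `sys.stdin` and `sys.stdout` instead of the in-memory buffers would turn this into a command-line filter.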
The page should not slow down when there are many documents. Rather than displaying every document on one page, use pagination.
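A rough sketch of the pagination idea, assuming the documents are available as a list. The names `paginate` and `page_count` and the default page size of 20 are made up for illustration.

```python
def paginate(items, page, page_size=20):
    # Return only the slice of items belonging to a 1-based page number.
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    start = (page - 1) * page_size
    return items[start:start + page_size]


def page_count(n_items, page_size=20):
    # Ceiling division; an empty collection still gets one (empty) page.
    return max(1, -(-n_items // page_size))


docs = [f"doc{i}" for i in range(45)]
print(page_count(len(docs)))  # 3
print(paginate(docs, 3))      # ['doc40', 'doc41', 'doc42', 'doc43', 'doc44']
```

Only the requested slice is rendered, so the cost of a page view depends on the page size and not on the total number of documents.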
Let users decide if they want propagation or not.
Automatically detect format.
Example: propagate entities and choose documents with few entities.
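One possible reading of this example, sketched under heavy assumptions: entities already labelled somewhere are propagated to matching tokens in every document, and the documents with the fewest entities are then picked out. The data layout and the names `propagate` and `fewest_entities` are hypothetical.

```python
def propagate(docs, lexicon):
    # docs: {doc name: list of tokens}; lexicon: {token: entity label}.
    # Returns {doc name: list of (token index, label)} for every exact match.
    return {
        name: [(i, lexicon[tok]) for i, tok in enumerate(tokens) if tok in lexicon]
        for name, tokens in docs.items()
    }


def fewest_entities(annotations, k):
    # Sort document names by entity count (stable) and keep the first k.
    return sorted(annotations, key=lambda name: len(annotations[name]))[:k]


docs = {
    "a": ["Paris", "is", "nice"],
    "b": ["Paris", "and", "Berlin"],
    "c": ["nothing", "here"],
}
lexicon = {"Paris": "LOC", "Berlin": "LOC"}

annotations = propagate(docs, lexicon)
print(annotations["b"])                 # [(0, 'LOC'), (2, 'LOC')]
print(fewest_entities(annotations, 2))  # ['c', 'a']
```

Making propagation optional would then amount to skipping the `propagate` call and ranking on the manual annotations alone.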