BLINK
No entity detection for lowercased entities?
How do I use BLINK for lowercased entities?
@Zoher15 Hi, BLINK uses FLAIR for entity detection, and FLAIR does not work very well on lowercased entities. Do you have cased data?
@ledw My data is not always well cased (as is typical of data from Twitter, etc.). There was a paper about this from Dan Roth's group at UPenn: link. This might become a limitation for the amazing tool that BLINK looks to be.
@ledw Are there any plans to fix this? I know ELQ handles lowercase, but it is limited to only 512 tokens. I would also train it myself by converting a portion of the training entities to lowercase using the methodology in this paper, but the training is really resource intensive.
Since ELQ handles lowercase very accurately, why not just split your documents into chunks of 512 tokens or fewer? This isn't a problem where you need the entire document to make a decision. The entities are resolved using a much smaller local context window anyway, so I can't imagine you'd lose much accuracy. There might be a small accuracy hit for entities whose required resolution context is in a different chunk, but I would think that'd be a rare edge case.
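Something like this rough sketch is what I have in mind for the chunking. It is not part of BLINK or ELQ: `chunk_tokens`, `max_len`, and `overlap` are names I made up for illustration, and it assumes you already have the document as a list of tokens. ELQ counts tokens with its own tokenizer, so you would want to tokenize with that (or leave some headroom below 512) rather than split on whitespace. The optional overlap is my own addition to soften the edge case where the needed context falls in the neighbouring chunk.

```python
from typing import List


def chunk_tokens(tokens: List[str], max_len: int = 512, overlap: int = 0) -> List[List[str]]:
    """Split a token list into consecutive chunks of at most max_len tokens.

    Hypothetical helper, not a BLINK/ELQ API. `overlap` is the number of
    tokens shared between neighbouring chunks (0 means no sharing).
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")

    step = max_len - overlap  # how far the window advances each time
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_len])
        # Stop once the current window already reaches the end of the document.
        if start + max_len >= len(tokens):
            break
    return chunks


# Placeholder document of 1200 tokens, only to show the chunk sizes.
doc_tokens = [f"tok{i}" for i in range(1200)]
plain = chunk_tokens(doc_tokens, max_len=512)
overlapped = chunk_tokens(doc_tokens, max_len=512, overlap=64)
print([len(c) for c in plain])
print([len(c) for c in overlapped])
```

You would then run each chunk through ELQ separately and merge the predictions. With a non-zero overlap the same mention can show up in two chunks, so you'd need to deduplicate by document offset.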