react-nlp-annotate
Different numbers not splitting on space (REGEX bug)
Issue
I'm using the UDT application to annotate documents for NER. I found that if a number sits next to another number, the two get annotated together as a single token, which completely blocks me from annotating two different types of numbers. I traced this to react-nlp-annotate. It may be intended behavior, but maybe there is some kind of fix.
I tried to fix this behavior as shown below.
Example
Before
After
I opened this Pull Request (the first one in my life) to show how I fixed this, but I'm not that familiar with this code, so I gladly accept any suggestions!
You'll need to use the custom regex feature.
See the seperatorRegex prop?
Oh, you're using it through the UDT. In that case, you'll want to look at the UDT Format's wordSplitRegex: https://github.com/UniversalDataTool/udt-format/blob/master/interfaces/text_entity_recognition.md
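To make the effect of a split regex concrete, here is a minimal sketch of how the choice of pattern decides whether two adjacent numbers end up in one token. It does not reproduce the internals of react-nlp-annotate or the UDT: `splitWithSeparators` is a hypothetical helper, and both patterns are assumptions chosen only for illustration, not the library's actual default.

```typescript
// Hypothetical helper, NOT part of react-nlp-annotate or the UDT.
// Splits text on a regex pattern and keeps each separator as its own token,
// so joining the tokens reproduces the original string.
function splitWithSeparators(text: string, pattern: string): string[] {
  const re = new RegExp(pattern, "g");
  const tokens: string[] = [];
  let last = 0;
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    if (m[0].length === 0) {
      // Skip empty matches and advance manually to avoid an infinite loop.
      re.lastIndex++;
      continue;
    }
    if (m.index > last) tokens.push(text.slice(last, m.index));
    tokens.push(m[0]);
    last = m.index + m[0].length;
  }
  if (last < text.length) tokens.push(text.slice(last));
  return tokens;
}

const sample = "100 200 euros";

// Assumed pattern that does NOT split when whitespace is followed by a digit,
// so the two numbers stay glued together in one token.
const merged = splitWithSeparators(sample, "\\s+(?!\\d)");
// → ["100 200", " ", "euros"]

// Plain whitespace pattern: every run of whitespace is a boundary,
// so each number can be labeled on its own.
const separated = splitWithSeparators(sample, "\\s+");
// → ["100", " ", "200", " ", "euros"]

console.log(merged, separated);
```

If the whitespace-only pattern gives the behavior you want, a string like `"\\s+"` is the kind of value to try for wordSplitRegex, assuming the field accepts a regex source string as the linked format document describes.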