react-nlp-annotate

Different numbers not splitting on space (REGEX bug)

Open pedronogs opened this issue 4 years ago • 3 comments

Issue

I'm using the UDT application to annotate documents for NER. I found that when two numbers are next to each other, separated only by a space, they get annotated together as a single token, which completely blocks me from annotating two different types of numbers. I traced this to react-nlp-annotate. It could be some sort of intended behavior, but maybe there is a fix.

I tried to fix this behavior as shown below.

Example

Before

[image: split_to_fix]

After

[image: split_fixed]

I opened this Pull Request (the first one in my life) to show how I fixed this. I'm not that familiar with this code, so I'd gladly accept any suggestions!

pedronogs, Jan 25 '21 17:01

You'll need to use the custom regex feature.

seveibar, Feb 04 '21 02:02

[image: Screenshot_20210203-212838.png]

See the seperatorRegex prop?

seveibar, Feb 04 '21 02:02

Oh, you're using it through the UDT. In that case you'll want to look at the UDT Format's wordSplitRegex: https://github.com/UniversalDataTool/udt-format/blob/master/interfaces/text_entity_recognition.md

seveibar, Feb 04 '21 02:02
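
For anyone landing here later, here is a minimal sketch of how a custom split regex can change whether two adjacent numbers end up in one chunk. It is not the library's actual code: `splitByTokenRegex` is a hypothetical helper, and it assumes the regex matches the tokens themselves, with the text between matches kept as its own chunk. Whether seperatorRegex and wordSplitRegex match tokens or separators is something to confirm in the docs linked above.

```typescript
// Hypothetical helper for illustration only, NOT react-nlp-annotate's code.
// Assumption: the regex matches tokens, and whatever lies between two
// matches is kept together as a single chunk.
function splitByTokenRegex(text: string, tokenRegex: RegExp): string[] {
  // Force the global flag so exec() walks through the whole string.
  const flags =
    tokenRegex.flags.indexOf("g") >= 0 ? tokenRegex.flags : tokenRegex.flags + "g";
  const re = new RegExp(tokenRegex.source, flags);

  const chunks: string[] = [];
  let cursor = 0;
  let match: RegExpExecArray | null;

  while ((match = re.exec(text)) !== null) {
    if (match[0].length === 0) {
      // Skip empty matches, otherwise exec() would never advance.
      re.lastIndex++;
      continue;
    }
    if (match.index > cursor) {
      // Unmatched text between two tokens stays together as one chunk.
      chunks.push(text.slice(cursor, match.index));
    }
    chunks.push(match[0]);
    cursor = match.index + match[0].length;
  }
  if (cursor < text.length) {
    chunks.push(text.slice(cursor));
  }
  return chunks;
}

const sample = "paid 120 45 euros";

// A letters-only pattern never matches digits or spaces, so both numbers
// and the spaces around them land in one chunk.
const lettersOnly = splitByTokenRegex(sample, /[a-zA-Z]+/g);
// ["paid", " 120 45 ", "euros"]

// Adding a digit alternative makes each number its own chunk.
const lettersOrDigits = splitByTokenRegex(sample, /[a-zA-Z]+|\d+/g);
// ["paid", " ", "120", " ", "45", " ", "euros"]
```

If the splitting in your setup behaves like the first call, a pattern along the lines of the second one is the kind of thing to try in the custom regex setting.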