wtf_wikipedia
Issue with parsing sentence with dots.
Hello @spencermountain, I found a small problem with parsing sentences that contain dots. Could you please have a look?
thanks @Patrik-scx good find.
looks like the sentence parser is tripping on the period-dash combo in
'''Dr.-Ing. h.c. F. Porsche AG''', usually shortened
will try to fix this in the next release
cheers
https://github.com/spencermountain/wtf_wikipedia/blob/22b806c62aef2b165119a3343b4ab63183861d3b/src/04-sentence/parse.js#L30
I have traced the issue to this line, but I don't know how to fix it, because it is hard to tell which period marks the end of a sentence and which one denotes an abbreviation.
One solution I thought of was to ignore the periods between the ''' markers,
but this is an assumption and I don't know if we can make it.
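To make the idea concrete, here is a rough sketch of masking periods inside ''' ... ''' before the split and restoring them afterwards. The names `protectBold`, `restoreBold`, and `PLACEHOLDER` are made up for illustration and are not part of wtf_wikipedia; it also assumes the text never contains a NUL character.

```javascript
// Hypothetical helpers, not the library's actual code.
// Assumption: '\u0000' never appears in real wikitext.
const PLACEHOLDER = '\u0000'

// Swap every period inside '''bold''' markup for a placeholder,
// so a period-based sentence splitter will not break there.
function protectBold(text) {
  return text.replace(/'''(.+?)'''/g, (match) => match.replace(/\./g, PLACEHOLDER))
}

// Put the original periods back once the sentences are split.
function restoreBold(text) {
  return text.split(PLACEHOLDER).join('.')
}

const src = "'''Dr.-Ing. h.c. F. Porsche AG''', usually shortened."
const masked = protectBold(src)
console.log(restoreBold(masked) === src)
```

This still rests on the assumption that a sentence never ends inside bold markup, which may not hold for every article.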
good point. Yeah, the naive pass splits on anything with a period, then we stitch things like ellipses back together. Maybe we should add a dash_reg check here? Up to you!
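A minimal sketch of what that split-then-stitch step with a dash check could look like. This is not the code in parse.js; `naiveSplit`, `stitchDashes`, and this particular `dashReg` pattern are assumptions for illustration only.

```javascript
// Hypothetical sketch, not the actual wtf_wikipedia implementation.

// Assumed pattern: a chunk that begins with a hyphen or en dash.
const dashReg = /^[-–]/

// Naive pass: cut after every period, keeping the period with its chunk.
function naiveSplit(text) {
  return text.match(/[^.]+\.?|\./g) || []
}

// Stitch pass: if a chunk starts with a dash, the period before it
// was not a sentence end, so glue the chunk back onto the previous one.
function stitchDashes(chunks) {
  const out = []
  for (const chunk of chunks) {
    if (out.length > 0 && dashReg.test(chunk)) {
      out[out.length - 1] += chunk
    } else {
      out.push(chunk)
    }
  }
  return out
}

console.log(stitchDashes(naiveSplit('Dr.-Ing. Porsche')))
// → [ 'Dr.-Ing.', ' Porsche' ]
```

Note this only handles the period-dash combo. Abbreviations like "h.c." or initials like "F." would still need their own checks.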