wtf_wikipedia

Issue with parsing sentence with dots.

Open ivan-kuzma-scx opened this issue 4 years ago • 4 comments

Porsche

ivan-kuzma-scx • Oct 29 '20 08:10

Hello @spencermountain, I found a small problem with parsing sentences that contain dots. Could you please have a look?

ivan-kuzma-scx • Oct 29 '20 09:10

Thanks @Patrik-scx, good find. It looks like the sentence parser is tripping on the period-dash combo in "'''Dr.-Ing. h.c. F. Porsche AG''', usually shortened". Will try to fix this in the next release. Cheers

spencermountain • Oct 29 '20 09:10

https://github.com/spencermountain/wtf_wikipedia/blob/22b806c62aef2b165119a3343b4ab63183861d3b/src/04-sentence/parse.js#L30

I have traced the issue to this line, but I don't know how to fix it, because it is hard to tell which period marks the end of a sentence and which one denotes an abbreviation.

One solution I thought of was to ignore the periods between the ''' markers, but that relies on an assumption and I don't know if we can make it.

wvanderp • Oct 17 '21 14:10
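
For illustration, the idea of ignoring periods between the ''' markers could be sketched roughly as below. This is not code from wtf_wikipedia: `PLACEHOLDER`, `maskBoldPeriods` and `splitOutsideBold` are hypothetical names, the split rule (a period followed by whitespace) is a simplification, and the sample sentence is made up. It also assumes the placeholder character never occurs in the input and that bold spans are not nested or combined with italics.

```javascript
// Illustrative sketch only; not part of wtf_wikipedia.
// PLACEHOLDER, maskBoldPeriods and splitOutsideBold are hypothetical names.
// Assumption: the placeholder character never occurs in the input text.
const PLACEHOLDER = '\u0000';

// Replace every period inside a '''bold''' span with the placeholder,
// so the splitter below cannot see it.
function maskBoldPeriods(text) {
  return text.replace(/'''(.+?)'''/g, (span) => span.replace(/\./g, PLACEHOLDER));
}

// Split after each remaining period that is followed by whitespace,
// then put the masked periods back.
function splitOutsideBold(text) {
  return maskBoldPeriods(text)
    .split(/(?<=\.)\s+/)
    .filter((chunk) => chunk.length > 0)
    .map((chunk) => chunk.split(PLACEHOLDER).join('.'));
}

const boldSample = "'''Dr.-Ing. h.c. F. Porsche AG''' is a carmaker. It is based in Stuttgart.";
console.log(splitOutsideBold(boldSample));
// [ "'''Dr.-Ing. h.c. F. Porsche AG''' is a carmaker.", "It is based in Stuttgart." ]
```

The trade-off is the one raised above: a bold span that really does contain a sentence boundary would no longer be split.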

Good point. Yeah, the naive pass splits on anything with a period, then we stitch things like ellipses back together. Maybe we should add a dash_reg check here? Up to you!

spencermountain • Oct 18 '21 13:10
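
A minimal sketch of the split-then-stitch approach with an added dash check might look like the following. It is not the code in the library's parse.js: `naiveSplit`, `shouldStitch`, `splitSentences` and `ellipsisReg` are hypothetical names, and `dashReg` here is only one guess at what a dash_reg check could test (a chunk ending in a token that has a period directly before a dash, such as "Dr.-Ing."). It does not handle other abbreviations such as "h.c." or "F.", which would need their own checks.

```javascript
// Illustrative sketch only; this is not the code in wtf_wikipedia's parse.js.
// All names below are hypothetical, and dashReg is a guess at a possible check.

// Chunk ends with an ellipsis ("...").
const ellipsisReg = /\.\.\.+\s*$/;
// Chunk ends with a token that has a period directly before a dash, e.g. "Dr.-Ing."
const dashReg = /\.-\S*\.\s*$/;

// Naive pass: cut after every period that is followed by whitespace.
function naiveSplit(text) {
  return text.split(/(?<=\.)\s+/).filter((chunk) => chunk.length > 0);
}

// A chunk that ends in an ellipsis or a period-dash token is not a full sentence yet.
function shouldStitch(chunk) {
  return ellipsisReg.test(chunk) || dashReg.test(chunk);
}

// Stitch pass: keep gluing chunks together until the result looks complete.
function splitSentences(text) {
  const sentences = [];
  let pending = '';
  for (const chunk of naiveSplit(text)) {
    pending = pending ? pending + ' ' + chunk : chunk;
    if (!shouldStitch(pending)) {
      sentences.push(pending);
      pending = '';
    }
  }
  if (pending) sentences.push(pending); // leftover chunk that never completed
  return sentences;
}

console.log(splitSentences('He joined Dr.-Ing. Porsche in 1931. It is based in Stuttgart.'));
// [ 'He joined Dr.-Ing. Porsche in 1931.', 'It is based in Stuttgart.' ]
```

Without the `dashReg` test, the first sentence in this made-up sample would be cut after "Dr.-Ing.", which is the symptom reported in this issue.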