[BUG]: nlp-sentencize wrongly splits sentences with multiple punctuation marks
Description
Hello! I'm not sure this is the right place, but I can't post in the other repo.
Using @stdlib/nlp-sentencize@0.2.2
with phrases like 'HAPPY BIRTHDAY!!!'
incorrectly returns a separate sentence for every punctuation mark:
console.log(sentencize('HAPPY BIRTHDAY!!!'));
> ['HAPPY BIRTHDAY!', '!', '!']
console.log(sentencize('what??'));
> ['what?', '?']
console.log(sentencize('HOW DARE YOU?!?!'));
> ['HOW DARE YOU?', '!', '?', '!']
Each of the above examples should be treated as a single sentence.
Weirdly enough, it works well with ellipses and with phrases ending in !!!1!!11!!!
and similar. For example:
console.log(sentencize('Yeah, about that...'));
> ['Yeah, about that...']
console.log(sentencize('OH EM GEE!!!1!!11!one!!1'));
> ['OH EM GEE!!!1!!11!one!!1']
This one is fine.
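Until this is fixed, one possible workaround for the split cases above is to post-process the returned array and fold any fragment that consists only of sentence-ending punctuation back into the sentence before it. This is only a sketch: `mergePunctuationFragments` is a hypothetical helper of my own, not part of @stdlib, and it assumes the stray marks were not separated by whitespace in the original input.

```javascript
// Hypothetical helper (not part of @stdlib): folds fragments made up solely of
// sentence-ending punctuation back into the preceding sentence.
function mergePunctuationFragments(sentences) {
  const onlyPunctuation = /^[.!?]+$/;
  const merged = [];
  for (const fragment of sentences) {
    if (merged.length > 0 && onlyPunctuation.test(fragment)) {
      // Assumption: no whitespace separated these marks in the input,
      // so plain concatenation restores the original text.
      merged[merged.length - 1] += fragment;
    } else {
      merged.push(fragment);
    }
  }
  return merged;
}

// Applied to the output reported above:
console.log(mergePunctuationFragments(['HAPPY BIRTHDAY!', '!', '!']));
// [ 'HAPPY BIRTHDAY!!!' ]
```

In practice it would wrap the call, e.g. `mergePunctuationFragments(sentencize(text))`. Sentences that were already returned whole pass through unchanged.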
Cheers!
Related Issues
No response
Questions
No response
Demo
No response
Reproduction
const sentencize = require('@stdlib/nlp-sentencize');
console.log(sentencize('SURPRISE!!!'));
Expected Results
['SURPRISE!!!']
Actual Results
['SURPRISE!', '!', '!']
Version
0.2.2
Environments
Node.js
Browser Version
No response
Node.js / npm Version
v22.9.0
Platform
Windows 11
Checklist
- [x] Read and understood the Code of Conduct.
- [x] Searched for existing issues and pull requests.