[BUG]: nlp-sentencize wrongly splits sentences with multiple punctuation marks

Open Pupix opened this issue 4 months ago • 3 comments

Description

Hello! I'm not sure if this is the right place, but I can't post in the other repo.

Using @stdlib/nlp-sentencize@0.2.2 with phrases like 'HAPPY BIRTHDAY!!!' incorrectly returns a separate sentence for every punctuation mark:

console.log(sentencize('HAPPY BIRTHDAY!!!'));
> ['HAPPY BIRTHDAY!', '!', '!']

console.log(sentencize('what??'));
> ['what?', '?']

console.log(sentencize('HOW DARE YOU?!?!'));
> ['HOW DARE YOU?', '!', '?', '!']

Each of the above examples should be treated as a single sentence.
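
Until this is fixed, one possible stopgap is to post-process the output and glue punctuation-only fragments back onto the sentence before them. This is only a rough sketch: `mergePunctuationRuns` is a made-up helper, not part of stdlib, and it assumes the fragments come back in order with nothing dropped between them.

```javascript
// Hypothetical helper (NOT part of @stdlib): merges fragments that consist
// only of sentence-ending punctuation into the sentence that precedes them.
function mergePunctuationRuns(sentences) {
  const merged = [];
  for (const fragment of sentences) {
    // Assumption: a fragment made up solely of '!', '?' or '.' is a split artifact.
    const isPunctuationOnly = /^[!?.]+$/.test(fragment);
    if (isPunctuationOnly && merged.length > 0) {
      // Append to the previous sentence instead of starting a new one.
      merged[merged.length - 1] += fragment;
    } else {
      merged.push(fragment);
    }
  }
  return merged;
}

// Fragments as reported above for 'HOW DARE YOU?!?!'.
console.log(mergePunctuationRuns(['HOW DARE YOU?', '!', '?', '!']));
```

In real use the argument would be the array returned by sentencize. Note that this would also merge a genuine standalone "..." into the previous sentence, so it is a workaround rather than a proper fix.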

Oddly enough, it handles ellipses correctly, as well as phrases ending in runs like !!!1!!11!!!, for example:

console.log(sentencize('Yeah, about that...'));
> ['Yeah, about that...']

console.log(sentencize('OH EM GEE!!!1!!11!one!!1'));
> ['OH EM GEE!!!1!!11!one!!1']

Both of these come back as a single sentence, as expected.

Cheers!

Related Issues

No response

Questions

No response

Demo

No response

Reproduction

const sentencize = require('@stdlib/nlp-sentencize');
console.log(sentencize('SURPRISE!!!'));

Expected Results

['SURPRISE!!!']

Actual Results

['SURPRISE!', '!', '!']

Version

0.2.2

Environments

Node.js

Browser Version

No response

Node.js / npm Version

v22.9.0

Platform

Windows 11

Checklist

  • [x] Read and understood the Code of Conduct.
  • [x] Searched for existing issues and pull requests.

Pupix, Oct 16 '24 21:10