obscenity Bug: `collapseDuplicatesTransformer` does not collapse the last letter

Open rion18 opened this issue 6 months ago • 0 comments

Expected behavior

I am using obscenity to censor a string that contains repeated characters, such as `pppiiittt`, with a dataset that contains the word `pit`.

The matcher is configured with this transformer:

collapseDuplicatesTransformer({
  defaultThreshold: 1,
}),

I would expect the whole word `pppiiittt` to be matched.

Actual behavior

Instead, only the first `t` is detected, so the match covers `pppiiit`. The final two `t` characters are treated as not part of the profanity, although they should be.
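To make the expected range concrete, here is a minimal sketch that does not use obscenity at all. The helpers `collapseWithSpans` and `expectedMatchRange` are hypothetical names introduced only for illustration, and this is not how the library implements the transformer. The sketch assumes a threshold of at least 1, plain UTF-16 indexing, and an inclusive end index. It collapses each run of repeated characters and lets the last kept character absorb the rest of its run, so the reported range extends through the final repeated letter.

```javascript
// Hypothetical helper, not part of obscenity.
// Collapses runs of identical characters to at most `threshold` characters
// and records, for every kept character, the inclusive range of original
// indices it stands for. Assumes threshold >= 1.
function collapseWithSpans(input, threshold = 1) {
  const chars = [];
  const spans = [];
  let i = 0;
  while (i < input.length) {
    // Find the end of the current run of identical characters.
    let j = i;
    while (j + 1 < input.length && input[j + 1] === input[i]) j++;
    const kept = Math.min(j - i + 1, threshold);
    for (let k = 0; k < kept; k++) {
      chars.push(input[i]);
      // The last kept character absorbs the remainder of the run.
      spans.push([i + k, k === kept - 1 ? j : i + k]);
    }
    i = j + 1;
  }
  return { collapsed: chars.join(''), spans };
}

// Hypothetical helper: the original-index range a match of `word`
// would be expected to cover, or null if the word is not found.
function expectedMatchRange(input, word) {
  const { collapsed, spans } = collapseWithSpans(input, 1);
  const at = collapsed.indexOf(word);
  if (at === -1) return null;
  return {
    startIndex: spans[at][0],
    endIndex: spans[at + word.length - 1][1],
  };
}

console.log(expectedMatchRange('pppiiittt', 'pit'));
// { startIndex: 0, endIndex: 8 }
```

Under these assumptions the expected match for `pppiiittt` spans indices 0 through 8, that is, the whole word including all three trailing `t` characters.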

Minimal reproducible example

const {
  englishDataset,
  parseRawPattern,
  DataSet,
  RegExpMatcher,
  TextCensor,
  collapseDuplicatesTransformer,
} = require('obscenity');

// Extend the English dataset with a custom phrase for the word "pit".
const data = new DataSet()
  .addAll(englishDataset)
  .addPhrase((phrase) =>
    phrase
      .setMetadata({ originalWord: 'pit' })
      .addPattern(parseRawPattern('pit'))
  )
  .build();

// Collapse every run of repeated characters down to a single character.
const transformers = {
  blacklistMatcherTransformers: [
    collapseDuplicatesTransformer({
      defaultThreshold: 1,
    }),
  ],
  whitelistMatcherTransformers: [],
};

const matcher = new RegExpMatcher({
  ...data,
  ...transformers,
});
const textCensor = new TextCensor();

// Censor the input if it contains a match; otherwise return it unchanged.
function censor(input) {
  if (matcher.hasMatch(input)) {
    const matches = matcher.getAllMatches(input, true);
    return textCensor.applyTo(input, matches);
  }
  return input;
}

const stringPit = 'ppiitt';
console.log(censor(stringPit));
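When checking a case like this, it can help to see which part of the input a match covers and which part is left over. The sketch below is a hypothetical helper, `splitByMatches`, that is not part of obscenity. It assumes each match object carries `startIndex` and `endIndex` fields with `endIndex` inclusive. The match passed in the example is written by hand from the behavior described in this report, not captured from the library.

```javascript
// Hypothetical helper, not part of obscenity.
// Splits `input` into the substrings covered by matches and the substrings
// left uncovered. Assumes `endIndex` is inclusive.
function splitByMatches(input, matches) {
  const matched = [];
  const unmatched = [];
  let cursor = 0;
  const sorted = [...matches].sort((a, b) => a.startIndex - b.startIndex);
  for (const { startIndex, endIndex } of sorted) {
    // Text between the previous match and this one is uncovered.
    if (startIndex > cursor) unmatched.push(input.slice(cursor, startIndex));
    matched.push(input.slice(startIndex, endIndex + 1));
    cursor = Math.max(cursor, endIndex + 1);
  }
  // Anything after the last match is uncovered as well.
  if (cursor < input.length) unmatched.push(input.slice(cursor));
  return { matched, unmatched };
}

// Hand-written match mirroring the reported behavior for 'pppiiittt'.
const reported = splitByMatches('pppiiittt', [{ startIndex: 0, endIndex: 6 }]);
console.log(reported);
// { matched: [ 'pppiiit' ], unmatched: [ 'tt' ] }
```

Passing the result of `matcher.getAllMatches(input, true)` as the second argument should show the same split for real matches, provided the assumption about the index fields holds.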

Steps to reproduce

No response

Additional context

No response

Node.js version

18.17.1

Obscenity version

0.4.0

Priority

[X] Low
[ ] Medium
[ ] High

Terms

[X] I agree to follow the project's Code of Conduct.
[X] I have searched existing issues for similar reports.

Aug 28 '24 23:08 rion18
