obscenity
Bug: `collapseDuplicatesTransformer` does not collapse the last letter
Expected behavior
Using obscenity to censor a string containing repeated characters, such as `pppiiittt`, with a dataset that contains the word `pit`.
Using:
collapseDuplicatesTransformer({
    defaultThreshold: 1,
}),
I would expect the whole word `pppiiittt` to be matched.
Actual behavior
Instead, only the first `t` is detected, so the match covers `pppiiit`. The final two `t` characters are "not a part of the profanity", while they should be.
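To make the expected span concrete, here is a minimal sketch in plain JavaScript that does not use obscenity at all. The helpers `collapseWithSpans` and `findExpectedSpan` are hypothetical names invented for this illustration, not the library's implementation. The sketch assumes that a collapsed character should map back to the whole run it came from, so the match ends on the last repeated letter.

```javascript
// Hypothetical helper (not part of obscenity): collapse each run of identical
// characters to at most `threshold` repeats, and record which original
// indices every kept character stands for.
function collapseWithSpans(input, threshold = 1) {
  const kept = [];
  let i = 0;
  while (i < input.length) {
    // Find the last index j of the run that starts at i.
    let j = i;
    while (j + 1 < input.length && input[j + 1] === input[i]) j++;
    const keep = Math.min(j - i + 1, threshold);
    for (let k = 0; k < keep; k++) {
      // The last kept character absorbs the rest of the run.
      const isLast = k === keep - 1;
      kept.push({ char: input[i], start: i + k, end: isLast ? j : i + k });
    }
    i = j + 1;
  }
  return kept;
}

// Hypothetical helper: find `word` in the collapsed text and report the span
// in original indices (both inclusive), or null if there is no match.
function findExpectedSpan(input, word, threshold = 1) {
  if (word.length === 0) return null;
  const kept = collapseWithSpans(input, threshold);
  const collapsed = kept.map((entry) => entry.char).join('');
  const at = collapsed.indexOf(word);
  if (at === -1) return null;
  return {
    startIndex: kept[at].start,
    endIndex: kept[at + word.length - 1].end,
  };
}

const span = findExpectedSpan('pppiiittt', 'pit');
console.log(span); // { startIndex: 0, endIndex: 8 }
```

Under this assumption the span for `pppiiittt` runs from index 0 to index 8, whereas the behavior reported here corresponds to a match ending at index 6.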
Minimal reproducible example
const {
    englishDataset,
    parseRawPattern,
    DataSet,
    RegExpMatcher,
    TextCensor,
    collapseDuplicatesTransformer,
} = require('obscenity');

// Extend the English dataset with the custom word "pit".
const data = new DataSet()
    .addAll(englishDataset)
    .addPhrase(phrase =>
        phrase
            .setMetadata({ originalWord: 'pit' })
            .addPattern(parseRawPattern('pit'))
    )
    .build();

// Collapse runs of repeated characters down to one before matching.
const transformers = {
    blacklistMatcherTransformers: [
        collapseDuplicatesTransformer({
            defaultThreshold: 1,
        }),
    ],
    whitelistMatcherTransformers: [],
};

const matcher = new RegExpMatcher({
    ...data,
    ...transformers,
});

// `textCensor` was not defined in the original snippet; a default
// TextCensor is assumed here.
const textCensor = new TextCensor();

function censor(input) {
    if (matcher.hasMatch(input)) {
        const matches = matcher.getAllMatches(input, true);
        return textCensor.applyTo(input, matches);
    }
    return input;
}

const stringPit = 'ppiitt';
console.log(censor(stringPit));
Steps to reproduce
No response
Additional context
No response
Node.js version
18.17.1
Obscenity version
0.4.0
Priority
- [X] Low
- [ ] Medium
- [ ] High
Terms
- [X] I agree to follow the project's Code of Conduct.
- [X] I have searched existing issues for similar reports.