ProfanityFilter
False positives
I am seeing a number of words and phrases that are wrongly identified as profane. For example: "shoe", "as sure", "I have a hole in my pocket", "album", "coke", "scrape".
Do you have any ideas how to address these? I'll send you something if I can come up with a solution.
The word "analytics" is getting censored because it contains "anal". Is there any solution?
Yeah this is an interesting problem - I'll need to look into how I've solved this in the past with other words. I'll try to look into this a bit more in the next few days when I have some time as it's been awhile since I looked into this package.
I'm having the same issue. "Madison Bumgardner" would be flagged as profanity even though it's a simple name. My workaround right now is to load in a different config and remove some of the words that aren't as obvious. Maybe adding settings to skip specific checks could help here. For example, "bum" would be flagged when it stands alone, but not when it appears in the middle of a word. If you think it's a good idea I could possibly PR it.
I'm happy for you to create a PR for this idea, not sure how it would work though as there are checks that occur to find profanity with spaces between and other symbols as well. Have a look at the tests to see these in action.
Some more words on the list.
- assignment -> "ass"
- document -> "cum"
- get it -> "tit"
help!!!
Sure, are you having the same issue?
yes:((
Like when I say “hows it going” it gets censored.
This profanity filter is great, but this issue is the most annoying part. Please look at this repo, which doesn't censor unspaced characters: https://github.com/jojoee/leo-profanity-php
OK, I think I'll have to do an upgrade to this package as it seems it is still in use by several projects.
Maybe I'll set up a project for a version 2 of this package.
I'm thinking an improved checker which doesn't catch normal words like document, assignment, etc. Or words in sentences, so no boundary jumping.
Support for PHP versions 7.2+ only, as 7.1 and earlier are end of life.
Please let me know any other features you would like.
I've set up a GitHub Project https://github.com/developerdino/ProfanityFilter/projects/1 to work on features for the new version. Please comment and let me know what you would like within this new version and I'll see if we can get this working better for everyone. I've added all the existing issues in there with a few requirements as well.
Other features after false positives... hmm... I’d say bypasses? Like "sh8t", "sh1t", etc.
https://github.com/developerdino/ProfanityFilter/blob/develop/src/Check.php#L289
I replaced
    $expression = '/' . preg_replace(
        array_keys($characterExpressions),
        array_values($characterExpressions),
        $word
    ) . '/i';
with
    $expression = '/\b' . preg_replace(
        array_keys($characterExpressions),
        array_values($characterExpressions),
        $word
    ) . '\b/i';
so it only filters full words or words with separators, for example "as.s" and not "as sure". Works great in my case.
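To make the effect of the `\b` anchors concrete, here is a minimal standalone sketch. It does not use the library's `$characterExpressions`; the helper names `buildPattern` and `isFlagged` and the separator set `[\s._\-]*` are assumptions made only for illustration.

```php
<?php
// Hypothetical helpers, not the library's code. They build a pattern that
// tolerates separators between letters, optionally wrapped in \b anchors.
function buildPattern(string $word, bool $anchored): string
{
    $separator = '[\s._\-]*'; // assumed separator set, for illustration only
    $letters = [];
    foreach (str_split($word) as $char) {
        $letters[] = preg_quote($char, '/');
    }
    $body = implode($separator, $letters);

    return $anchored ? '/\b' . $body . '\b/i' : '/' . $body . '/i';
}

function isFlagged(string $text, string $word, bool $anchored): bool
{
    return preg_match(buildPattern($word, $anchored), $text) === 1;
}

// Unanchored: the match jumps the space in "as sure".
var_dump(isFlagged('as sure', 'ass', false)); // bool(true)
// Anchored: the trailing \b fails because "u" follows the final "s".
var_dump(isFlagged('as sure', 'ass', true));  // bool(false)
// Separated spellings still match when anchored.
var_dump(isFlagged('as.s', 'ass', true));     // bool(true)
```

Under these assumptions, "analytics" and "pass" are no longer flagged, while "a s s" still is, because the whole separated sequence sits between two word boundaries.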
Oh wow, I'm really hoping this is a valid option. Would love it if you'd want to make a PR with some tests so you get the credit for this, otherwise I'm happy to integrate and add you as a contributor in the readme.
very nice!!
What if someone writes "a s s"?
Still filters it.
Sorry this was closed automatically by the project when I merged the tests to be able to check the change in this PR.
This can be bypassed by running words together, e.g. "fuckshitass".
I'll add this as a test @rjworks, this is becoming a very complex problem and one I'm not sure we can solve completely with regular expressions. Maybe a whitelist, but not sure how that would work either. I'm happy for any suggestions.
I see there has been work done to address this issue about 18 months ago. I'm wondering if this project is still alive and if there will be a new release version.
For those looking for a quick and dirty workaround, try replacing 'cum' with ' cum' and 'cum '.
@sebastianheise this is my list, which seems to avoid these false positives while still banning anything that is a problem: profanities.txt
This is a slightly more edited list; it will let some stuff through but has fewer false positives: profanities.txt
Not dead, but I have limited time to work on this. I’m happy to accept PRs and new features.
The problem with the approach I originally took was deciding where to check for the word boundary. So in this case of “ass”, how do you create a regex that accommodates “ass”, “a s s”, “asshole” but not “pass” or “passport” and all its other potential permutations? I’ve tried a few times to come up with different solutions but nothing that was any better than the currently released version with all its flaws.
I’m happy for any ideas as I’m sure I haven’t explored them all and it seems like developers are getting some value out of this package still.
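The boundary question above can be explored with a small comparison. This is only a sketch: `anchoredPattern`, `compareModes`, the three mode names and the separator set are hypothetical and not part of the package.

```php
<?php
// Hypothetical comparison of three anchoring choices for one word.
// The separator set and all names here are assumptions for illustration.
function anchoredPattern(string $word, string $mode): string
{
    $letters = [];
    foreach (str_split($word) as $char) {
        $letters[] = preg_quote($char, '/');
    }
    $body = implode('[\s._\-]*', $letters);

    switch ($mode) {
        case 'both':
            return '/\b' . $body . '\b/i'; // boundary on both sides
        case 'leading':
            return '/\b' . $body . '/i';   // boundary before the word only
        default:
            return '/' . $body . '/i';     // no boundary at all
    }
}

function compareModes(string $word, array $inputs): array
{
    $results = [];
    foreach ($inputs as $input) {
        foreach (['none', 'leading', 'both'] as $mode) {
            $results[$input][$mode] =
                preg_match(anchoredPattern($word, $mode), $input) === 1;
        }
    }
    return $results;
}

$table = compareModes(
    'ass',
    ['ass', 'a s s', 'asshole', 'pass', 'passport', 'assignment']
);

foreach ($table as $input => $modes) {
    printf(
        "%-12s none=%d leading=%d both=%d\n",
        $input,
        $modes['none'],
        $modes['leading'],
        $modes['both']
    );
}
```

In this sketch, no anchoring flags "pass" and "passport", anchoring both sides misses "asshole", and a leading-only boundary handles those five inputs but still flags "assignment". That matches the point made above: no single boundary placement covers every permutation.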
In my solution, I just used a whitelist of allowed words.
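The comment does not say how that list was applied, so the following is only one possible sketch: blank out whitelisted words first, then run a plain substring check on what remains. The function names and the sample lists are hypothetical.

```php
<?php
// Hypothetical whitelist pre-pass; not the commenter's or the library's code.
function stripAllowedWords(string $text, array $allowed): string
{
    foreach ($allowed as $word) {
        // Replace whole-word, case-insensitive occurrences with a space.
        $pattern = '/\b' . preg_quote($word, '/') . '\b/i';
        $text = preg_replace($pattern, ' ', $text);
    }
    return $text;
}

function hasProfanity(string $text, array $profanities, array $allowed): bool
{
    $remaining = stripAllowedWords($text, $allowed);
    foreach ($profanities as $bad) {
        // Naive substring check, standing in for the real filter.
        if (stripos($remaining, $bad) !== false) {
            return true;
        }
    }
    return false;
}

$profanities = ['anal', 'cum', 'ass', 'bum']; // sample list (assumption)
$allowed = ['analytics', 'document', 'assignment', 'Bumgardner'];

var_dump(hasProfanity('Check the analytics document', $profanities, $allowed)); // bool(false)
var_dump(hasProfanity('Madison Bumgardner', $profanities, $allowed));           // bool(false)
var_dump(hasProfanity('analytics ass', $profanities, $allowed));                // bool(true)
```

A whitelist like this only covers the exact forms it lists: "documents" would still be flagged unless it is added too, so the list has to be maintained alongside the profanity list.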