ProfanityFilter

False positives

Open • noodle-t opened this issue 6 years ago • 26 comments

I am seeing a number of words and phrases that are wrongly identified as profane. For example: "shoe", "as sure", "I have a hole in my pocket", "album", "coke", "scrape".

Do you have any ideas how to address these? I'll send you something if I can come up with a solution.

noodle-t, Mar 30 '18

The word "analytics" is being censored because it contains "anal". Is there any solution?

si-deep, Mar 14 '19

Yeah, this is an interesting problem; I'll need to look into how I've solved this in the past with other words. I'll try to look into it a bit more in the next few days when I have some time, as it's been a while since I looked at this package.

developerdino, Mar 19 '19

I'm having the same issue. "Madison Bumgardner" is flagged as profanity even though it's just a name. My workaround for now is to load a different config and remove some of the less obvious words. Maybe adding settings to skip specific checks could help here: for example, "bum" would be flagged when it stands alone but not when it's in the middle of a word. If you think it's a good idea, I could put together a PR.

wwwroth, Apr 22 '19

I'm happy for you to create a PR for this idea, though I'm not sure how it would work, as there are also checks that find profanity with spaces and other symbols between the letters. Have a look at the tests to see these in action.

developerdino, Apr 23 '19
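
A rough sketch of wwwroth's per-word idea. It leaves developerdino's concern unsolved: this is plain matching with no tolerance for spaces or symbols between letters. The `containsProfanity()` function and the `word => standaloneOnly` array shape are hypothetical and not part of the package's API.

```php
<?php
/**
 * Hypothetical per-word setting (not the package's API):
 * true  = only flag the word when it stands alone (\b on both sides),
 * false = flag it anywhere, even inside another word (substring match).
 *
 * @param array<string, bool> $words
 */
function containsProfanity(string $text, array $words): bool
{
    foreach ($words as $word => $standaloneOnly) {
        $quoted = preg_quote($word, '/');
        $pattern = $standaloneOnly ? '/\b' . $quoted . '\b/i' : '/' . $quoted . '/i';

        if (preg_match($pattern, $text) === 1) {
            return true;
        }
    }

    return false;
}

$words = [
    'bum'  => true,  // standalone only, so the name "Bumgardner" passes
    'shit' => false, // substring match, so it is still caught inside "bullshit"
];

var_dump(containsProfanity('Madison Bumgardner', $words)); // bool(false)
var_dump(containsProfanity('you bum', $words));            // bool(true)
var_dump(containsProfanity('what bullshit', $words));      // bool(true)
```

The trade-off is that whoever maintains the word list has to decide, word by word, which entries are safe to match only as whole words.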

Some more words on the list.

  1. assignment -> "ass"
  2. document -> "cum"
  3. get it -> "tit"

sanjayhb, May 08 '20

help!!!

rjworks, Jul 10 '20

Sure, are you having the same issue?

developerdino, Jul 10 '20

yes:((

rjworks, Jul 10 '20

Like when I say “hows it going”, it gets censored.

rjworks, Jul 11 '20

This profanity filter is great, but this issue is the most annoying. Please look at this repo; it doesn't censor unspaced characters: https://github.com/jojoee/leo-profanity-php

rjworks, Jul 11 '20

OK, I think I'll have to do an upgrade to this package as it seems it is still in use by several projects.

Maybe I'll set up a project for a version 2 of this package.

I'm thinking of an improved checker that doesn't catch normal words like "document", "assignment", etc., or words spread across a sentence, so no jumping over word boundaries.

Support for PHP 7.2+ only, as 7.1 and earlier are end of life.

Please let me know any other features you would like.

developerdino, Jul 11 '20

I've set up a GitHub Project https://github.com/developerdino/ProfanityFilter/projects/1 to work on features for the new version. Please comment and let me know what you would like in this new version, and I'll see if we can get this working better for everyone. I've added all the existing issues there, along with a few requirements.

developerdino, Jul 11 '20

Other features after false positives... hmm, I'd say bypasses? Like sh8t, sh1t, etc.

rjworks, Jul 11 '20

https://github.com/developerdino/ProfanityFilter/blob/develop/src/Check.php#L289

I replaced

$expression = '/' . preg_replace(
                array_keys($characterExpressions),
                array_values($characterExpressions),
                $word
            ) . '/i';

with

$expression = '/\b' . preg_replace(
                array_keys($characterExpressions),
                array_values($characterExpressions),
                $word
            ) . '\b/i';

so it only filters full words, or words with separators, for example "as.s" but not "as sure". Works great in my case.

NetzipK, Jul 30 '20
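
The effect of NetzipK's change can be seen with a cut-down sketch. Everything here is an assumption for illustration: `buildExpression()` and `matches()` are hypothetical helpers, and the two-letter map only imitates the general shape of the package's `$characterExpressions` (look-alike characters plus optional separators); the real map in Check.php is much larger.

```php
<?php
// Hypothetical, heavily simplified stand-in for the package's $characterExpressions:
// each letter becomes a class of look-alike characters followed by optional
// separators (whitespace, dot, underscore, dash). Only "a" and "s" are mapped here.
function buildExpression(string $word, bool $wordBoundaries): string
{
    $separator = '[\s._-]*';
    $characterExpressions = [
        'a' => '[a@4]' . $separator,
        's' => '[s5$]' . $separator,
    ];

    // strtr() replaces in a single pass, so the "s" in an inserted "\s"
    // is never expanded a second time.
    $body = strtr(strtolower($word), $characterExpressions);

    // NetzipK's change: wrap the whole expression in \b ... \b.
    return $wordBoundaries ? '/\b' . $body . '\b/i' : '/' . $body . '/i';
}

function matches(string $text, string $word, bool $wordBoundaries): bool
{
    return preg_match(buildExpression($word, $wordBoundaries), $text) === 1;
}

foreach (['as sure', 'as.s', 'a s s', 'passport', 'fuckshitass'] as $text) {
    printf(
        "%-12s without \\b: %-6s with \\b: %s\n",
        $text,
        var_export(matches($text, 'ass', false), true),
        var_export(matches($text, 'ass', true), true)
    );
}
```

With the boundaries added, "as sure" and "passport" stop matching while "as.s" and "a s s" are still caught. A run-together string like "fuckshitass" is missed, though, which is the bypass rjworks reports later in the thread.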

Oh wow, I'm really hoping this is a valid option. I'd love it if you'd make a PR with some tests so you get the credit for this; otherwise I'm happy to integrate it and add you as a contributor in the README.

developerdino, Jul 30 '20

very nice!!

rjworks, Jul 30 '20

What if someone types "a s s"?

rjworks, Jul 30 '20

Still filters it.

NetzipK, Jul 31 '20

Sorry, this was closed automatically by the project when I merged the tests for checking the change in this PR.

developerdino, Aug 01 '20

This can be bypassed: "fuckshitass".

rjworks, Aug 23 '20

I'll add this as a test, @rjworks. This is becoming a very complex problem, and one I'm not sure we can solve completely with regular expressions. Maybe a whitelist, but I'm not sure how that would work either. I'm open to any suggestions.

developerdino, Aug 23 '20

I see that work was done to address this issue about 18 months ago. Is this project still alive, and will there be a new release?

sebastianheise, Apr 07 '22

For those looking for a quick and dirty workaround: try replacing 'cum' with ' cum' and 'cum '.

georgeDennington, Apr 08 '22
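
To see what the padding changes, here is a small sketch that assumes a plain case-insensitive substring check. `containsAny()` is a hypothetical helper, and the package's real check is more involved (it also looks for profanity with spaces and symbols between letters). The padded entries stop "document" from matching, but "cumulative" is still caught after a space, and "cum" on its own or followed by punctuation is no longer caught at all.

```php
<?php
// Hypothetical helper: plain case-insensitive substring search over a word list.
function containsAny(string $text, array $entries): bool
{
    foreach ($entries as $entry) {
        if (stripos($text, $entry) !== false) {
            return true;
        }
    }

    return false;
}

$original = ['cum'];
$padded   = [' cum', 'cum ']; // the workaround: require a space on one side

foreach (['document', 'the cumulative total', 'cum', 'cum!'] as $text) {
    printf(
        "%-22s original: %-6s padded: %s\n",
        $text,
        var_export(containsAny($text, $original), true),
        var_export(containsAny($text, $padded), true)
    );
}
```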

@sebastianheise, this is my list; it seems to get past these while still banning anything that is actually a problem: profanities.txt

This is a slightly more edited list; it lets some stuff through but has fewer false positives: profanities.txt

georgeDennington, Apr 08 '22

Not dead, but I have limited time to work on this. I’m happy to accept PRs and new features.

The problem with my original approach was deciding where to check for the word boundary. In the case of “ass”, how do you create a regex that accommodates “ass”, “a s s” and “asshole” but not “pass” or “passport”, and all their other potential permutations? I’ve tried a few times to come up with different solutions, but nothing was any better than the currently released version, with all its flaws.

I’m open to any ideas, as I’m sure I haven’t explored them all, and it seems developers are still getting some value out of this package.

developerdino, Apr 08 '22
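
One partial answer to the "ass" question, sketched with hypothetical simplifications (a helper named `leadingBoundaryExpression()`, look-alike classes for "a" and "s" only): put `\b` at the start of the pattern but not at the end. This catches "ass", "a s s" and "asshole" while skipping "pass" and "passport", but "assignment" and "as sure" are still flagged, so it narrows the false positives rather than removing them.

```php
<?php
// Hypothetical helper: look-alike classes for "a" and "s" only, each followed by
// optional separators; the pattern is anchored with \b at the start of the word only.
function leadingBoundaryExpression(string $word): string
{
    $separator = '[\s._-]*';
    $characterExpressions = [
        'a' => '[a@4]' . $separator,
        's' => '[s5$]' . $separator,
    ];

    return '/\b' . strtr(strtolower($word), $characterExpressions) . '/i';
}

$pattern = leadingBoundaryExpression('ass');

foreach (['ass', 'a s s', 'asshole', 'pass', 'passport', 'assignment', 'as sure'] as $text) {
    printf("%-10s %s\n", $text, preg_match($pattern, $text) === 1 ? 'flagged' : 'ok');
}
```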

In my solution, I just used a whitelist of allowed words.

KVinS, Dec 23 '22
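
KVinS doesn't say how the whitelist was wired in, so the following is only one possible sketch; `isProfaneWithWhitelist()` and its arguments are hypothetical. Whitelisted whole words are blanked out of the text first, then the remaining text gets an ordinary substring check. The catch is that every innocent word containing a listed term has to be added by hand: "classic" is flagged here because it isn't on the whitelist.

```php
<?php
/**
 * Hypothetical whitelist check: whole words on $whitelist are blanked out first,
 * then the remaining text is searched for any $blacklist entry as a substring.
 */
function isProfaneWithWhitelist(string $text, array $blacklist, array $whitelist): bool
{
    foreach ($whitelist as $allowed) {
        // Blank out each whitelisted whole word (case-insensitive).
        $text = preg_replace('/\b' . preg_quote($allowed, '/') . '\b/i', ' ', $text);
    }

    foreach ($blacklist as $word) {
        if (stripos($text, $word) !== false) {
            return true;
        }
    }

    return false;
}

$blacklist = ['anal', 'ass', 'cum'];
$whitelist = ['analytics', 'assignment', 'document'];

var_dump(isProfaneWithWhitelist('Check the analytics for this document', $blacklist, $whitelist)); // bool(false)
var_dump(isProfaneWithWhitelist('Submit the assignment', $blacklist, $whitelist));                 // bool(false)
var_dump(isProfaneWithWhitelist('What a classic', $blacklist, $whitelist));                        // bool(true)
```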