Configure to ignore words that are split by underscores/hyphens
Motivation
I am getting a lot of false positives where a word is split by _ or -. I understand that evasion detection is a feature, but for my use case I want to allow evasive words. So far I haven't found a way to configure this. Please let me know if there is one.
For example, I want test_fun to not be censored. I tried:
let (censored, analysis) = Censor::from_str("test_fun")
    .with_censor_threshold(
        !Type::EVASIVE & RustrictType::MODERATE_OR_HIGHER
    )
    .censor_and_analyze();
But the match does not seem to be marked as evasive, because the output is still censored to tes****n.
Proposed solution
It would be nice to have an option to only censor the word if the full word is a profanity, so I can split by _ or whatever I want and pass each word into the profanity filter myself. This would also prevent false positives like Lifshitz.
Something like this:
let (censored, analysis) = Censor::from_str("test_fun")
    .with_censor_threshold(RustrictType::MODERATE_OR_HIGHER)
    .with_only_full_words(true) // defaults to false
    .censor_and_analyze();
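To make the intent concrete, here is a rough, standard-library-only sketch of the "split it myself and check each full word" behavior I have in mind. None of this is rustrict API: `censor_full_words` and `flush` are hypothetical helper names, and `is_profane` is a stand-in for whatever whole-word check the filter would provide.

```rust
/// Censors a token only when the *entire* token is flagged by `is_profane`.
/// Tokens are delimited by '_', '-', or whitespace; delimiters are kept as-is.
/// `is_profane` is a hypothetical whole-word check supplied by the caller.
fn censor_full_words<F>(input: &str, is_profane: F) -> String
where
    F: Fn(&str) -> bool,
{
    let mut out = String::with_capacity(input.len());
    let mut token = String::new();
    for ch in input.chars() {
        if ch == '_' || ch == '-' || ch.is_whitespace() {
            // A delimiter ends the current token: emit it, then the delimiter.
            flush(&mut token, &mut out, &is_profane);
            out.push(ch);
        } else {
            token.push(ch);
        }
    }
    // Emit the trailing token, if any.
    flush(&mut token, &mut out, &is_profane);
    out
}

/// Appends `token` to `out`, replaced by '*' characters if it is flagged,
/// then clears `token` for reuse.
fn flush<F: Fn(&str) -> bool>(token: &mut String, out: &mut String, is_profane: &F) {
    if token.is_empty() {
        return;
    }
    if is_profane(token.as_str()) {
        out.extend(std::iter::repeat('*').take(token.chars().count()));
    } else {
        out.push_str(token.as_str());
    }
    token.clear();
}
```

With a check that only flags exact matches, test_fun would pass through unchanged, since neither test nor fun is flagged on its own.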
Context
I am using rustrict version 0.7.33
Thanks for the issue!
~~I suggest the following workaround: Replace all _ or - with the space character before filtering.~~
Edit: I see you already thought of this! Please let me know if the workaround doesn't solve the problem.
As for how the filter could be improved, I will try to make the existing false positive detection allow _ or - instead of space.
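For anyone else landing here, the pre-processing step from the workaround above could look roughly like this. It is a minimal sketch using only the standard library; the function name `separators_to_spaces` is purely illustrative and not part of rustrict.

```rust
/// Replaces every '_' and '-' with a space so the filter sees separate words.
/// All other characters are passed through unchanged.
fn separators_to_spaces(input: &str) -> String {
    input
        .chars()
        .map(|c| if c == '_' || c == '-' { ' ' } else { c })
        .collect()
}
```

The result can then be passed to the filter as usual. Keep in mind that the filtered output will contain spaces where the original separators were.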