Configure to ignore words that are split by underscores/hyphens

Open • sdankel opened this issue 10 months ago • 1 comment

Motivation

I am finding a lot of false positives where a word is split up by _ or -. I get that evasion detection is a feature, but for my use case I want to allow evasive words. So far I haven't been able to find a way to configure this. Please let me know if there's a way to do so.

For example, I want test_fun to not be censored. I tried:

    // Assumes `use rustrict::{Censor, Type as RustrictType};`
    let (censored, analysis) = Censor::from_str("test_fun")
        .with_censor_threshold(
            !RustrictType::EVASIVE & RustrictType::MODERATE_OR_HIGHER
        )
        .censor_and_analyze();

But it seems that the match is not getting marked as evasive: the input is still censored to tes****n.

Proposed solution

It would be nice to have an option to censor a word only if the full word is a profanity, so that I can split by _ or whatever I want and pass each word into the profanity filter myself. This would also prevent false positives like Lifshitz.

Something like this:

    let (censored, analysis) = Censor::from_str("test_fun")
        .with_censor_threshold(RustrictType::MODERATE_OR_HIGHER)
        .with_only_full_words(true) // defaults to false
        .censor_and_analyze();
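
For reference, the manual splitting described above can be sketched with the standard library alone. This is only an illustration under stated assumptions: `censor_segments` is a hypothetical helper, not part of rustrict, and the closure passed to it is a stand-in for whatever per-word check is used. It does not by itself give whole-word-only matching; it only keeps the filter from matching across _ and - boundaries.

```rust
/// Hypothetical helper: applies `censor_word` to each segment between
/// `_`/`-` separators and reassembles the string, keeping the separators.
fn censor_segments<F>(input: &str, mut censor_word: F) -> String
where
    F: FnMut(&str) -> String,
{
    let mut out = String::with_capacity(input.len());
    let mut segment_start = 0;
    for (idx, ch) in input.char_indices() {
        if ch == '_' || ch == '-' {
            // Flush the segment that ends just before this separator.
            if segment_start < idx {
                out.push_str(&censor_word(&input[segment_start..idx]));
            }
            out.push(ch);
            segment_start = idx + ch.len_utf8();
        }
    }
    // Flush the trailing segment, if any.
    if segment_start < input.len() {
        out.push_str(&censor_word(&input[segment_start..]));
    }
    out
}

fn main() {
    // Stand-in for a whole-word profanity check (an assumption for this
    // sketch); in practice the closure would call into the real filter.
    let stub = |word: &str| {
        if word.eq_ignore_ascii_case("badword") {
            "*".repeat(word.chars().count())
        } else {
            word.to_string()
        }
    };
    println!("{}", censor_segments("test_fun", stub)); // prints "test_fun"
    println!("{}", censor_segments("some-badword_here", stub)); // prints "some-*******_here"
}
```

Taking the per-word check as a closure keeps the splitting logic independent of any particular filter configuration.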

Context

I am using rustrict version 0.7.33

Feb 11 '25 23:02 sdankel

Thanks for the issue!

~~I suggest the following workaround: Replace all _ or - with the space character before filtering.~~

Edit: I see you already thought of this! Please let me know if the workaround doesn't solve the problem.

As for how the filter could be improved, I will try to make the existing false positive detection accept _ or - in place of a space.
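
For anyone landing here later, the space-replacement workaround mentioned above is a one-line preprocessing step. A minimal sketch, assuming only the standard library; `separators_to_spaces` is a hypothetical helper name, and its result would then be passed to the filter as usual.

```rust
/// Hypothetical helper: returns a copy of `input` with every `_` and `-`
/// replaced by a single space.
fn separators_to_spaces(input: &str) -> String {
    input.replace(&['_', '-'][..], " ")
}

fn main() {
    println!("{}", separators_to_spaces("test_fun-case")); // prints "test fun case"
}
```

Note that the filtered output will then contain spaces where the separators were, so restoring the original _ or - characters is left to the caller.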

Mar 05 '25 00:03 finnbear