rustrict

Long numbers trigger false positives

Open mrxiaozhuox opened this issue 10 months ago • 2 comments

[:floppy-disk: Notes.pdf](https://xxx.xxx.cdn.digitaloceanspaces.com/shared/attachment/adbb9c05-e965-43ac-8dc5-396455e82ead.pdf)



[:memo: 经济学Notes.md](https://xxx.xxx.cdn.digitaloceanspaces.com/shared/attachment/9bc15510-2b02-493b-895a-148647523138.md)



[:film-projector: a8dab022-ebcd-49a1-97c0-0b172422e568.mp4](https://xxx.xxx.cdn.digitaloceanspaces.com/shared/attachment/1c1c55b1-37e0-4603-aea7-47d622b0401f.mp4)

When I tried to validate the URL of the second attachment, it failed, mainly because of the number 148647523138.

I would like to know if there is a way to avoid this, or if you can handle the URL more leniently.

mrxiaozhuox avatar Mar 17 '25 05:03 mrxiaozhuox

Thanks for the issue! The problem is that the censoring part of rustrict tends to see profanity within numbers. Each digit may be replaced with one of several possible similar-looking letters, which often leads to at least one profanity detection. This affects all numbers, including those within URLs (rustrict doesn't detect URLs).
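To give a rough sense of why long numbers are so prone to this, here is a toy sketch that counts how many letter readings a digit string could have. The `lookalikes` table and the `candidate_readings` helper are hypothetical and written only for illustration; they are not rustrict's actual mapping or algorithm.

```rust
/// Hypothetical lookalike table, for illustration only.
/// This is NOT the mapping rustrict uses internally.
fn lookalikes(digit: char) -> &'static [char] {
    match digit {
        '0' => &['o'],
        '1' => &['i', 'l'],
        '3' => &['e'],
        '4' => &['a'],
        '5' => &['s'],
        '7' => &['t'],
        '8' => &['b'],
        _ => &[],
    }
}

/// Counts the possible readings of the digits in `number`, where each digit
/// is read either as itself or as one of its lookalike letters.
fn candidate_readings(number: &str) -> u64 {
    number
        .chars()
        .filter(|c| c.is_ascii_digit())
        .map(|c| 1 + lookalikes(c).len() as u64)
        .product()
}
```

Under this made-up table, `candidate_readings("148647523138")` is 2304 and `candidate_readings("1111111111")` is 59049, so the chance that at least one reading resembles a filtered word grows quickly with the length of the number.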

I've been aware of this issue for months and haven't found a fix yet.

I recommend passing only plain text to rustrict, not Markdown. You could parse the Markdown into an AST and censor each text node individually. If you have a link, I recommend censoring the text before the link, the link text, (optionally) the link URL, and the text after the link separately.
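A minimal sketch of the link-splitting idea, using only the standard library. The helper `censor_around_links` is hypothetical, and it handles only simple inline links of the form `[text](url)`; a real Markdown parser would be more robust. The `censor` parameter stands in for whatever filter you use. With rustrict, something like `|s| s.censor()` (with `use rustrict::CensorStr;`) should fit, but treat that as an assumption and check the crate docs.

```rust
/// Censors `input` piece by piece around inline Markdown links `[text](url)`.
/// `censor` is a stand-in for the real filter (an assumption, not rustrict's API).
/// URLs are passed to `censor` only when `censor_urls` is true.
fn censor_around_links<F>(input: &str, censor: F, censor_urls: bool) -> String
where
    F: Fn(&str) -> String,
{
    let mut out = String::new();
    let mut rest = input;
    loop {
        // Locate the next `[text](url)` in the remaining input, if any.
        let link = rest.find('[').and_then(|open| {
            let mid = open + rest[open..].find("](")?;
            let close = mid + rest[mid..].find(')')?;
            Some((open, mid, close))
        });
        let Some((open, mid, close)) = link else {
            // No further links: censor the tail as plain text and finish.
            out.push_str(&censor(rest));
            return out;
        };
        let before = &rest[..open];
        let text = &rest[open + 1..mid];
        let url = &rest[mid + 2..close];

        // Each piece is censored on its own, so the URL never mixes with prose.
        out.push_str(&censor(before));
        out.push('[');
        out.push_str(&censor(text));
        out.push_str("](");
        if censor_urls {
            out.push_str(&censor(url));
        } else {
            out.push_str(url);
        }
        out.push(')');
        rest = &rest[close + 1..];
    }
}
```

Calling it with `censor_urls` set to `false` leaves a URL such as the placeholder `https://example.com/148647523138.md` untouched while the surrounding text and the link text still go through the filter. This sketch does not handle nested brackets, reference-style links, or parentheses inside URLs.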

You should also consider switching to a less-sensitive filter:

  • https://crates.io/crates/censor
  • https://crates.io/crates/stfu
  • https://crates.io/crates/profane-rs

finnbear avatar Mar 17 '25 05:03 finnbear

i did "1111111111" and it triggered a false positive lol this is definitely an issue

queenkoopa8500 avatar Mar 26 '25 02:03 queenkoopa8500