Long numbers trigger false positives
[:floppy-disk: Notes.pdf](https://xxx.xxx.cdn.digitaloceanspaces.com/shared/attachment/adbb9c05-e965-43ac-8dc5-396455e82ead.pdf)
[:memo: 经济学Notes.md](https://xxx.xxx.cdn.digitaloceanspaces.com/shared/attachment/9bc15510-2b02-493b-895a-148647523138.md)
[:film-projector: a8dab022-ebcd-49a1-97c0-0b172422e568.mp4](https://xxx.xxx.cdn.digitaloceanspaces.com/shared/attachment/1c1c55b1-37e0-4603-aea7-47d622b0401f.mp4)
When I tried to validate this text, it failed validation, mainly because of the number 148647523138.
I would like to know whether there is a way to avoid this, or whether you could handle URLs more leniently.
Thanks for the issue! The problem is that rustrict's censoring logic tends to see profanity within numbers. Each digit may be read as one of several similar-looking letters, so a long number has many possible letter readings, and at least one of them often contains a profane word. This affects all numbers, including those inside URLs, because rustrict does not recognize URLs and treats them as ordinary text.
I've been aware of this issue for months and haven't found a fix yet.
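To make the combinatorics concrete, here is a minimal Rust sketch. It is an illustration under stated assumptions, not rustrict's implementation: the look-alike table and the one-word blocklist are hypothetical, and the names `lookalikes`, `reading_count`, `readings` and `any_reading_matches` are made up for this example. The number of readings is the product of the options for each digit, so even "1111111111" has 2^10 = 1024 readings under this toy table.

```rust
use std::collections::HashMap;

/// Hypothetical digit -> look-alike letters table (NOT rustrict's real mapping).
fn lookalikes() -> HashMap<char, Vec<char>> {
    HashMap::from([
        ('0', vec!['o']),
        ('1', vec!['i', 'l']),
        ('2', vec!['z']),
        ('3', vec!['e']),
        ('4', vec!['a']),
        ('5', vec!['s']),
        ('6', vec!['b', 'g']),
        ('7', vec!['t']),
        ('8', vec!['b']),
        ('9', vec!['g']),
    ])
}

/// Number of letter readings: the product of the options per character.
/// Characters without look-alikes count as a single option.
fn reading_count(text: &str, table: &HashMap<char, Vec<char>>) -> usize {
    text.chars()
        .map(|c| table.get(&c).map_or(1, |v| v.len()))
        .product()
}

/// Expand every reading of `text` by trying each look-alike at each position.
fn readings(text: &str, table: &HashMap<char, Vec<char>>) -> Vec<String> {
    let mut out = vec![String::new()];
    for c in text.chars() {
        // Characters without look-alikes are kept unchanged.
        let options = table.get(&c).cloned().unwrap_or_else(|| vec![c]);
        let mut next = Vec::with_capacity(out.len() * options.len());
        for prefix in &out {
            for &o in &options {
                let mut s = prefix.clone();
                s.push(o);
                next.push(s);
            }
        }
        out = next;
    }
    out
}

/// True if any reading contains any word from the (hypothetical) blocklist.
fn any_reading_matches(text: &str, table: &HashMap<char, Vec<char>>, blocklist: &[&str]) -> bool {
    readings(text, table)
        .iter()
        .any(|r| blocklist.iter().any(|w| r.contains(*w)))
}

fn main() {
    let table = lookalikes();
    println!("{}", reading_count("1111111111", &table)); // 1024
    println!("{}", reading_count("148647523138", &table)); // 8
    println!("{}", any_reading_matches("1111111111", &table, &["ill"])); // true
}
```

A production filter typically has a larger table and a much longer word list, and it looks for words anywhere inside each reading, so the chance that some reading of a long number matches grows quickly.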
I recommend passing only plain text to rustrict, not Markdown. You could parse the Markdown to an AST and censor each text node individually. If you have a link, I recommend censoring the text before the link, the link text, (optionally) the link URL, and the text after the link separately.
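The per-segment approach for links can be sketched in Rust. This is a simplified, hand-rolled splitter that only understands inline links of the form `[text](url)` (no nesting, escapes, images, reference links or parentheses inside URLs); for real input, a proper Markdown parser such as the pulldown-cmark crate is the safer choice, as suggested above. The names `Segment`, `split_links` and `any_segment_inappropriate` are hypothetical, and the `is_bad` closure stands in for whatever check you actually use.

```rust
/// Kind of each piece of input, so the caller can decide what to check.
#[derive(Debug, PartialEq)]
enum Segment<'a> {
    Text(&'a str),
    LinkText(&'a str),
    LinkUrl(&'a str),
}

/// Split `input` on simple inline links `[text](url)`.
/// Delimiters are ASCII, so the byte-index slicing stays on char boundaries.
fn split_links(input: &str) -> Vec<Segment<'_>> {
    let mut out = Vec::new();
    let mut text_start = 0; // start of the pending plain-text run
    let mut pos = 0; // where to resume scanning for '['
    while let Some(rel) = input[pos..].find('[') {
        let open = pos + rel;
        // Link text runs to the first ']' and must be followed by '('.
        let Some(close_br) = input[open + 1..].find(']').map(|i| open + 1 + i) else { break };
        if !input[close_br + 1..].starts_with('(') {
            pos = close_br + 1; // not a link: leave it in the text run
            continue;
        }
        let url_start = close_br + 2;
        let Some(close_paren) = input[url_start..].find(')').map(|i| url_start + i) else { break };
        if text_start < open {
            out.push(Segment::Text(&input[text_start..open]));
        }
        out.push(Segment::LinkText(&input[open + 1..close_br]));
        out.push(Segment::LinkUrl(&input[url_start..close_paren]));
        text_start = close_paren + 1;
        pos = text_start;
    }
    if text_start < input.len() {
        out.push(Segment::Text(&input[text_start..]));
    }
    out
}

/// Check each segment on its own; URLs are checked only if `check_urls` is set,
/// mirroring the "(optionally) the link URL" advice.
fn any_segment_inappropriate(input: &str, check_urls: bool, is_bad: impl Fn(&str) -> bool) -> bool {
    split_links(input).into_iter().any(|seg| match seg {
        Segment::Text(s) | Segment::LinkText(s) => is_bad(s),
        Segment::LinkUrl(s) => check_urls && is_bad(s),
    })
}

fn main() {
    let input = "see [Notes.md](https://example.com/148647523138.md) now";
    for seg in split_links(input) {
        println!("{seg:?}");
    }
    // Stand-in checker that flags any digit, to emulate the false positive.
    let flags_digits = |s: &str| s.chars().any(|c| c.is_ascii_digit());
    println!("{}", any_segment_inappropriate(input, false, flags_digits)); // false
    println!("{}", any_segment_inappropriate(input, true, flags_digits)); // true
}
```

With rustrict itself, `is_bad` might be something like `|s: &str| s.is_inappropriate()` via its `CensorStr` trait; check the crate documentation for the exact API. Leaving `check_urls` off keeps long numbers in URLs out of the check entirely.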
You should also consider switching to a less sensitive filter:
- https://crates.io/crates/censor
- https://crates.io/crates/stfu
- https://crates.io/crates/profane-rs
I tried "1111111111" and it triggered a false positive, lol. This is definitely an issue.