SmokeDetector
New report reason: Username similar to link text
Is your feature request related to a problem? Please describe.
It would help catch more spam
Describe the solution you'd like
SD should report posts where the text of a link is similar to a user's name, as that is almost always a sign of spam. Post example (Username: Cool Spam Service, imagine the link went to a spam site instead of SO):
Your problem can be solved easily Cool Spam Service with recursion...
We would want to catch these and give them a higher score. This is different from URL similar to username, as some companies' website URLs don't include their username (MS example).
Describe alternatives you've considered
N/A
Additional context
N/A
Looking at findspam.py, there's this function that contains some code that could be used to extract link text from a post. Specifically, this line finds link text within the string s.
links = regex.compile(r'nofollow(?: noreferrer)?">([^<]*)(?=</a>)', regex.UNICODE).findall(s)
For example:
>>> import regex
>>> s = '<a href="google.com" rel="nofollow">Test</a>'
>>> links = regex.compile(r'nofollow(?: noreferrer)?">([^<]*)(?=</a>)', regex.UNICODE).findall(s)
>>> links
['Test']
There's another function here that seems to check similarity, and that's used in the "username similar to website" rule. Lastly, there's also a wiki page on adding spam checks/rules.
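To make the idea concrete, here is a rough sketch of how the two pieces might fit together. It is not SmokeDetector's actual code: the function names, the 0.8 threshold, and the return shape are all hypothetical, and difflib.SequenceMatcher from the standard library stands in for whatever similarity function findspam.py really uses. It also uses the stdlib re module rather than regex so that it runs on its own.

```python
import re
from difflib import SequenceMatcher

# Same pattern as the findspam.py line quoted above, compiled with stdlib `re`
# (assumption: the real rule would keep using the `regex` package).
LINK_TEXT_RE = re.compile(r'nofollow(?: noreferrer)?">([^<]*)(?=</a>)', re.UNICODE)


def normalize(text):
    """Case-fold and drop non-word characters and underscores before comparing."""
    return re.sub(r"[\W_]+", "", text.casefold())


def username_similar_to_link_text(body, username, threshold=0.8):
    """Hypothetical check: return (matched, reason) for a post body and username.

    `threshold` is an illustrative value, not one taken from SmokeDetector.
    """
    name = normalize(username)
    if not name:
        return False, ""
    for link_text in LINK_TEXT_RE.findall(body):
        candidate = normalize(link_text)
        if not candidate:
            continue
        # SequenceMatcher is a stand-in for the project's own similarity function.
        ratio = SequenceMatcher(None, name, candidate).ratio()
        if ratio >= threshold:
            return True, "Username similar to link text: {!r}".format(link_text)
    return False, ""
```

With the example post from the feature request (username Cool Spam Service, link text Cool Spam Service), the normalized strings are identical, so the check would match. The threshold would need tuning against real posts before it could be trusted.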
If this seems like a good idea to add, I can attempt a PR implementing the change.