SmokeDetector

New report reason: Username similar to link text

Open • fastnlight0 opened this issue 1 year ago • 1 comment

Is your feature request related to a problem? Please describe.

It would help SmokeDetector catch more spam.

Describe the solution you'd like

SD should report posts in which the text of a link is similar to the poster's username, as that is almost always a sign of spam. Post example (username: Cool Spam Service; imagine the link went to a spam site instead of SO):

Your problem can be solved easily Cool Spam Service with recursion...

We would want to catch these and give them a higher score. This is different from the existing "URL similar to username" check, because a company's website address doesn't always contain its username (MS example).
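To make the distinction concrete, here is a minimal sketch, assuming a naive normalization (lowercase, keep only ASCII letters and digits). The helper name `normalize` and the hostname `www.example-offers.com` are hypothetical and not taken from SmokeDetector. A URL-based comparison finds nothing that resembles the username, while the visible link text matches it directly.

```python
import re
from urllib.parse import urlparse


def normalize(text: str) -> str:
    """Lowercase and keep only ASCII letters and digits (hypothetical normalization)."""
    return re.sub(r"[^0-9a-z]", "", text.lower())


username = "Cool Spam Service"
link_url = "https://www.example-offers.com/landing"  # hypothetical spam URL
link_text = "Cool Spam Service"                       # what the reader actually sees

# "URL similar to username": look for the username inside the hostname.
host = urlparse(link_url).hostname or ""
url_match = normalize(username) in normalize(host)

# "Link text similar to username": compare against the visible link text instead.
text_match = normalize(username) == normalize(link_text)

print(url_match, text_match)  # False True
```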

Describe alternatives you've considered

N/A

Additional context

N/A

fastnlight0 • Jan 29 '24 21:01

Looking at findspam.py, there's this function containing code that could be used to extract link text from a post. Specifically, this line finds the link text within the string s:

links = regex.compile(r'nofollow(?: noreferrer)?">([^<]*)(?=</a>)', regex.UNICODE).findall(s)

For example:

>>> import regex
>>> s = '<a href="google.com" rel="nofollow">Test</a>'
>>> links = regex.compile(r'nofollow(?: noreferrer)?">([^<]*)(?=</a>)', regex.UNICODE).findall(s)
>>> links
['Test']
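The same pattern also works with Python's standard-library `re` module, which is convenient for experimenting without installing `regex`. The sketch below wraps it in a hypothetical helper, `extract_link_texts`, and runs it on a made-up post body with two links. It assumes the rendered post HTML places `nofollow` (optionally followed by ` noreferrer`) immediately before the closing `">` of each anchor.

```python
import re

# Same pattern as the findspam.py line above, compiled with the stdlib re module.
LINK_TEXT_RE = re.compile(r'nofollow(?: noreferrer)?">([^<]*)(?=</a>)', re.UNICODE)


def extract_link_texts(html: str) -> list[str]:
    """Return the visible text of each nofollow link in the post HTML (hypothetical helper)."""
    return LINK_TEXT_RE.findall(html)


post = (
    '<p>Your problem can be solved easily '
    '<a href="https://example.com" rel="nofollow noreferrer">Cool Spam Service</a> '
    'with recursion, see <a href="https://example.org" rel="nofollow">the docs</a>.</p>'
)
print(extract_link_texts(post))  # ['Cool Spam Service', 'the docs']
```

One limitation worth noting: link text containing nested markup such as `<code>` is not captured, because `[^<]*` stops at the first `<` and the `</a>` lookahead then fails.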

There's another function here that appears to check similarity; it's used by the "username similar to website" rule. Lastly, there's also a wiki page on adding spam checks/rules.
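The exact scoring used by that function isn't shown here, so this sketch swaps in Python's standard `difflib.SequenceMatcher` instead. It is a minimal illustration of how extracted link texts could be compared against a username, assuming ASCII-only normalization and a hypothetical threshold of 0.8. `normalize`, `similar`, `link_text_matches_username`, and `SIMILARITY_THRESHOLD` are made-up names, not SmokeDetector's API.

```python
import re
from difflib import SequenceMatcher

# Hypothetical threshold; a real rule would need tuning against real reports.
SIMILARITY_THRESHOLD = 0.8


def normalize(text: str) -> str:
    """Lowercase and keep only ASCII letters and digits (assumed normalization)."""
    return re.sub(r"[^0-9a-z]", "", text.lower())


def similar(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between the normalized forms of a and b."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_text_matches_username(link_texts: list[str], username: str) -> list[str]:
    """Return the link texts whose similarity to the username meets the threshold."""
    if not normalize(username):
        # Nothing left to compare (e.g. a purely non-ASCII name); two empty
        # strings would otherwise score 1.0 and cause false positives.
        return []
    return [t for t in link_texts if similar(t, username) >= SIMILARITY_THRESHOLD]


links = ["Cool Spam Service", "CoolSpamServices", "the docs"]
print(link_text_matches_username(links, "Cool Spam Service"))
# ['Cool Spam Service', 'CoolSpamServices']
```

Dropping every non-ASCII character is a simplification for the sketch; a real check would probably want Unicode-aware normalization so non-English usernames are still compared.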

If this is a good idea to add, I can attempt a PR implementing this change.

CoconutMacaroon • Jan 29 '24 23:01