Shared spam filters that admins can contribute to and subscribe to

Open scanlime opened this issue 2 years ago • 6 comments

Describe the problem to be solved

It's common that I'll see someone spamming my server in a quite predictable way, and it would be handy to share IP addresses to block with other administrators. I'm not sure how automated or centralized this procedure should be (probably not much), but even having a way for administrators to share moderation data like this in real time, while still leaving decision-making to humans, would be helpful, I think.

Describe the solution you would like:

Similar to the abuse system, this could be a way to send moderation data through the existing follow relationships between servers. I imagine that with the default settings there would be no change from current behavior, but admins could opt into sharing filter settings that they create, and they could subscribe to manually approve filters that they like from other admins.

Describe alternatives you have considered

The moderation data could be totally out-of-band, in a different medium like wikis or emails. This would require maintaining multiple parallel social networks though, rather than operating within the fediverse.

The other alternative would just be to not share any information, which is basically what we have now. This makes it very easy for the spammers to create a simple script that can easily reach all open PeerTube servers.

scanlime avatar Oct 12 '21 21:10 scanlime

I have the same problem on my instance: spammed registrations. Mostly the account and the channel created have the same name, something like ruhfftchbhuv, which looks randomly generated.

Some registrations are verified, others are not; most are verified. One thing to mention: on my instance about 80% of them (8 out of 10 registrations) use a domain that has no web server (and yes, neither a web server nor an A record is needed to use a domain for mail).

To make sure the mail server exists at all, dns.resolveMx and dns.reverse (from the Node.js dns module) could be used, but when email verification is enabled this may be useless, because without an MX record there is no mail server to send the email to. It may still be useful for the mail server's spam score, if it's true that sending mail to a non-existent mail server affects such scores (the mail server will also try multiple times to deliver the mail).
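
A minimal sketch of such a check, assuming Node.js and its built-in dns module. The function name hasMxRecord and the injectable resolver are made up for illustration. Note that SMTP falls back to the domain's address record when no MX record exists (RFC 5321), so a missing MX record is a hint rather than proof.

```javascript
const dns = require('dns').promises;

// Returns true when the domain part of `email` publishes at least one MX record.
// `resolveMx` is injectable (hypothetical parameter) so the check can run without network access.
async function hasMxRecord(email, resolveMx = (domain) => dns.resolveMx(domain)) {
  const at = email.lastIndexOf('@');
  if (at < 0 || at === email.length - 1) return false;
  const domain = email.slice(at + 1).toLowerCase();
  try {
    const records = await resolveMx(domain);
    return records.length > 0;
  } catch (err) {
    // ENOTFOUND / ENODATA mean the lookup found nothing; other failures are rethrown.
    if (err.code === 'ENOTFOUND' || err.code === 'ENODATA') return false;
    throw err;
  }
}
```

Called as `await hasMxRecord('someone@example.org')`, this could run before the verification mail is queued.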

A first small obstacle for scripted registrations could be to detect whether the password is inserted by script (document.getElementById('password-input').value = "kljasdlkjaslkdjalsjkd") or actually typed into the input field with the keyboard. I explored this a few seconds ago (Chrome and Firefox). When the value is set with el.value = "", no change, input, keydown or keyup event is fired. When the value is pasted or typed into the input field, the events are fired.

I also tried entering the password with Selenium (Python version). No events were fired either.

Using events to detect whether someone is actually typing the password or a script is doing it seems to me an easier solution than starting to implement a big new feature like shared spam filters (probably a huge amount of work).

This should slow down most spammers, because which hacker/spammer/scammer types in the password by hand? Surely they have "better" things to do. Someone must really hate you to waste time doing that.

Edit: One thing that came to mind is the following. If we detect whether the password is typed or scripted, and the attacker notices the detection, he could try to use

const e = new Event('input', {
    bubbles: true,
    cancelable: true,
});

document.getElementById('password-input').dispatchEvent(e)

to bypass the detection, because then an input event (or any of the other events) is fired. By removing the dispatch function from this specific element (if it is not used by the PeerTube frontend) with

document.getElementById('password-input').dispatchEvent = null

the problem is partially solved (the events are still fired when typing). Unfortunately, when the dispatch function of another element is assigned to this emptied property, the function works again. But JavaScript has a solution for this problem. After removing the function, a simple

Object.freeze(document.getElementById('password-input'))

will freeze the element so that the property can no longer be reassigned. Now events should no longer be fireable by script through that property, but handlers can still be attached and will be executed (I tried this while writing).
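
A caveat on the steps above, sketched under assumptions: assigning null to dispatchEvent only shadows the inherited method on that one object, so a script can still call EventTarget.prototype.dispatchEvent directly, and freezing the object does not change that. The standard Event.isTrusted flag is a more direct signal, since it is false for events created by script. In this sketch a plain EventTarget stands in for the password input so that it also runs in Node.js.

```javascript
// A plain EventTarget stands in for the password <input> (assumption for illustration).
const input = new EventTarget();

// Record whether each received event came from the user agent or from script.
const seen = [];
input.addEventListener('input', (event) => seen.push(event.isTrusted));

// Shadow dispatchEvent on this one object, as proposed above.
input.dispatchEvent = null;

// A script can still reach the inherited method through the prototype.
const forged = new Event('input', { bubbles: true, cancelable: true });
EventTarget.prototype.dispatchEvent.call(input, forged);

// The listener ran, but the forged event is flagged as untrusted.
console.log(seen); // [ false ]
```

Even so, this only helps against scripts that run inside a browser page.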

Mastercuber avatar Oct 14 '21 22:10 Mastercuber

I doubt client side measures short of a CAPTCHA (which we have plugins for) will help against any of the spammers I'm seeing. Broadly there seem to be spammers with relatively little automation who create accounts with SEO profiles, and there are obvious scripts that almost certainly aren't even running in a web browser.

The verification itself is being used for nefarious purposes too, with some of these scripts seeming to be intent on probing email addresses for some kind of automated response. Those are cases where I can look, as a human, and quickly blacklist an IP address (or as you suggest a regex pattern) and get the spam to stop for a while. If I can share this kind of human decision with other humans, we can have fewer spammers with trivial automation.

ghost avatar Oct 15 '21 04:10 ghost

I think my proposal is a step in the right direction, though not a very big one since, like you said, some scripts aren't running in the browser.

Removing the dispatch function of the password element and freezing the element should, I think, not be that hard to implement.

I can imagine there are scripts running outside the browser that emit keyboard events. For example, the Python library keyboard can emit keyboard events that should act like a real key press, and there are surely other libraries out there.

I hadn't realized that it's possible to activate a CAPTCHA plugin. One is now activated on my instance.

Precisely because they have little automation, I think it's a good idea to implement some client-side measures to keep the barely automated ones away. And yes, that's just one step in the right direction, but I think not the most expensive one to begin with (a CAPTCHA plugin is also a good step).

It may be a bit out of scope, but here is a little one-liner I use to spot unknown mail connection attempts, with the count of attempts per IP:

grep -Eo "unknown\[[[:digit:]]+\.[[:digit:]]+\.[[:digit:]]+\.[[:digit:]]+" /var/log/mail.log | cut -c 9- | sort | uniq -c

With iptables -I INPUT -s 1.1.1.1 -j DROP I add them manually to the kernel filter list.
How to know what IP address should be blacklisted?

I didn't know what you meant by "accounts with SEO profiles". OK, I looked at the descriptions of some accounts and also noticed that a <meta name="description" content=".."> element is created for these descriptions. Now I know what you mean by the quoted sentence... -.- EDIT: Finding the IPs of the SEO account domains through the mail log is probably one solution. They "often" have no web server and, despite different domains, the same IP.

Thinking about it some more: creating a list named, for example, spam filters should be no big deal, because other lists, like the users list, already exist. What's needed after having a functional list is a page displaying a single spam filter. A spam filter may consist of a list of IP addresses plus a "share this list" and an "activate this list" button. Somewhere it should also be possible, like you suggested, to search for the filters of other admins. Then it would be possible to add IPs to a filter list, activate the list, share it, discover lists from, for example, subscribed instances, and subscribe to some of those lists.

So three more pages, with the backend behind them, should be enough to add a spam filters feature.

What the backend might look like, I don't know. Maybe an ActivityStreams collection could be used as a spam filter list. Maybe an extra vocabulary could be created to express the list and the filter, or the filter only. Maybe the list could be published to other instances, and subscribed to, in some other way.

An IP address could be expressed in a vocab, and then the BLOCK activity could be used to send a block of a new IP (adding a new IP to a spam list) to the subscribers of this list. UNDO could be used to undo the block of an IP (the IP is removed from the spam list).

An ADD could be used to add a filter list (from the discovery list) to the admin's filter list, and REMOVE to remove it. Filter lists could maybe be added to the streams list of the admin (peertube?) actor object. A filters list, a filter itself and an IP address would need to be expressed with a vocab, I think. Maybe also an active and inactive state of a filter list, to be able to disable (and re-enable) a list on the subscriber side.
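
To make the idea concrete, here is a rough sketch of what such activities might look like as plain objects. Block and Undo are existing ActivityStreams activity types, but the IpAddress type, its address field, the use of target for the list, and the example.org URLs are all made up here; no such vocabulary exists in PeerTube or ActivityStreams as far as this thread establishes.

```javascript
const AS_CONTEXT = 'https://www.w3.org/ns/activitystreams';

// Hypothetical extension type for an IP entry; not part of ActivityStreams itself.
function ipObject(address) {
  return { type: 'IpAddress', address };
}

// Block activity announcing that `address` was added to the filter list at `listUrl`.
function blockIp(actorUrl, listUrl, address) {
  return {
    '@context': AS_CONTEXT,
    type: 'Block',
    actor: actorUrl,
    object: ipObject(address),
    target: listUrl,
  };
}

// Undo wrapping an earlier Block, meaning the IP was removed from the list.
function undoBlock(blockActivity) {
  const { '@context': _context, ...inner } = blockActivity;
  return {
    '@context': AS_CONTEXT,
    type: 'Undo',
    actor: blockActivity.actor,
    object: inner,
  };
}

const block = blockIp('https://example.org/accounts/peertube',
                      'https://example.org/spam-filters/1', '192.0.2.10');
console.log(JSON.stringify(undoBlock(block), null, 2));
```

An Add or Remove for whole filter lists could be built the same way, with the list URL as the object.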

When an admin subscribes to a list, or an IP is added to an activated list, then before a registration completes, the email domain and/or its mail server domain could be resolved to an IP address (with the Node.js dns package) and checked against the IPs of the admin's own active filters and of subscribed filters (which may themselves be active or inactive). If a match occurs, the registration process could be aborted.
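
The lookup described above could be sketched like this, assuming Node.js. The filter shape ({ active, ips }), the function names and the injectable resolver are hypothetical, and only IPv4 (A records) is handled.

```javascript
const dns = require('dns').promises;

// `filters` is a hypothetical shape: [{ active: boolean, ips: string[] }, ...].
// Only lists marked active contribute to the blocked set.
function blockedIps(filters) {
  return new Set(filters.filter((f) => f.active).flatMap((f) => f.ips));
}

// Resolve each hostname (email domain and/or MX host) and return the first blocked IP, or null.
async function findBlockedIp(hostnames, filters, resolve4 = (host) => dns.resolve4(host)) {
  const blocked = blockedIps(filters);
  for (const host of hostnames) {
    let addresses = [];
    try {
      addresses = await resolve4(host);
    } catch {
      // Unresolvable hosts are skipped rather than treated as a match.
    }
    const hit = addresses.find((ip) => blocked.has(ip));
    if (hit) return hit;
  }
  return null;
}
```

A registration handler could abort when `await findBlockedIp([emailDomain, mxHost], filters)` returns a non-null address.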

Mastercuber avatar Oct 17 '21 21:10 Mastercuber

OK, client-side measures may not be the best way to handle the spam problem. Neither a script running in the browser nor something like Selenium is needed to send a POST request to /users/register with the required payload. That could also be done with a bash script using curl, or with any other programming or scripting language that can send HTTP requests.

Creating a hash every time a user views the register page and sending this hash with the payload when actually registering seems like a small hurdle that one could automate away once noticed (first fetch the page, extract the hash, then send the request).
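
For illustration only, such a per-page-view hash could be a timestamp signed with an HMAC, as in this sketch using the Node.js crypto module. The function names, the token format and the ten-minute lifetime are assumptions. As said above, a script that fetches the page first gets a valid token too, so this only raises the bar slightly.

```javascript
const crypto = require('crypto');

// Per-process secret, for illustration; a real server would need a persistent one.
const SECRET = crypto.randomBytes(32);

// Issue a token of the form "<timestamp>.<hex hmac>" when the register page is served.
function issueToken(now = Date.now()) {
  const mac = crypto.createHmac('sha256', SECRET).update(String(now)).digest('hex');
  return `${now}.${mac}`;
}

// Accept the token only if the signature matches and it is not older than maxAgeMs.
function verifyToken(token, maxAgeMs = 10 * 60 * 1000, now = Date.now()) {
  const [ts, mac] = String(token).split('.');
  if (!ts || !mac) return false;
  const expected = crypto.createHmac('sha256', SECRET).update(ts).digest('hex');
  const a = Buffer.from(mac, 'hex');
  const b = Buffer.from(expected, 'hex');
  if (a.length !== b.length || !crypto.timingSafeEqual(a, b)) return false;
  const age = now - Number(ts);
  return age >= 0 && age <= maxAgeMs;
}
```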

Mastercuber avatar Oct 23 '21 18:10 Mastercuber

I doubt client side measures short of a CAPTCHA (which we have plugins for) will help against any of the spammers I'm seeing. Broadly there seem to be spammers with relatively little automation who create accounts with SEO profiles, and there are obvious scripts that almost certainly aren't even running in a web browser.

The verification itself is being used for nefarious purposes too, with some of these scripts seeming to be intent on probing email addresses for some kind of automated response. Those are cases where I can look, as a human, and quickly blacklist an IP address (or as you suggest a regex pattern) and get the spam to stop for a while. If I can share this kind of human decision with other humans, we can have fewer spammers with trivial automation.

Could we possibly use some sort of Bayesian filter? I moderate a rather active instance, so I usually run into like 10 spam accounts a week and they generally tend to have patterns in how they're made. Maybe check username, tag, channel name, bio, email, etc? The overuse of certain phrases, certain common names which imply advertising (CPR certification, legal help, customer service, casino, etc. - all stuff I've run into very often) might be able to trigger flags.

Spitballing here, but perhaps combining that with an interface that offers:

  • autoblocking of suspected spam accounts, prohibiting them from posting, having their profile publicly visible, or contributing to member count
  • a shared database of known accounts and the ability to contribute accounts that were caught in your filter to the log
  • a menu for manual review, configuration such as replacing the CSV containing the filter training data, setting detection threshold, and whitelisting/deletion

To be quite honest, this sounds rather implementable as an extension/plug-in. I might've done it myself, but I'm not too familiar with the codebase.
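
As a rough sketch of the filter part only, here is a tiny naive Bayes classifier with add-one smoothing over words from profile fields. The function names, the 'spam'/'ham' labels and the sample shape are assumptions, and it expects at least one training sample of each label.

```javascript
// Split text into lowercase alphanumeric words.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Count words per class from samples of the form { text, label: 'spam' | 'ham' }.
function train(samples) {
  const emptyClass = () => ({ docs: 0, total: 0, counts: new Map() });
  const model = { spam: emptyClass(), ham: emptyClass(), vocab: new Set() };
  for (const { text, label } of samples) {
    const cls = model[label];
    cls.docs += 1;
    for (const tok of tokenize(text)) {
      cls.counts.set(tok, (cls.counts.get(tok) || 0) + 1);
      cls.total += 1;
      model.vocab.add(tok);
    }
  }
  return model;
}

// Log-probability of the tokens under one class, with add-one (Laplace) smoothing.
function logScore(model, label, tokens) {
  const cls = model[label];
  const prior = Math.log(cls.docs / (model.spam.docs + model.ham.docs));
  return tokens.reduce((sum, tok) =>
    sum + Math.log(((cls.counts.get(tok) || 0) + 1) / (cls.total + model.vocab.size)), prior);
}

// Probability between 0 and 1 that the text is spam.
function spamProbability(model, text) {
  const tokens = tokenize(text);
  const diff = logScore(model, 'ham', tokens) - logScore(model, 'spam', tokens);
  return 1 / (1 + Math.exp(diff));
}
```

The detection threshold mentioned in the list would then be a cutoff on spamProbability, applied to the username, channel name and bio joined together.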

For what it's worth, I've begun collecting that sort of data from the accounts before I delete them. Maybe someone will find this data useful in the future, maybe not, but couldn't hurt to collect.

https://cryptpad.fr/sheet/#/2/sheet/view/FFhKJo2p86Gk-X16EXVD442oI3pi-DPPvSA-wzoQKiM/

TomatDividedBy0 avatar Dec 19 '21 04:12 TomatDividedBy0

It's worth pointing out that more than half of the work has been done thanks to the countless projects fighting generic web and email abuse. There's a bunch of lessons to learn from those really. Primary one being that there's no silver bullet: no one method will ever be without its downsides, and supporting a wide breadth of options is the most reliable way. (Take a look at rspamd to see a selection of methods.)

For example, PeerTube could support DNSBLs as described by RFC 5782. Another common solution is periodically updated lists (newline separated) of bad IPs delivered over HTTP. Some blacklists also have an HTTP API, but it's not as common.
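
A minimal sketch of both options, assuming Node.js. For IPv4, an RFC 5782 lookup reverses the octets, appends the list's zone and asks for an A record. The zone dnsbl.example.org, the function names and the handling of # comments in the list file are placeholders rather than anything a specific list guarantees.

```javascript
const dns = require('dns').promises;

// Build the DNSBL query name for an IPv4 address: reversed octets followed by the zone.
function dnsblQueryName(ipv4, zone) {
  const octets = ipv4.split('.');
  if (octets.length !== 4 || octets.some((o) => !/^\d{1,3}$/.test(o) || Number(o) > 255)) {
    throw new Error(`not an IPv4 address: ${ipv4}`);
  }
  return `${octets.reverse().join('.')}.${zone}`;
}

// The address counts as listed when the zone returns at least one A record for the query name.
async function isListed(ipv4, zone, resolve4 = (name) => dns.resolve4(name)) {
  try {
    return (await resolve4(dnsblQueryName(ipv4, zone))).length > 0;
  } catch (err) {
    if (err.code === 'ENOTFOUND' || err.code === 'ENODATA') return false;
    throw err;
  }
}

// Parse a newline-separated IP list, ignoring blank lines and (by assumption) # comments.
function parseIpList(text) {
  return new Set(
    text.split(/\r?\n/).map((line) => line.replace(/#.*/, '').trim()).filter(Boolean)
  );
}

console.log(dnsblQueryName('192.0.2.99', 'dnsbl.example.org'));
// 99.2.0.192.dnsbl.example.org
```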

Avamander avatar Apr 26 '22 22:04 Avamander

Hello,

I'm closing this issue because the plugin system can handle such a use case using an external tool. For example, we developed an Akismet plugin that could easily be adapted to use any other tool, like a shared spam list.

Chocobozzz avatar Dec 14 '22 08:12 Chocobozzz