SPAM Filter System

Status: Open • ghost opened this issue 8 years ago • 13 comments

There needs to be a SPAM filter before uploads are enabled, to prevent people and bots from uploading garbage to the site.

Here are some ideas for that, to go along with the moderation system mentioned in #35.

The main problem is that, since this is an OSS project, you'd need to either:

  • Make the filter lists public too, which makes them easy to bypass even if you keep adding to them.
  • Ship the system itself but leave the actual list blank (private). This would make things difficult for other people who want to deploy their own version of the site, since they would need to create their own set of filters to prevent SPAM.

IMO the second solution is best, combined with lending your own version of the filters to other (trusted) people so they can host their own version of nyaa.
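A minimal sketch of the second option, assuming a Python backend: the filter engine is public, while the patterns live in a per-deployment file kept out of the repository. The class name `SpamFilter`, the JSON file format, and the file path are all hypothetical; a deployment without its own list simply gets an empty filter that lets everything through.

```python
import json
import re
from pathlib import Path


class SpamFilter:
    """Hypothetical filter engine: the code is public, the patterns are not."""

    def __init__(self, patterns=None):
        # Compile case-insensitive regexes once, up front.
        self._rules = [re.compile(p, re.IGNORECASE) for p in (patterns or [])]

    @classmethod
    def from_file(cls, path):
        # Assumed format: a JSON list of regex strings, stored outside the
        # public repository. A missing file means "no rules configured".
        p = Path(path)
        if not p.exists():
            return cls([])
        return cls(json.loads(p.read_text(encoding="utf-8")))

    def is_spam(self, text):
        # True if any private rule matches anywhere in the text.
        return any(rule.search(text) for rule in self._rules)
```

A trusted deployment could then receive the private rule file out of band, as suggested above, without it ever being committed.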

ghost, May 06 '17 18:05

I think that before uploading, you should have an account.

aerojun, May 06 '17 18:05

  1. Just as @aerojun wrote, you should have an account before uploading.
  2. Names that contain phrases like "[HorribleSubs]" should be reserved for certain users only (in this case, the official HS account).
  3. Unofficial batches should probably be prohibited.
  4. Someone on IRC suggested that all torrents except for those uploaded by trusted users should be checked by a mod, approved as "OK", and only then shown on the website. That would mean a virtually infinite amount of work for mods, but I think we could use this idea somehow.
  5. Maybe the system should check for the anime/manga on MAL or another anime database to make sure it really exists? This would catch typos as well as fake stuff.
  6. I think we should ban the software and games sections, at least for a while. They weren't that strong anyway, and they're kinda dangerous.
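Suggestion 2 could be enforced at upload time along these lines. This is only a sketch: the `RESERVED_TAGS` table, the account name in it, and `may_use_name` are hypothetical. Every bracketed tag in the name is checked case-insensitively, and an anonymous uploader is represented as `None`.

```python
import re

# Hypothetical table: reserved group tag (lowercased) -> accounts allowed to use it.
RESERVED_TAGS = {
    "horriblesubs": {"HorribleSubs"},
}

# Captures the text inside every "[...]" in a torrent name.
BRACKETED = re.compile(r"\[([^\]]+)\]")


def may_use_name(torrent_name, uploader):
    """Return False if the name uses a reserved tag the uploader doesn't own."""
    for tag in BRACKETED.findall(torrent_name):
        allowed = RESERVED_TAGS.get(tag.strip().lower())
        if allowed is not None and uploader not in allowed:
            return False
    return True
```

Tags such as "[720p]" are not in the table, so they pass through unchanged.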

sdomi, May 06 '17 18:05

Suggestions 3-6 on redsPL's list are completely senseless. Requiring an account for uploading + reporting system + mods deal with reports with deletions/bans is enough.

You have all the ingredients in your hands to completely mess up this revival with absurd rules and/or stringent conventions. Please think your solutions through.

Kuraperunat, May 06 '17 19:05

Agree on having an account, though maybe anon uploads should be considered.

names that contain phrases like "[HorribleSubs]" should be reserved

unofficial batches probably should be prohibited

i think we should ban software and games section

wtf no

all torrents except for those uploaded by trusted users should be checked by a mod

way too much work, doesn't scale

maybe the system should check for the anime/manga on mal

This is a public tracker; we don't need (or want!) quality control beyond marking torrents as A+ and having trusted users.

sfan5, May 06 '17 20:05

I royally disagree with needing an account just to upload.

This is a public torrent indexer; it lives and thrives on anonymous uploads, just like old Nyaa and Sukebei did, the latter with an even higher ratio of anonymous to account uploads than Nyaa. I mean, this is a combined effort by the anons of both /a/ and /g/ after all.

With proper SPAM filters plus the Report System I already mentioned, it shouldn't be too much work for the Moderators. In fact, old Nyaa had only 17 moderators, and even fewer when it first started (around 8 or 9).

IMO the priority should go:

  1. SPAM filters system
  2. Report and Moderation system
  3. Enable upload
  4. Account and Registration system

Of course, this comes after the other major issues the site needs to address, like DB conversions and torrent data scraping, among others.

ghost, May 06 '17 21:05

There are two kinds of spam:

  1. Something meant to clog up the site. It's targeted at the site itself, to make it less useful or to displace genuine content.
  2. Direct malware, or indirect things baiting users into visiting certain websites.

There should be some decent heuristics to match type 2:

  • Executable files (outside a games category)
  • .url files telling people to go to certain pages
  • Zips or other archives in the anime category (archives are used for images or music)
  • Other uncommon file types like .doc, .js, ... that can be used directly to exploit users

In other words, the most common kind of content (plain media files) should be relatively safe. It could still contain a video or in-torrent comments telling users to do things, but at least those are not immediate exploits.
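The heuristics above could be expressed as a per-file check that flags an upload for review rather than rejecting it outright. The extension sets, the category names, and `suspicious_files` are hypothetical placeholders, not a tested ruleset; as later replies in this thread point out, each of these rules has legitimate exceptions.

```python
from pathlib import PurePosixPath

# Hypothetical extension lists; a real deployment would have to tune these.
EXECUTABLE = {".exe", ".scr", ".bat", ".cmd", ".msi"}
LINK_FILES = {".url", ".lnk"}
ARCHIVES = {".zip", ".rar", ".7z"}
RISKY_DOCS = {".doc", ".docm", ".js", ".vbs", ".hta"}


def suspicious_files(category, file_paths):
    """Return (path, reason) pairs for files matching the heuristics."""
    hits = []
    for path in file_paths:
        ext = PurePosixPath(path).suffix.lower()
        if ext in EXECUTABLE and category != "games":
            hits.append((path, "executable outside games"))
        elif ext in LINK_FILES:
            hits.append((path, "link file"))
        elif ext in ARCHIVES and category == "anime":
            hits.append((path, "archive in anime"))
        elif ext in RISKY_DOCS:
            hits.append((path, "uncommon/scriptable file type"))
    return hits
```

An empty result means nothing matched; a non-empty one would go to the moderators rather than block the upload.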

an-electric-sheep, May 06 '17 22:05

Uploaded torrents could have a short delay before they become publicly visible. That way a spammer can't reverse-engineer the rules by trial and error, and in the meantime the ruleset can bring any suspicious content to the mods' attention.
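One way to sketch the delayed-visibility idea, assuming each upload stores a timezone-aware UTC timestamp and a `flagged` bit set by the ruleset. `HOLD_PERIOD`, its 30-minute value, and `is_publicly_visible` are hypothetical. The delay means a spammer gets no immediate feedback on whether a submission tripped a rule.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical hold period before a new upload is listed publicly.
HOLD_PERIOD = timedelta(minutes=30)


def is_publicly_visible(uploaded_at, flagged, now=None):
    """Return True once the hold period has passed and no rule flagged the upload.

    Flagged uploads stay hidden until a moderator clears them
    (the moderation step itself is not modelled here).
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return not flagged and now - uploaded_at >= HOLD_PERIOD
```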

an-electric-sheep, May 06 '17 22:05

Having a spam filter doesn't make any sense either. If someone wanted to spam the site with a bot, whatever magic you put into a filter is trivial to bypass.

Do not make an automated filter before spam is a real, observed issue. On Nyaa it never was one, or it was handled appropriately by mods. Even if spam does become a thing, the filter should be narrowly targeted at that specific type of spam instead of relying on vague rules that are more likely to end up annoying normal users.

As for what an-electric-sheep is proposing:

  • Executables are common in many genres. CG sets can be wrapped in an exe viewer, audio experiences are often wrapped in the KiriKiri VN engine, and so on. Heuristics would never work for these.
  • .url files are very commonly included in raws uploaded from other DDL sites; having to strip them would just annoy people.
  • Zips are a common way to pack multiple files and can totally be a thing for anime (e.g. batches, subtitle files, BD art). Restricting them would make no sense.
  • .doc and .js files are very common for games (e.g. RPG Maker MV games are written in JS), for their documentation, and for myriad other totally safe and legitimate use cases. Again, heuristics would never work.

Please think about the solutions in the context of the whole site, not just the anime section.

Kuraperunat, May 06 '17 22:05

Do not make an automated filter before spam is a real, observed issue. On Nyaa it never was or was handled appropriately by mods.

How do you know? I think Nyaa had a very, very good spam filter, and the rest was handled by mods.

sdomi, May 06 '17 23:05

@Kuraperunat

We are talking about a Nyaa replacement, a site that had an Alexa global rank of ~1400 (its Japanese rank was 240!) and was the go-to site for downloading anime and manga, while Sukebei was for HGames and JAV. It will have spam, so it needs a good spam filter.

I agree with the rest of your post: silly heuristics like the ones @an-electric-sheep is proposing would be overkill and unnecessary, and would piss people off more than they help.

ghost, May 06 '17 23:05

Should we ban link shorteners (especially those with unskippable ads) in comments and torrent descriptions? They could be used to spread malicious content or, in the case of ad-ridden ones, to make a quick buck. I am not sure they should be allowed at all, since this is going to be a public torrent tracker, not a platform for getting rich quick. If anybody decides this is a good idea, here's a convenient list of the ad-ridden ones.

Argolics, May 08 '17 01:05

The majority of spam will come from torrent descriptions or comments; the first line of defense is the captcha on account creation and torrent upload.

For torrent descriptions we should make some rules to keep the user experience from being negatively impacted, and for comments it will be the whack-a-mole moderation game that comes standard with any website.

here's a convenient list of the ad ridden ones.

@Argolics That seems like a good preventive measure that can be implemented quickly to protect users; we could check it on torrent upload/edit and on comment posting.
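A check like the one suggested here could run on torrent upload/edit and on comment posting. In this sketch the blocklisted hosts are placeholder `.example` domains (the list linked in the thread is not reproduced), and `blocked_links` and the URL regex are hypothetical. Subdomains of a blocked host are matched too.

```python
import re
from urllib.parse import urlsplit

# Placeholder blocklist; a deployment would load its own list of shorteners.
BLOCKED_HOSTS = {"shortener.example", "ads.example"}

# Rough URL matcher: http(s) scheme up to whitespace, quotes, or closing brackets.
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+", re.IGNORECASE)


def blocked_links(text):
    """Return URLs in `text` whose host, or a parent domain of it, is blocklisted."""
    found = []
    for url in URL_RE.findall(text):
        host = (urlsplit(url).hostname or "").lower()
        parts = host.split(".")
        # Check the host and each parent domain: a.b.example -> b.example -> example
        if any(".".join(parts[i:]) in BLOCKED_HOSTS for i in range(len(parts))):
            found.append(url)
    return found
```

Matching on the parsed hostname rather than a substring avoids false hits such as `notshortener.example`.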

yiiTT, May 08 '17 03:05

@Argolics Sukebei descriptions used a lot of links to crappy ad-ridden image hosts for screenshots.

loadletter, May 11 '17 11:05