referrer-spam-list: Create a website to report spammers

Using GitHub issues and pull requests to report new spammers is starting to show its limits:

we want confirmation when adding new domains
bulk pull requests can't be validated at once
hard to find intersections between bulk pull requests to validate single domains (e.g. #96)
pull request conflicts
trouble keeping the list sorted (some PRs ignore the sort order)
poor traceability (sometimes, for various reasons, the person committing to the repo is not the one who opened the issue or pull request)
not so easy for users who are not very familiar with GitHub
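Once reports are stored as data instead of diffs, finding the overlap between bulk submissions becomes mechanical. A minimal Python sketch, assuming each bulk submission is simply a list of domain strings; the function name and the `min_reports=2` cutoff are hypothetical:

```python
from collections import Counter


def domains_confirmed_by_overlap(submissions, min_reports=2):
    """Return domains listed in at least `min_reports` distinct submissions.

    `submissions` is an iterable of iterables of domain strings, e.g. one
    list per bulk pull request (hypothetical input format).
    """
    counts = Counter()
    for submission in submissions:
        # Normalize and deduplicate inside one submission, so a single bulk
        # submission counts at most once per domain.
        counts.update({d.strip().lower() for d in submission if d.strip()})
    return sorted(d for d, n in counts.items() if n >= min_reports)


bulk_prs = [["a.example", "b.example"], ["B.example", "c.example"], ["c.example"]]
print(domains_confirmed_by_overlap(bulk_prs))  # ['b.example', 'c.example']
```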

How about we create a website dedicated to fighting referrer spam?

the list would be exposed and downloadable on the website
users could report new spammers very easily (a simple form)
users could vote on domains already reported to confirm them as spammers
each domain would have its own page listing everybody who reported or confirmed it as a spammer (good traceability; hopefully it would also show up on Google when people search for it)
domains with enough votes could be merged back into the list (still hosted on GitHub): this could be done manually at first, and automatically later
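Merging confirmed domains back while keeping the list sorted can be automated. A sketch assuming the list is a plain-text file with one domain per line, kept lowercase and in simple lexicographic order; the file name (e.g. `spammers.txt`) and the function names are assumptions, not the repository's actual tooling:

```python
from pathlib import Path


def merge_domains(existing_lines, confirmed_domains):
    """Return the union of the current list and newly confirmed domains,
    lowercased, deduplicated and sorted."""
    domains = {line.strip().lower() for line in existing_lines if line.strip()}
    domains.update(d.strip().lower() for d in confirmed_domains if d.strip())
    return sorted(domains)


def merge_into_file(path, confirmed_domains):
    """Rewrite the list file in place (hypothetical path, e.g. 'spammers.txt')."""
    file = Path(path)
    lines = file.read_text(encoding="utf-8").splitlines() if file.exists() else []
    merged = merge_domains(lines, confirmed_domains)
    file.write_text("\n".join(merged) + "\n", encoding="utf-8")
    return merged


print(merge_domains(["b.example", "a.example"], ["C.example", "a.example"]))
# ['a.example', 'b.example', 'c.example']
```

Because the sort is applied on every merge, contributors no longer need to respect the order by hand.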

Users could log in using GitHub (at first), so that we are sure each person votes only once, to avoid vote manipulation.
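Keying confirmations by account is what makes "one person, one vote" enforceable. A minimal in-memory sketch, assuming the numeric GitHub user ID obtained at login is used as the key; the class and method names are hypothetical, and a real site would persist this in a database:

```python
from collections import defaultdict


class Confirmations:
    """Tracks which accounts reported or confirmed each domain."""

    def __init__(self):
        # domain -> set of account IDs; a set makes repeated votes idempotent.
        self._by_domain = defaultdict(set)

    @staticmethod
    def _key(domain):
        return domain.strip().lower()

    def confirm(self, domain, github_user_id):
        self._by_domain[self._key(domain)].add(github_user_id)

    def count(self, domain):
        return len(self._by_domain.get(self._key(domain), ()))

    def reporters(self, domain):
        # Backs the per-domain page listing everybody who reported/confirmed it.
        return sorted(self._by_domain.get(self._key(domain), ()))


votes = Confirmations()
votes.confirm("spam.example", 1001)
votes.confirm("SPAM.example", 1001)  # same account again: not counted twice
votes.confirm("spam.example", 2002)
print(votes.count("spam.example"))  # 2
```

This enforces one vote per account only; it does not stop one person from using several accounts, which is the concern raised further down in the thread.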

This would also be a good way to promote Piwik and its spammer blacklist initiative.

Ping @mattab and @quba, with whom we discussed the idea.

Aug 10 '15 08:08 mnapoli

How would this website operate? How would you prevent the spammers from gaming the system to downvote their domains? I like the concept/idea, but it seems very ambitious for right now.

Aug 10 '15 16:08 calebpaine

In a first version, users could log in with their GitHub account. That way, there would be no more problems than we can already have today.

What do you find ambitious?

Aug 10 '15 16:08 mnapoli

I think this is a good initiative, however I see two main challenges:

For it to be efficient, you need to maximize the number of voters. Referer spam is not a problem specific to Piwik. Are you willing to promote the site to a larger audience (not only Piwik users) and provide tools like Google Analytics filters, etc.? In short, will you make the site "let's fight referer spam", and not just "let's improve Piwik's blocklist"?
If the site becomes popular enough, as @calebpaine said, there is a risk that spammers will use it to downvote their own spam domains or, even worse, to flag competitors' domains as spam. How will you prevent that?

Random possible ideas to make the system more reliable, and "confirm" a domain as spam:

If domain A gets spammed with domain B as referer, automatically check whether there is a link from B to A. If not, we know the request has been forged and does not come from a legitimate HTTP client. However, this is not easy to do with highly dynamic sites, pages specific to logged-in users, etc.
Set up a honeypot: a domain with no content, not indexed by search engines. I bet their spam bots just scan IP ranges and send requests when TCP port 80 is open. All domains sent as referer to this honeypot can be confirmed as spam.
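Both checks are easy to prototype. The sketch below assumes the HTML of the page on domain B has already been fetched (no network code is included) and that the honeypot's web server writes access logs in the common "combined" format, where the referer is the second-to-last quoted field. All function names are hypothetical, and a missing link is only a hint for dynamic or logged-in pages, not proof.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlsplit


class _LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(value for name, value in attrs if name == "href" and value)


def _hostname(url):
    try:
        return urlsplit(url).hostname  # None for relative, mailto:, "-" etc.
    except ValueError:  # e.g. malformed IPv6 literal
        return None


def _host_matches(host, domain):
    # True for the domain itself or any of its subdomains.
    host = (host or "").lower().rstrip(".")
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)


def page_links_to(html, target_domain):
    """Link check: does the referring page (domain B) link to target_domain (A)?"""
    parser = _LinkCollector()
    parser.feed(html)
    parser.close()
    return any(_host_matches(_hostname(href), target_domain) for href in parser.hrefs)


_QUOTED = re.compile(r'"([^"]*)"')


def honeypot_referers(log_lines):
    """Honeypot: every referer host seen by a contentless, unindexed site is a spam candidate."""
    hosts = set()
    for line in log_lines:
        fields = _QUOTED.findall(line)  # request, referer, user agent
        if len(fields) < 3:
            continue  # not a combined-format line
        host = _hostname(fields[-2])
        if host:
            hosts.add(host)  # .hostname is already lowercased
    return sorted(hosts)
```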

Aug 11 '15 20:08 desbma

> For it to be efficient, you need to maximize the number of voters.

One solution we discussed was to create a feature in Piwik to let users report spammers (quick solution: a link to the website; better solution: report a referrer in one click).

> Referer spam is not a problem specific to Piwik. Are you willing to promote the site to a larger audience (not only Piwik users) and provide tools like Google Analytics filters, etc.? In short, will you make the site "let's fight referer spam", and not just "let's improve Piwik's blocklist"?

Promoting the website would happen for sure. Tools are definitely out of scope for a first version; in the long term, I don't know.

> If the site becomes popular enough, as @calebpaine said, there is a risk that spammers will use it to downvote their own spam domains or, even worse, to flag competitors' domains as spam. How will you prevent that?

This has been answered already.

Aug 12 '15 07:08 mnapoli

Rephrasing my thoughts: how will you prevent a spammer from creating 2 GitHub accounts and downvoting a legitimate domain (or upvoting a spammy domain)?

Aug 12 '15 08:08 desbma

The same issue exists today, yet it isn't a problem. If the quality of votes is an issue, we'll find a solution. There's no point in freezing any progress just because challenges might appear in the future.

Aug 12 '15 08:08 mnapoli

> The same issue exists today, yet it isn't a problem.

The only difference is the number of users. I doubt the spammers know about this list yet, but if it becomes very popular they probably will.

> If the quality of votes is an issue, we'll find a solution. There's no point in freezing any progress just because challenges might appear in the future.

Nobody said you should freeze anything, but there is no harm in thinking before building.

A way to make abusing the system more difficult is to require a number of votes proportional to the total number of voters: for example, with 100 users, require 5 votes; with 1000 users, 50 votes, etc.
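The rule fits in a one-line formula. A sketch assuming 5%, the ratio implied by the numbers above (100 users need 5 votes, 1000 users need 50), rounded up, with a floor of one vote; the function name and the `minimum` parameter are hypothetical:

```python
def required_votes(total_voters, percent=5, minimum=1):
    """Votes needed to confirm a domain: `percent`% of all voters, rounded up.

    Integer arithmetic avoids floating-point rounding surprises.
    """
    proportional = (total_voters * percent + 99) // 100  # ceiling division
    return max(minimum, proportional)


for voters in (100, 1000, 101):
    print(voters, required_votes(voters))
# 100 5
# 1000 50
# 101 6
```

Rounding up means a spammer always needs strictly more than a fixed share of all accounts, and the floor keeps a brand-new site from confirming domains with zero votes.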

Aug 12 '15 09:08 desbma

I can only comment as a user who is committed to reporting spammers. I do feel a little intimidated by GitHub. For example, I saw a new issue with the "awaiting confirmation" label, and I can't figure out how to add that label to the issue I just posted about a new spammer.

So I would welcome a simpler website. However, I agree with an earlier comment: how do you prevent spammers from actually joining?

I don't think you could prevent spammers from joining, so the site would need a way for users to report other users who appear to always vote against approving spammers.

Just a couple of thoughts from a not-so-tech-savvy user :-)

Aug 18 '15 03:08 brynnd

Hey all,

I think I'd like a solution similar to the DNSBL lists out there or Drupal's Mollom (https://www.mollom.com):

No upvotes, just downvotes for banning
manual removal process for banned referers, as on Spamhaus
Perhaps a simple API for downvoting?
spam traps could be a great idea and work well for signup and email spam

As for a rule base:

a weighted threshold of votes for a block, relative to the overall volume of reports, perhaps?
the first block could last for X time period; then compare reports during the block and after it expires
blocks auto-expire at most, say, 3 times; after that, removal only through the manual process
votes/reports for a block must come from different class C IP ranges, etc., within a sensible time range, to qualify
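Two of these rules translate directly into code. A sketch assuming each report is stored as an (IP address, timestamp) pair; "class C" is read as an IPv4 /24 network, and /64 is used as a rough IPv6 equivalent, which is an assumption. The 7-day window, 30-day block length and 3-expiry limit are placeholder values, and the function names are hypothetical:

```python
import ipaddress
from datetime import datetime, timedelta


def qualifying_reports(reports, now, window=timedelta(days=7)):
    """Count recent reports for one domain, at most one per network.

    `reports` is an iterable of (ip_string, datetime) pairs.
    """
    networks = set()
    for ip, reported_at in reports:
        if now - reported_at > window:
            continue  # outside the time range, does not qualify
        # /24 for IPv4 ("class C"); /64 for IPv6 is an assumed equivalent.
        prefix = 24 if ipaddress.ip_address(ip).version == 4 else 64
        networks.add(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))
    return len(networks)


def next_block_duration(expired_blocks, length=timedelta(days=30), max_auto_expiries=3):
    """Length of the next automatic block, or None once removal must be manual."""
    if expired_blocks >= max_auto_expiries:
        return None
    return length


now = datetime(2016, 2, 4)
reports = [
    ("192.0.2.1", now - timedelta(days=1)),
    ("192.0.2.200", now - timedelta(days=2)),  # same /24 as above: counted once
    ("198.51.100.7", now - timedelta(days=3)),
    ("203.0.113.5", now - timedelta(days=30)),  # too old
]
print(qualifying_reports(reports, now))  # 2
```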

I'm not sure having people sign up to GitHub or anywhere else really helps, or at least is worth the barrier it creates for contributors.

I'm happy to contribute dev time into this.

Feb 04 '16 09:02 paulhudson

+1 for a solution to let users report spammers directly within Piwik

Jun 07 '16 07:06 gdementen
