Investigate spam bot accounts & add deterrent

Open darylhedley opened this issue 3 years ago • 2 comments

Over the last few weeks we have seen a large spike in users registering and posting content to their user profile pages. These pages are public and appear to be used for deep links that draw SEO traffic via the Hypothes.is website.

Investigations are already under way, and Cloudflare's automatic "CAPTCHA-like" deterrent has been enabled. However, we are going to wait 3-6 days to see whether this has had any effect, then make a decision on the following 3 steps:

  1. Put measures in place that are able to prevent the majority of spam accounts being created
  2. Find a way to delete existing spam accounts
  3. Put monitoring in place to help us detect future flare-ups

Tasks

  • [ ] @robertknight - To add Cloudflare anti-bot protection to the email confirmation page
  • [ ] @indigobravo - To add notifications in Slack should we get an unexpected surge of sign-ups (could be interesting, good or bad)
  • [ ] Create a list of candidate email domains to remove
  • [ ] Verify that the accounts on the list do not have any annotations
  • [ ] Back up existing records
  • [ ] Issue a delete for the affected domains
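
As a rough illustration of the "candidate domains" and "vet for annotations" tasks, here is a minimal Python sketch. It assumes the account data can be exported as `(email, annotation_count)` pairs; that shape, the function names, and the `min_signups` threshold are hypothetical and not taken from the h codebase.

```python
from collections import Counter


def email_domain(email):
    """Return the lower-cased domain part of an email address."""
    return email.rsplit("@", 1)[-1].strip().lower()


def candidate_domains(accounts, min_signups=50):
    """Return domains with many signups and no annotations at all.

    `accounts` is an iterable of (email, annotation_count) pairs.
    This input shape is an assumption made for the sketch.
    """
    signups = Counter()
    has_annotations = set()
    for email, annotation_count in accounts:
        domain = email_domain(email)
        signups[domain] += 1
        if annotation_count > 0:
            # Any annotating user disqualifies the whole domain.
            has_annotations.add(domain)
    return sorted(
        domain
        for domain, count in signups.items()
        if count >= min_signups and domain not in has_annotations
    )
```

Dropping a whole domain as soon as one of its accounts has annotations is deliberately conservative: it keeps real users out of the delete list at the cost of leaving some spam behind for manual review.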

darylhedley avatar Sep 06 '22 16:09 darylhedley

It was suggested:

  1. We could place a CAPTCHA on the register page and also on the email confirmation page
  2. We think around 90% of the accounts could be accounted for and deleted. However, we should export this data first then delete so we can keep a record for future cases
  3. We could keep a running list of the top email domains signing up and monitor for sudden spikes in signups above the norm.
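
One possible way to define "above the norm" for the monitoring suggestion is to compare today's signup count against the mean and standard deviation of recent days. The sketch below assumes daily signup counts are already available as a list; the function name and the default thresholds are illustrative only.

```python
from statistics import mean, pstdev


def is_signup_surge(history, today, threshold=3.0, min_stdev=1.0):
    """Return True if `today` is far above the recent daily norm.

    `history` holds daily signup counts for the preceding days.
    `min_stdev` stops a perfectly flat history from flagging tiny changes.
    """
    if not history:
        return False  # no baseline yet, so nothing to compare against
    baseline = mean(history)
    spread = max(pstdev(history), min_stdev)
    return today > baseline + threshold * spread
```

A scheduled job could call this once a day and post to Slack when it returns True; how that notification is sent is left out here.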

darylhedley avatar Sep 06 '22 16:09 darylhedley

> Put measures in place that are able to prevent the majority of spam accounts being created

We might also want to maintain a list of banned email providers and block them outright at the source?
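
A minimal sketch of what such a check could look like at registration time, in Python. The blocklist contents and the function name are hypothetical, and where the check would hook into the signup form is not shown.

```python
# Hypothetical blocklist; real entries would come from the vetted domain list.
BLOCKED_DOMAINS = frozenset({"spam.example", "throwaway.example"})


def is_blocked_email(email, blocked_domains=BLOCKED_DOMAINS):
    """Return True if the address belongs to a blocked domain or a subdomain of one."""
    domain = email.rsplit("@", 1)[-1].strip().lower()
    return any(
        domain == blocked or domain.endswith("." + blocked)
        for blocked in blocked_domains
    )
```

Matching subdomains as well as the exact domain is a design choice: it catches addresses like `user@mail.spam.example` without also blocking unrelated domains that merely end in the same letters.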

> We think around 90% of the accounts could be accounted for and deleted. However, we should export this data first then delete so we can keep a record for future cases

If we want to keep this data, then the database is probably the easiest place for us to store it using our current mechanisms. We could:

  • Create new tables "archived_accounts" and "archived_groups"
  • Use a DB migration to move things into these tables
  • This would be a controlled process using our existing pipelines
  • It's also reversible if we change our minds (with the caveat that other people could grab the freed-up names, etc.)
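
To make the archive idea concrete, here is a self-contained Python sketch using the standard library's `sqlite3` module with a simplified, made-up `accounts` schema. It is not the project's migration tooling or real schema; it only illustrates the copy-then-delete move inside a single transaction.

```python
import sqlite3


def archive_accounts(conn, domains):
    """Move accounts whose email is at one of `domains` into archived_accounts.

    Returns the number of accounts moved. The table layout is hypothetical.
    """
    # LIKE treats "_" and "%" as wildcards; fine for a sketch, not for production.
    patterns = [f"%@{domain.lower()}" for domain in domains]
    if not patterns:
        return 0
    where = " OR ".join("lower(email) LIKE ?" for _ in patterns)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS archived_accounts "
        "(id INTEGER PRIMARY KEY, username TEXT, email TEXT)"
    )
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "INSERT INTO archived_accounts (id, username, email) "
            f"SELECT id, username, email FROM accounts WHERE {where}",
            patterns,
        )
        deleted = conn.execute(f"DELETE FROM accounts WHERE {where}", patterns)
    return deleted.rowcount
```

Reversing the move would be the same pair of statements with the two table names swapped, subject to the caveat above about names being reused in the meantime.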

jon-betts avatar Sep 06 '22 16:09 jon-betts