Umbraco.Forms.Issues
Blacklist email domains to prevent spam
I have previously suggested a configuration option to allow blacklisting email domains to prevent spam: https://github.com/umbraco/Umbraco.Forms.Issues/issues/169
Even with reCAPTCHA v3 it seems possible for bots to bypass it, and sometimes a lot of spam entries are created.
E.g. in a quite new Umbraco project with reCAPTCHA v3:
E.g. it would help a lot to blacklist @raiz-pr.com.
I have also seen something like @motorza.ru.
Maybe Forms could even add a dashboard that detects what may look like spam, e.g. more than 50 form entries from the same email, with an option to add it to the blacklist.
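As a rough illustration of the detection idea above, here is a minimal Python sketch that counts entries per email domain and reports those above a threshold. The function names and the default threshold of 50 are hypothetical and not part of Umbraco Forms; the result is only a list of candidates for manual review, since large providers will naturally have many entries.

```python
from collections import Counter


def domain_of(email):
    """Return the lower-cased domain of an email address, or '' if there is no '@'."""
    _, sep, domain = email.strip().rpartition("@")
    return domain.lower() if sep else ""


def suspicious_domains(emails, threshold=50):
    """Return {domain: count} for domains with more than `threshold` entries.

    `threshold` is an illustrative default, not a recommended value.
    """
    counts = Counter(domain_of(e) for e in emails)
    counts.pop("", None)  # ignore malformed addresses
    return {d: n for d, n in counts.items() if n > threshold}
```

The same counting could be done per full address instead of per domain by skipping `domain_of`.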
On the other hand there could also be a whitelist for common email domains like @gmail.com, @outlook.com, etc.
I know spam also comes via these (fictive) emails, but I think it at least could help a bit - especially if not using reCAPTCHA or Honeypot.
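To make the blacklist/whitelist idea concrete, here is a minimal Python sketch of the check itself. All names (`is_blocked`, `blocked_domains`, `blocked_suffixes`, `allowed_domains`) are hypothetical, and the configured values are assumed to be lower-case; Forms has no such setting today, so this would have to live in custom code.

```python
def email_domain(email):
    """Return the lower-cased domain of an email address, or '' if there is no '@'."""
    _, sep, domain = email.strip().rpartition("@")
    return domain.lower() if sep else ""


def is_blocked(email, blocked_domains=frozenset(), blocked_suffixes=(),
               allowed_domains=frozenset()):
    """Decide whether a submission's email domain should be rejected.

    The whitelist wins over the blacklist; suffixes such as '.ru'
    block a whole top-level domain.
    """
    domain = email_domain(email)
    if not domain:
        return False  # leave malformed addresses to the form's normal validation
    if domain in allowed_domains:
        return False
    if domain in blocked_domains:
        return True
    return any(domain.endswith(suffix) for suffix in blocked_suffixes)
```

For example, `is_blocked("x@motorza.ru", blocked_suffixes=(".ru",))` is true, while the same call with `allowed_domains={"motorza.ru"}` is false.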
@AndyButland is this something you are considering implementing in Forms? E.g. with reCAPTCHA v3 we still see many spam entries through forms.
E.g. some smaller Danish companies know that they never expect to receive mails from .ru email domains, which are some of the typical ones used for spamming with form entries.
I think this will probably need to be custom code @bjarnef, at least for now. For one, Forms doesn't know which fields to check (i.e. your field alias is likely "email", but we can't know that for sure). There's a FormValidateNotification notification you could hook into, or if you just wanted to silently not record it, there's also a RecordCreatingNotification you could cancel. I think that should give you the hooks you need.
@AndyButland thanks. Is there anything specific going on between FormValidateNotification and RecordCreatingNotification that I should be aware of? I guess workflows are executed after both, as they execute based on data from the record.
@bjarnef you'd definitely need to know not only your form structure, but also your audience. I also get a fair bit of leakage from reCAPTCHA3 but most of it comes from fake
@c9mb yes, we can of course not blacklist @gmail.com, @outlook.com, @hotmail.com, etc., and a whitelist would need to contain a lot of potential business email domains.
However, sometimes there are patterns, with the same bots, crawlers (and perhaps humans as well) spamming a form, e.g. from @motorza.ru or .ru, which a small local business in Denmark, for instance, would never expect to get mails from.
So it depends on the customer, but often there are some patterns. Furthermore, there is often spam from (fictitious) @gmail.com accounts as well: sometimes random addresses, other times the same address submits several entries, which humans typically don't, at least not on the same day or within a few hours.
If the form contains a message field, it often contains several links as well when submitted by bots.
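The two patterns described here (repeated submissions from the same address within a few hours, and messages containing several links) could be checked roughly as in this Python sketch. The function names, the six-hour window, and the thresholds are all hypothetical choices for illustration, and `history` is assumed to be a list of `(email, submitted_at)` pairs loaded from earlier records.

```python
import re
from datetime import datetime, timedelta

# Matches http(s) URLs and bare "www." links; each link is counted once.
LINK_RE = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)


def count_links(message):
    """Count links in a free-text message field."""
    return len(LINK_RE.findall(message))


def recent_submissions(email, history, now, window=timedelta(hours=6)):
    """Count earlier entries from the same address inside the time window."""
    target = email.strip().lower()
    return sum(
        1
        for addr, when in history
        if addr.strip().lower() == target and timedelta(0) <= now - when <= window
    )


def looks_like_spam(email, message, history, now, max_links=2, max_recent=1):
    """Flag an entry with too many links or too many recent entries from the sender."""
    return (count_links(message) > max_links
            or recent_submissions(email, history, now) > max_recent)
```

These are heuristics only; a real person may legitimately include links or resubmit a form, so flagged entries are better held for review than dropped outright.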
@AndyButland I wonder if there is something more to do about this?
On an Umbraco Cloud project using Umbraco 12.3.9 and Forms 12.2.4 they have about 7K entries for a single form 😳🙈
Most of them from @registry.godaddy
The Form doesn't use reCAPTCHA but the HoneyPot package
We could hook into the form events as you previously mentioned https://github.com/umbraco/Umbraco.Forms.Issues/issues/1142#issuecomment-1987891890
Is there a simple way to clean up in the database?
DELETE FROM [form records]
WHERE formId = [guid] AND email LIKE '%@registry.godaddy%'
email may depend on the field alias in the form.
In SQL you'll need something like this to remove all the submissions that you've identified as spam. There are a few related tables to consider.
** Please make sure to test on a backup first as I've just written it now **
DECLARE @formId uniqueidentifier
DECLARE @fieldAlias nvarchar(255)
DECLARE @value nvarchar(255)
SET @formId = '<your form guid>'
SET @fieldAlias = '<your email field's alias>'
SET @value = '@registry.godaddy'
-- Get IDs of records to remove
SELECT r.Id
INTO #recordIds
FROM UFRecords r
INNER JOIN UFRecordFields rf ON rf.Record = r.Id
INNER JOIN UFRecordDataString rdf ON rdf.[Key] = rf.[Key]
WHERE r.Form = @formId
AND rf.Alias = @fieldAlias
AND rdf.Value LIKE '%' + @value + '%'
--Delete record from all tables
DELETE FROM UFRecordDataBit WHERE [Key] IN (
SELECT [Key] FROM UFRecordFields WHERE Record IN (SELECT Id FROM #recordIds)
)
DELETE FROM UFRecordDataDateTime WHERE [Key] IN (
SELECT [Key] FROM UFRecordFields WHERE Record IN (SELECT Id FROM #recordIds)
)
DELETE FROM UFRecordDataInteger WHERE [Key] IN (
SELECT [Key] FROM UFRecordFields WHERE Record IN (SELECT Id FROM #recordIds)
)
DELETE FROM UFRecordDataLongString WHERE [Key] IN (
SELECT [Key] FROM UFRecordFields WHERE Record IN (SELECT Id FROM #recordIds)
)
DELETE FROM UFRecordDataString WHERE [Key] IN (
SELECT [Key] FROM UFRecordFields WHERE Record IN (SELECT Id FROM #recordIds)
)
DELETE FROM UFRecordFields WHERE Record IN (SELECT Id FROM #recordIds)
DELETE FROM UFRecords WHERE Id IN (SELECT Id FROM #recordIds)
-- Clean up
DROP TABLE #recordIds
@AndyButland when looking into this, from FormValidateNotification we can access the User-Agent string via notification.Context.Request.Headers["User-Agent"].
With RecordCreatingNotification each item in SavedEntities has an IP property. Does it know about the User-Agent at this stage? We could of course add a hidden field with a magic string to the form, but is there another way to pass in other information from the original request?
Often we could detect if the request is from a bot/crawler and minimize the amount of spam: https://stackoverflow.com/questions/544450/detecting-honest-web-crawlers
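As a rough sketch of the User-Agent check, the Python below flags requests whose header is missing or contains common crawler or script tokens. The token list and the function name are assumptions for illustration, not a complete or authoritative list, and this only catches clients that identify themselves; a spam bot that spoofs a browser User-Agent will pass.

```python
import re

# Illustrative tokens seen in self-identifying crawlers and scripted clients.
BOT_UA_RE = re.compile(
    r"bot|crawl|spider|slurp|curl|wget|python-requests|headless",
    re.IGNORECASE,
)


def is_probable_bot(user_agent):
    """Return True for a missing/empty User-Agent or one matching a known token."""
    if not user_agent or not user_agent.strip():
        return True
    return BOT_UA_RE.search(user_agent) is not None
```

The header value read in the validation step would be passed straight into `is_probable_bot`.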