Add automatic checking for profanity
This adds functionality to automatically check for profanity in text messages written in any of the XMPP MUC rooms monitored by the moderation bot.
The terms considered profanity are language-specific and can be configured in the database. They have to be stored in their lemmatized form. English terms are always checked. In addition, if a supported language other than English is detected, the terms configured for that language are checked as well. Supported languages for now are English, French, German, Portuguese, Russian, Spanish and Turkish.
The first two times a user uses profanity within a sliding window of three months, they'll receive a warning. Starting with the third time, the user will be muted. The first mute lasts five minutes, and the duration increases exponentially with each further use of profanity, up to a maximum of one week.
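To illustrate the escalation described above, here is a minimal sketch in Python. It is not the bot's actual code: the function name, the constants and in particular the growth factor of 2 are assumptions, as the description only says the duration grows "exponentially".

```python
from datetime import timedelta
from typing import Optional

WARNINGS_BEFORE_MUTE = 2             # the first two incidents only trigger a warning
INITIAL_MUTE = timedelta(minutes=5)  # duration of the first mute
MAX_MUTE = timedelta(weeks=1)        # upper bound for any mute
GROWTH_FACTOR = 2                    # assumption: doubling, the real factor may differ


def action_for_incident(incidents_in_window: int) -> Optional[timedelta]:
    """Return None for a warning, otherwise the mute duration.

    incidents_in_window is the number of incidents of the user within the
    last three months, including the current one.
    """
    if incidents_in_window < 1:
        raise ValueError("there has to be at least one incident")
    if incidents_in_window <= WARNINGS_BEFORE_MUTE:
        return None
    duration = INITIAL_MUTE
    # Grow the duration once per mute that preceded this one, capped at MAX_MUTE.
    for _ in range(incidents_in_window - WARNINGS_BEFORE_MUTE - 1):
        duration = min(duration * GROWTH_FACTOR, MAX_MUTE)
        if duration == MAX_MUTE:
            break
    return duration
```

With these assumed values the third incident results in a five minute mute, the fourth in ten minutes, and the cap of one week is reached with the fourteenth incident inside the window.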
To enable this functionality, the --enable-profanity-monitoring command line option has to be provided.
This change requires a database migration for existing databases. The following SQL commands can be used for that:
DROP TABLE profanity_whitelist;
CREATE TABLE profanity_terms (
term VARCHAR(255) NOT NULL,
language VARCHAR(2) NOT NULL,
PRIMARY KEY (term, language)
);
INSERT INTO
profanity_terms (term, language)
SELECT
word AS term,
'en' AS language
FROM
profanity_blacklist;
DROP TABLE profanity_blacklist;
ALTER TABLE profanity_incidents
RENAME TO profanity_incidents_old;
CREATE TABLE profanity_incidents (
id INTEGER NOT NULL,
timestamp DATETIME NOT NULL,
player VARCHAR(255) NOT NULL,
room VARCHAR(255) NOT NULL,
offending_content TEXT NOT NULL,
detected_languages JSON NOT NULL,
matched_terms JSON NOT NULL,
PRIMARY KEY (id)
);
INSERT INTO
profanity_incidents
SELECT
id,
timestamp,
player,
'[email protected]',
offending_content,
'[]',
'[]'
FROM
profanity_incidents_old
WHERE
deleted != '1';
DROP TABLE profanity_incidents_old;
Two false positives I found in testing:
fr J'étais en retard avec ma cavalerie
es Eso puede retardar los romanos
If you name your player an insult you can get the moderation bot to kick the ratings bot. This was fun to test 😆
Perhaps we could simply exclude the specific JID associated with the other bot from filtering? @Dunedan
Thanks for reporting these issues.
Two false positives I found in testing:
fr J'étais en retard avec ma cavalerie
es Eso puede retardar los romanos
While it looks like these two false positives might have had the same cause, they actually had two different ones.
For the French sentence, the cause was that the bot always checked the English profanity terms in addition to the ones for the detected language. I changed that now, so it no longer checks the English ones if it detects at least one other language with 100% certainty. That won't fix all such false positives, but should produce far fewer of them.
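The changed selection of languages could look roughly like the following sketch. The function name and the shape of the detection result (a mapping of language code to confidence) are made up for illustration, and the sketch assumes that only supported languages count towards the "100% certainty" rule.

```python
SUPPORTED_LANGUAGES = {"en", "fr", "de", "pt", "ru", "es", "tr"}


def languages_to_check(detected: dict) -> set:
    """Return the languages whose profanity terms should be checked.

    detected maps ISO 639-1 language codes to a confidence between 0 and 1
    (a hypothetical format, not necessarily what the bot uses).
    """
    languages = {lang for lang in detected if lang in SUPPORTED_LANGUAGES}
    certain_other_language = any(
        lang != "en" and detected[lang] >= 1.0 for lang in languages
    )
    # English is only skipped if another language got detected with certainty.
    if not certain_other_language:
        languages.add("en")
    return languages
```

For a message detected as French with full certainty this returns only {"fr"}, while a less certain detection still returns {"en", "fr"}.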
The false positive for the Spanish sentence was caused by a bug in the detection of profanity in phrases, which caused partial words to get matched.
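One way to avoid matching partial words is to compare whole tokens instead of searching for substrings. The sketch below shows that idea. It is not the actual fix: the function name is hypothetical, and lowercased words stand in for properly lemmatized ones.

```python
def find_terms(lemmas: list, terms: set) -> list:
    """Return the configured terms that occur in lemmas as whole words.

    Multi-word terms (phrases) have to match a consecutive run of tokens.
    """
    matches = []
    for term in terms:
        parts = term.split()
        if not parts:
            continue  # ignore empty terms
        for start in range(len(lemmas) - len(parts) + 1):
            if lemmas[start:start + len(parts)] == parts:
                matches.append(term)
                break
    return sorted(matches)


# Stand-in for the lemmatized Spanish sentence from the report.
spanish = ["eso", "puede", "retardar", "los", "romanos"]
```

A substring search like `"retard" in " ".join(spanish)` is true, whereas `find_terms(spanish, {"retard"})` returns an empty list, because "retardar" is a different token.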
If you name your player an insult you can get the moderation bot to kick the ratings bot.
I had already thought about this case when implementing the functionality, and the intention was to not punish users for writing other users' names, even if these names contain profanity. However, there was a bug in the implementation, so it only checked the usernames against the lemmatized words written. That meant the moderation bot would detect EcheLOn writing "fuck" and not find a player with that name.
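The difference between checking names against the lemmatized words and checking them against the words as written can be sketched like this. All names here are hypothetical, "insult" is a placeholder for a configured term, and the case-insensitive comparison of nicknames is an assumption.

```python
def offending_words(tokens: list, terms: set, nicknames: set) -> list:
    """Return the written words which should count as profanity.

    tokens is a list of (word as written, lemma) pairs. A word is exempt
    if it is the name of a player, even if its lemma is a configured term.
    """
    known_names = {name.lower() for name in nicknames}
    return [
        word
        for word, lemma in tokens
        # Compare the word as written with the names, not its lemma.
        if lemma in terms and word.lower() not in known_names
    ]


# A player is named "Insulting", which lemmatizes to the term "insult".
message = [("Insulting", "insult"), ("won", "win")]
```

Here `offending_words(message, {"insult"}, {"Insulting"})` returns an empty list. Comparing the lemma "insult" with the names instead would find no such player and flag the message.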
All of these issues should be fixed now, but I'd appreciate further testing.
Glad to help. Me and Norse_Herold had been talking around with profanity monitoring so I had a few test cases in mind.
