
Alternative approval method for new users

Open · Nutomic opened this issue 2 years ago · 16 comments

The Lemmy software is efficient enough that it could easily scale to 10x the current number of users. But a major bottleneck prevents this: the onboarding of new users is so complex that many potential users will be discouraged.

Currently the steps for a new user are something like this:

  • Potential user is linked to an interesting Lemmy post from another site and wants to comment
  • To sign up, the user has to answer multiple questions
    • some users will already be discouraged by this, because they don't know what to answer
  • Needs to wait for admin approval, which can take hours or days
    • If email sending is broken on the server, the user will never know that the registration was accepted
    • If approval takes too long, the user might forget why they signed up in the first place, and lose interest

Here's a different way new users could be onboarded, which would be a much smoother process:

  • Potential user is linked to an interesting Lemmy post from another site and wants to comment
  • Registers an account by entering a username and password (no need to answer a question)
  • User can immediately comment, but these comments are hidden from others and don't federate
  • Admins review these comments and approve them (or ban the user)
  • After 5 approved posts/comments (configurable), the account is automatically unlocked and can post without review, as well as perform all other user actions

The advantage of this approach is that the new user doesn't have to worry about the approval process, but can start interacting right away. At the same time, admins can review new accounts and effectively stop bots. New users would only have permission to create comments, so abuse would be almost impossible.

The implementation would be like this: the site setting require_application would be changed to an enum user_approval_mode: application|content_review. The tables local_user and comment each need a new column approved. If both local_user.approved and comment.approved are false, the comment is not publicly visible; otherwise it's visible. There is a new endpoint ListPendingCommentApprovals, with a format identical to ListCommentReports, so that the frontend code can be reused. Most API calls get checks so that they can only be used by approved users (upload image, create community, send private message etc).
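
A minimal sketch of what those schema changes might look like, assuming PostgreSQL (which Lemmy uses) and that the site setting lives on the local_site table; everything besides the local_user and comment table names is illustrative, not an actual migration:

```sql
-- Sketch only: follows the proposal above, not an existing Lemmy migration.
CREATE TYPE user_approval_mode AS ENUM ('application', 'content_review');

-- Assumes the site settings live on local_site.
ALTER TABLE local_site
    ADD COLUMN user_approval_mode user_approval_mode
        NOT NULL DEFAULT 'application';

-- New approval flags; existing rows would be backfilled to true so that
-- already-registered users and their comments stay visible.
ALTER TABLE local_user ADD COLUMN approved boolean NOT NULL DEFAULT false;
ALTER TABLE comment    ADD COLUMN approved boolean NOT NULL DEFAULT false;
```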

Nutomic · Dec 02 '22 10:12

I'm not sure if I agree with this, for several reasons:

  • It still requires manual admin approval, and would probably be even more work for admins, since now instead of reading through explicit questions and answers, they have to gauge someone's unapproved comment and post history.
    • Let's say we want to make sure someone isn't homophobic. If their first few comments are on unrelated topics, we don't have enough info to approve them. Contrast that with questionnaires, where we can have their opinion on record on any given number of questions before approving.
    • What if there's not enough comments to tell whether a user is genuine or not? At what point do admins decide to approve someone? If their first X comments look fine, there's no way to tell whether they are trolls or not, since they could just wait until they're approved.
  • This is a completely new and untested way of onboarding afaik. Masto, discord servers, and other platforms do it the application-questionnaire way. We settled on this after trying different things, and it has proven itself to be more effective than other methods, and has stood the test of time.
  • If one reason for this is to filter out bots, as opposed to trolls, this would be worse in that regard: now instead of one obvious bot application, which couldn't read the application questions, there are probably dozens of bot posts and comments.
  • The "approval notification via email" process would work the same.
  • Federation might get complicated, because we'd have to build and persist a queue of "newly approved but not yet sent" content. I.e. after an account gets set to "approve", their content needs to get added to a queue, instead of being sent out at the time of the API call (a sketch of such a queue follows this list).
  • It'd require a whole new queue and UI for "unapproved content".
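
For illustration, the kind of persisted queue this would imply might look like the following; the table and its name are hypothetical, not anything in Lemmy today:

```sql
-- Hypothetical queue: a row is written when an admin approves a comment,
-- and a background worker drains it by sending the activity out.
CREATE TABLE approved_unsent_comment (
    id         serial PRIMARY KEY,
    comment_id int NOT NULL REFERENCES comment ON DELETE CASCADE,
    queued_at  timestamptz NOT NULL DEFAULT now()
);
```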

A much better solution IMO would be to have the default state where people after applying are "logged in", but not yet approved, with an indicator somewhere in the UI showing them that their status is pending. When they get approved, they get emailed, and the pending status goes away, but they don't have to log in again.

dessalines · Dec 02 '22 20:12

> I'm not sure if I agree with this, for several reasons:
>
> • It still requires manual admin approval, and would probably be even more work for admins, since now instead of reading through explicit questions and answers, they have to gauge someone's unapproved comment and post history.
>
> • Let's say we want to make sure someone isn't homophobic. If their first few comments are on unrelated topics, we don't have enough info to approve them. Contrast that with questionnaires, where we can have their opinion on record on any given number of questions before approving.
>
> • What if there's not enough comments to tell whether a user is genuine or not? At what point do admins decide to approve someone? If their first X comments look fine, there's no way to tell whether they are trolls or not, since they could just wait until they're approved.

This is meant as an alternative to the existing registration application functionality, not a replacement for it. So instances can keep using that if they want more thorough vetting of users, or want to ask certain questions like in your example.

The advantage of this approach is mainly for public servers, where admins don't really need to know anything about the user, except that it's not a bot. And what they do is only approve each individual comment, not the user account. The account gets approved automatically after a certain number of comments has been approved.
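
As a sketch of that auto-approval step (assuming the proposed approved columns, Lemmy's existing local_user.person_id/comment.creator_id linkage, and a threshold hard-coded to 5 here, though it would be configurable):

```sql
-- Sketch: run after each individual comment approval; unlocks the account
-- once 5 of its comments have been approved by an admin.
UPDATE local_user lu
SET approved = true
WHERE NOT lu.approved
  AND (SELECT count(*)
       FROM comment c
       WHERE c.creator_id = lu.person_id
         AND c.approved) >= 5;
```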

> This is a completely new and untested way of onboarding afaik. Masto, discord servers, and other platforms do it the application-questionnaire way. We settled on this after trying different things, and it has proven itself to be more effective than other methods, and has stood the test of time.

While it's true that this can be bypassed, so can any other check. But the average spammer will not bother, and that's who we are trying to catch. It is also very similar to the way Discourse handles new users.

> If one reason for this is to filter out bots, as opposed to trolls, this would be worse in that regard: now instead of one obvious bot application, which couldn't read the application questions, there are probably dozens of bot posts and comments.

If this turns out to be a problem, we could apply stricter rate limits to unapproved users, or show only one comment per user for review.
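
The "one comment per user" variant could be a small query tweak, for example (again assuming the proposed approved columns):

```sql
-- Sketch: surface only each unapproved user's oldest pending comment,
-- so admins review a single sample per new account.
SELECT DISTINCT ON (c.creator_id) c.*
FROM comment c
JOIN local_user lu ON lu.person_id = c.creator_id
WHERE NOT lu.approved
  AND NOT c.approved
ORDER BY c.creator_id, c.published;
```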

> • The "approval notification via email" process would work the same.
>
> • Federation might get complicated, because we'd have to build and persist a queue of "newly approved but not yet sent" content. I.e. after an account gets set to "approve", their content needs to get added to a queue, instead of being sent out at the time of the API call.

Just need to federate the comment after it's approved, not difficult.

> It'd require a whole new queue and UI for "unapproved content".

I would just make an SQL query which builds this queue on the fly from all unapproved comments. And the UI can be copied/abstracted directly from reports.
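
Presumably something along these lines: an on-the-fly query over the proposed approved columns rather than a separate persisted queue (the other column names are Lemmy's existing ones):

```sql
-- Sketch: the ListPendingCommentApprovals "queue", computed on demand in
-- roughly the shape the comment report list uses (comment plus creator).
SELECT c.id, c.content, c.published, p.name AS creator_name
FROM comment c
JOIN person p      ON p.id = c.creator_id
JOIN local_user lu ON lu.person_id = p.id
WHERE NOT lu.approved
  AND NOT c.approved
ORDER BY c.published;
```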

> A much better solution IMO would be to have the default state where people after applying are "logged in", but not yet approved, with an indicator somewhere in the UI showing them that their status is pending. When they get approved, they get emailed, and the pending status goes away, but they don't have to log in again.

It still means the user has to answer some questions during signup. Look at Reddit, Facebook or Twitter: none of them need that. They get you signed up and posting as soon as possible. That's what I'm trying to enable for Lemmy.

Nutomic · Dec 02 '22 22:12

> Look at Reddit, Facebook or Twitter, none of them need that.

All of them now require verified emails, which we also have, but it's optional. I'd rather lemmy.ml have a policy of federating only with servers that either require_verified_email or require an application questionnaire.

I'm still mostly opposed to this because of

  • how complicated it is (both on the back and front end)
  • how untested this way of vetting users is
  • how much it increases moderation workload.

dessalines · Dec 09 '22 14:12

Email verification isn't really useful against spam; I remember from hosting peertube.social that most spam bots had verified gmail addresses. With other providers it's even easier for a bot to verify email automatically.

The application questionnaire is useful, but doesn't really work for instances which are meant to be public (again, look how easy signup is on Reddit: no need to answer any questions). I don't think it is complicated at all; I already finished half the backend. It can also hardly be considered untested; for example, Discourse uses a similar (though more complex) system. Admins who are concerned about increased workload can simply keep using one of the existing registration modes.

By the way this kind of feature is also being requested by users: https://lemmy.ml/post/632344

Nutomic · Dec 09 '22 15:12

Discourse unfortunately seems to be following the Stack Overflow model, where you create arbitrarily complicated systems of trust/reputation, which then affect what abilities you have. I'd bet they also couple that with captchas and verified emails, otherwise they would get a ton of spam signups and content.

You're underestimating how much work this will be. It would need:

  • Complicated rules around what these new unverified-but-logged-in users can do. Can they comment and post, but not like things or send private messages?
  • New columns on every content table marking whether that row comes from an unverified user, updates to every SQL trigger to check for unverified, as well as SQL views that show your own unverified comments to you but hide them from others. Alternatively, a join to the local_user table to check for verified (see the sketch after this list).
  • An API to retrieve all unverified comments, posts, messages.
  • An API way to approve that user, then update all their content.
  • An API setting for this. Your PR considers them exclusive, but really it would have to be another type that exists alongside all the other verification options.
  • Yet another set of UI tools to interact with that API.
  • Federating all newly approved content.
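
For the visibility logic in the second bullet, the join-based variant might look roughly like this (assuming the proposed approved columns; remote authors have no local_user row and are unaffected):

```sql
-- Sketch: a comment is hidden only when its creator is a local,
-- still-unapproved user AND the comment itself is still unapproved.
CREATE VIEW comment_publicly_visible AS
SELECT c.*
FROM comment c
LEFT JOIN local_user lu ON lu.person_id = c.creator_id
WHERE lu.id IS NULL   -- remote author: no approval state to check
   OR lu.approved     -- account already unlocked
   OR c.approved;     -- this comment approved individually
```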

There are so many simpler solutions to this problem. Like:

  • Allow unverified users to appear logged in, and show an indicator that they're not approved yet, that goes away when they get approved.
  • Allow unverified users to subscribe / unsubscribe only.

Shifting manual approval from before to after isn't going to affect any server's ability to reach mass adoption IMO, especially if in the latter case their comments are hidden.

dessalines · Dec 09 '22 17:12

The proposed method is very common on classic forums (I use it on phpBB, Flarum and Discourse) and is pretty much the only way to reliably stop spam accounts.

Verifying by email seems to be fully automated these days for spam bots and does not help at all.

The admin approval of accounts also works, but spam users can lie more easily in the application form than they can on the actual posts. A spam-bot can easily put in random seemingly good answers in the application form, but it can not not post spam in the actual comments/posts as that would defeat the purpose of spamming. And all in all, I think it "hurts" legitimate users more than it does spammers.

poVoq · Dec 10 '22 15:12

> There are so many simpler solutions to this problem. Like:
>
> • Allow unverified users to appear logged in, and show an indicator that they're not approved yet, that goes away when they get approved.
>
> • Allow unverified users to subscribe / unsubscribe only.

That seems like a good step already if people can customize their profile and subscribe even if not approved yet.

Maybe there could also be a "draft" feature that doesn't actually post comments, but with which any user can write and schedule posts/comments? I think that was also requested previously, and I think it exists on Mastodon as well. One of the scheduling options could then be "post when account gets approved".

Edit: of course, that would work better if approving admins were able to see these drafts, which isn't ideal, as people probably consider unpublished drafts to be private.

poVoq · Dec 10 '22 15:12

I know I may be making myself a little unpopular, but what would be wrong with Akismet, for example?
Another thing could be disabling attachment uploads for a small amount of time directly after registration, or until a specific amount of "reputation" is reached. Same with the profile header image. We could see it as something like a reward system for participation.

Also, an idea could be a "needs approval" state for image-only posts, since most of the spammers I've seen only post image spam. This could be automatically unlocked with a certain amount of "constructive" participation in other threads.

Edit: I love the idea of "Allow unverified users to subscribe / unsubscribe only.". Maybe it's possible to do this based on communities?

kromonos · Dec 10 '22 23:12

I am one of the admins at beehaw.org and I'm looking forward to this feature. As @poVoq stated above:

> The admin approval of accounts also works, but spam users can lie more easily in the application form than they can on the actual posts.

I just wanted to leave this general feedback for the sake of support/solidarity.

GitOffMyBack · Dec 10 '22 23:12

> The proposed method is very common on classic forums (I use it on phpBB, Flarum and Discourse) and is pretty much the only way to reliably stop spam accounts.

phpBB also uses an application questionnaire, just like masto, lemmy, and discord servers. They also have optional captcha and email verification like lemmy.

> A spam-bot can easily put in random seemingly good answers in the application form, but it can not not post spam in the actual comments/posts as that would defeat the purpose of spamming.

Really confusing... the entire purpose of a spambot is to spam its content all over your site.

> The admin approval of accounts also works, but spam users can lie more easily in the application form than they can on the actual posts.

Then they'll just make a few innocuous posts at first. No different than making an innocuous registration application, except now you've added more work for admins, who have to go through a bunch of posts instead of one application.

I'm very against adding rules around reputation, and limiting abilities based on that. Every person here so far came up with a different set of rules for what makes sense to them, and all of them are circumventable by malicious humans with time.

dessalines · Dec 11 '22 16:12

Breaking this down, there are three types of bad actors:

  1. Spambots
  2. Humans with no time (trolls who want to post racist spam quickly)
  3. Humans with a lot of time

A questionnaire stops 1 and 2. It doesn't stop 3.

The "post approval method" stops 1 and 2, and also doesn't stop 3. It also creates a lot more work for admins, and is far more complicated.

dessalines · Dec 11 '22 16:12

Sorry the double negation was a bit confusing indeed.

Interestingly enough, on all the classic forums I have very rarely seen a spammer that first tries to post innocuous posts. They usually go for the spam directly. What I did see, though, was a seemingly innocuous user starting a discussion to make the later spam posts by other users seem more relevant and less spammy (I think that is some sort of semi-automation).

As for the 3 scenarios: a questionnaire does not stop 1 and 2. Spambots can easily be automated to fill these questions with semi-random, plausible-sounding answers that are difficult for admins to spot, and a troll, even one without much time, can easily think of an answer similar in quality to what I have seen from legitimate users on my instance.

poVoq · Dec 11 '22 16:12

Spambots are easily stopped by application questions; it's why we and masto use them. We get plenty of bot registrations, none of them can read the questions, parse meaning, and attempt to answer each of them individually in a coherent way.

You can also add things inside the text that bots can't easily do, like "Follow the instructions from this page" (where that page tells them to type something in all caps in their response). We haven't found this necessary because bots are simple to spot, but I've seen lots of discord servers do it.

> a troll, even one without much time, can easily think of an answer similar in quality to what I have seen from legitimate users on my instance.

The point is that they don't get to instant-post, so even if they go through the trouble of making an application, they have to wait some time before they get approved. Which puts them in 3, not 2.

dessalines · Dec 11 '22 17:12

Sure; however, the point of this entire thread is that the current method might somewhat work, but it hurts legitimate users (and admins), and thus adoption, more than it does the spammers.

But if you can implement what you proposed above to allow non-approved users to already subscribe to communities and so on I think that would help a lot already.

poVoq · Dec 11 '22 17:12

> We get plenty of bot registrations, none of them can read the questions, parse meaning, and attempt to answer each of them individually in a coherent way.

With the increasing coherence of chatbots (ChatGPT etc.), this sounds like something that will very soon be a losing battle.

SorteKanin · Apr 29 '23 20:04

> > We get plenty of bot registrations, none of them can read the questions, parse meaning, and attempt to answer each of them individually in a coherent way.
>
> With the increasing coherence of chatbots (ChatGPT etc.), this sounds like something that will very soon be a losing battle.

It still increases the cost, though. So for now it stops unsophisticated spambots and slows down sophisticated ones.

FruityWelsh · Dec 20 '23 16:12

Closing this due to the huge amount of complication it would add, but can be re-opened if someone wants to work on it.

dessalines · May 04 '24 15:05

This should be done in plugins

dullbananas · May 04 '24 19:05