matrix-appservice-gitter

Clean up non-casefolded users

Open leonerd opened this issue 9 years ago • 9 comments

More of an admin task than a code one.

Write enough code (even temporary) to allow us to puppet ghost users from the command line. Then kill each ghost user with capital letters in their name.

Killing a ghost user will require:

  • Having it "leave" all Matrix rooms it's in
  • Deleting the user from the UserStore
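
A rough sketch of the "leave all rooms" half, in Python rather than the bridge's own code. It only builds the requests and does not send them. It assumes the standard application-service masquerading convention (a `user_id` query parameter) and the `/_matrix/client/v3/rooms/{roomId}/leave` endpoint; the homeserver URL, ghost ID and room ID shown are made up for illustration, and authentication and the UserStore deletion are left out.

```python
from urllib.parse import quote, urlencode


def has_uppercase(localpart):
    """True if the localpart contains any capital letters."""
    return localpart != localpart.lower()


def leave_requests(homeserver, ghost_user_id, room_ids):
    """Build (method, url) pairs that would make the ghost leave each room.

    The appservice acts as the ghost by passing ?user_id=... on each call.
    Nothing is sent over the network here.
    """
    query = urlencode({"user_id": ghost_user_id})
    return [
        (
            "POST",
            f"{homeserver}/_matrix/client/v3/rooms/"
            f"{quote(room_id, safe='')}/leave?{query}",
        )
        for room_id in room_ids
    ]


# Hypothetical values, for illustration only.
requests_to_send = leave_requests(
    "https://hs.example.org",
    "@gitter_SomeUser:example.org",
    ["!abc:example.org"],
)
```

The list of rooms a ghost is in would have to come from somewhere else, for example the bridge's own records or a joined-rooms query made as that ghost.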

leonerd avatar Aug 01 '16 17:08 leonerd

This now needs more thought. There are now many users with capitalised names out in the wild.

I think I'll attack this by:

  1. Write a small amount of code to casefold new users while preserving the case of existing capitalised users, letting their accounts continue to function in the short-term
  2. Write a larger amount of cleanup code that will slowly clean up these old user accounts, by provisioning a new user in a lowercase form and having the old capitalised one leave all its existing rooms, next time the user actually speaks
  3. Possibly at some later time, run this cleanup code in bulk against any users that remain.
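
Stage 1 could look something like the sketch below. This is an illustration in Python, not the bridge's implementation: the record layout (a dict with an optional `mxid_localpart` key) and the function name are assumptions, and plain `str.lower()` stands in for casefolding on the assumption that Gitter names are ASCII.

```python
def resolve_localpart(gitter_name, record=None):
    """Pick the Matrix localpart for a Gitter user.

    record is the stored user record, or None if the user is new.
    - New users get a lowercased localpart.
    - Existing users with a stored localpart keep it.
    - Existing users without one keep their original (possibly
      capitalised) name so their accounts continue to work.
    """
    if record is None:
        return gitter_name.lower()
    if "mxid_localpart" in record:
        return record["mxid_localpart"]
    return gitter_name
```

For example, a brand-new `SomeUser` resolves to `someuser`, while a legacy record for `SomeUser` with no stored localpart still resolves to `SomeUser`.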

Up to stage 1 should be fairly transparent to most users - existing users won't change, and the worst that happens is people start to notice that a few old users have capitalised IDs and most new ones don't. Not the end of the world.

Problem is that without stages 2+3 we won't be able to delete a bunch of now-legacy code that's hanging around purely to support these old users with the wrong names. Would be nice to run it all and clean that code up, but that would cause a certain amount of disruption. Not massive, but could be annoying.

leonerd avatar Sep 29 '16 12:09 leonerd

Part 1 is now done (branch https://github.com/matrix-org/matrix-appservice-gitter/tree/paul/cleanup-uppercase) and running live, so at least new users are being created correctly. Next I'll make a count of how many existing users need fixing up.

leonerd avatar Sep 30 '16 11:09 leonerd

As a reminder to myself as much as anything:

New users all have the mxid_localpart database field that they use; legacy users who may be incorrectly cased don't.

Current counts:

All users: 14352
New users: 10141
Old users - all lowercase: 2930
          - has upper:     1281
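
For reference, a minimal sketch of how such a breakdown could be computed. It assumes each user record is a dict with a `gitter_name` key and, for new users, an `mxid_localpart` key; the real database layout may differ, and the sample records are invented.

```python
from collections import Counter


def classify(user):
    """Bucket a user record the same way as the counts above."""
    if "mxid_localpart" in user:
        return "new"
    if user["gitter_name"] == user["gitter_name"].lower():
        return "old_lowercase"
    return "old_has_upper"


def count_users(users):
    """Return a Counter of bucket name -> number of users."""
    return Counter(classify(u) for u in users)


# Invented sample records, for illustration only.
sample = [
    {"gitter_name": "alice", "mxid_localpart": "alice"},
    {"gitter_name": "bob"},
    {"gitter_name": "Carol"},
]
sample_counts = count_users(sample)

# Sanity check on the reported figures: the three buckets sum to the total.
assert 10141 + 2930 + 1281 == 14352
```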

leonerd avatar Nov 14 '16 17:11 leonerd

I've now run a fixup script to add a mxid_localpart field to all the users with an all-lowercase gitter name, thus fixing those up. All that now remains are the 1281 users with uppercase letters.

leonerd avatar Nov 15 '16 15:11 leonerd

Bridge is now running code that checks when it loads a user from the DB if the case needs fixing up; if so it removes the old-cased username from all its old rooms then proceeds to create a new user account correctly.
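
The shape of that check, as a hedged Python sketch rather than the code in the commit below. The record layout and the `joined_rooms` / `leave_room` callables are hypothetical stand-ins for whatever the bridge actually uses.

```python
def fix_case_on_load(record, joined_rooms, leave_room):
    """Fix up a legacy user record when it is loaded.

    record       -- dict with 'gitter_name' and, once fixed, 'mxid_localpart'
    joined_rooms -- callable: old localpart -> iterable of room IDs
    leave_room   -- callable: (old localpart, room ID) -> None

    Returns True if a fixup was performed, False if none was needed.
    """
    if "mxid_localpart" in record:
        return False  # already in the new form

    old_localpart = record["gitter_name"]
    # Take the old-cased ghost out of every room it is still in.
    for room_id in list(joined_rooms(old_localpart)):
        leave_room(old_localpart, room_id)

    # From now on the user is addressed by the lowercase localpart.
    record["mxid_localpart"] = old_localpart.lower()
    return True
```

Because the fixup records `mxid_localpart`, running it a second time on the same record is a no-op.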

https://github.com/matrix-org/matrix-appservice-gitter/commit/4bdaae12397b8cf8babae992885cc8faa8841d5f

Given a week or so this should clean up most of the currently-active users. Then we can do a bulk edit of the remaining.

leonerd avatar Nov 15 '16 18:11 leonerd

After running for ~24 hours, the count is down to 1229, so it has fixed a mere 52 users out of 1281. I suspect it'll be a while yet at this rate.
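
As a back-of-envelope estimate, assuming the first day's rate held constant (which is optimistic, since the most active users get fixed first):

```python
start_count = 1281       # uppercase users before the fixup code went live
current_count = 1229     # count after roughly 24 hours

fixed_per_day = start_count - current_count      # 52
days_remaining = current_count / fixed_per_day   # a little under 24 days

print(f"{fixed_per_day} fixed per day, about {days_remaining:.1f} days to go")
```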

leonerd avatar Nov 16 '16 20:11 leonerd

I suspect what also might help this along is to do some membership list syncing. If we purge all the matrix-side ghosts of gitter users no longer in gitter rooms, that might clean up a few more users who we'd find are now no longer in any rooms at all. We can then remove those ones from the database. Depends on #21
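
A minimal sketch of that sync as set arithmetic, in Python. The inputs (room to set-of-names mappings for the Matrix and Gitter sides) and the function name are assumptions for illustration; names are compared case-insensitively so a capitalised ghost still matches its Gitter user.

```python
def plan_purge(matrix_ghosts, gitter_members):
    """Work out which ghosts to remove and which end up in no rooms.

    matrix_ghosts  -- dict: room -> set of ghost names present on the Matrix side
    gitter_members -- dict: room -> set of names currently in the Gitter room

    Returns (to_leave, orphaned):
      to_leave -- dict: room -> set of ghosts that should leave that room
      orphaned -- set of ghosts left in no rooms at all after the purge
    """
    to_leave = {}
    still_present = set()
    everyone = set()

    for room, ghosts in matrix_ghosts.items():
        present = {name.lower() for name in gitter_members.get(room, set())}
        stale = {g for g in ghosts if g.lower() not in present}
        if stale:
            to_leave[room] = stale
        still_present |= ghosts - stale
        everyone |= ghosts

    return to_leave, everyone - still_present
```

The `orphaned` set is the list of candidates for removal from the database.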

leonerd avatar Nov 21 '16 12:11 leonerd

1197

leonerd avatar Nov 23 '16 19:11 leonerd

1178

leonerd avatar Dec 06 '16 15:12 leonerd