matrix-appservice-gitter
Clean up non-casefolded users
More of an admin task than a code one.
Write enough code (even temporary) to let us puppet ghost users from the command line. Kill each ghost user with capital letters in their name.
This will require:
- Having them "leave" all the Matrix rooms they're in
- Deleting the user from the UserStore
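A rough sketch of what that command-line cleanup could look like. Everything here is a stand-in: `roomsByGhost` and `userStore` are plain in-memory Maps made up for illustration, not the bridge's real UserStore or any homeserver API, and the "leave" is only simulated by dropping the ghost's room list.

```javascript
// Hypothetical in-memory stand-ins: roomsByGhost maps a ghost name to the
// room IDs it is joined to; userStore maps a ghost name to its stored record.

// True if the name contains any character that lowercasing would change.
function hasUppercase(name) {
  return name !== name.toLowerCase();
}

// "Kill" one ghost: leave every room it is in, then delete it from the store.
// Returns the list of rooms it left.
function killGhost(ghost, roomsByGhost, userStore) {
  const leftRooms = [...(roomsByGhost.get(ghost) || [])];
  // Stand-in for the ghost actually sending a leave to each room.
  roomsByGhost.delete(ghost);
  userStore.delete(ghost);
  return leftRooms;
}

// Kill every ghost whose name has capital letters; lowercase ghosts are untouched.
function killCapitalisedGhosts(roomsByGhost, userStore) {
  const report = [];
  for (const ghost of [...userStore.keys()]) {
    if (hasUppercase(ghost)) {
      report.push({ ghost, leftRooms: killGhost(ghost, roomsByGhost, userStore) });
    }
  }
  return report;
}
```

The real thing would need to be asynchronous and talk to the homeserver; this only shows the order of operations (leave first, delete second).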
This now needs more thought. There are now many users with capitalised names out in the wild.
I think I'll attack this by:
- Write a small amount of code to casefold new users while preserving the case of existing capitalised users, letting their accounts continue to function in the short-term
- Write a larger amount of cleanup code that will slowly clean up these old user accounts: the next time such a user actually speaks, provision a new user in lowercase form and have the old capitalised one leave all its existing rooms
- Possibly, at some later time, run this cleanup code in bulk against any remaining users.
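The first step of that plan might look roughly like this. It's a minimal sketch: `chooseLocalpart` and the `legacyNames` set are names invented for illustration, and it assumes the only thing distinguishing a legacy user is that their exact capitalised name is already known to the bridge.

```javascript
// Pick the name to use for a gitter user's ghost.
// legacyNames: a Set of names that were already provisioned before casefolding
// was introduced (hypothetical; the real bridge would consult its database).
function chooseLocalpart(gitterName, legacyNames) {
  if (legacyNames.has(gitterName)) {
    // Existing user: preserve the case so the old account keeps working.
    return gitterName;
  }
  // New user: always casefold.
  return gitterName.toLowerCase();
}
```

For example, with `legacyNames = new Set(["OldUser"])`, `chooseLocalpart("OldUser", legacyNames)` stays `"OldUser"`, while `chooseLocalpart("NewUser", legacyNames)` becomes `"newuser"`.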
Up to stage 1 should be fairly transparent to most users - existing users won't change, and the worst that happens is people start to notice that a few old users have capitalised IDs and most new ones don't. Not the end of the world.
The problem is that without stages 2+3 we won't be able to delete a bunch of now-legacy code that's hanging around purely to support these old users with the wrong names. It would be nice to run it all and clean that code up, but that causes a certain amount of disruption. Not massive, but it could be annoying.
Part 1 is now done (branch https://github.com/matrix-org/matrix-appservice-gitter/tree/paul/cleanup-uppercase) and running live, so at least new users are being created correctly. Next I'll make a count of how many existing users need fixing up.
As a reminder to myself as much as anything:
New users all have the mxid_localpart database field that they use; legacy users who may be incorrectly cased don't.
Current counts:
All users: 14352
New users: 10141
Old users - all lowercase: 2930
Old users - has upper: 1281
I've now run a fixup script to add a mxid_localpart field to all the users with an all-lowercase gitter name, thus fixing those up. All that now remains are the 1281 users with uppercase letters.
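For the record, the fixup amounts to something like the following sketch. The `mxid_localpart` field is the real one mentioned above; the `gitter_name` field name, the shape of the user records, and the assumption that the field simply holds the name as-is are all guesses made for illustration.

```javascript
// Add mxid_localpart to every legacy user whose gitter name is already all
// lowercase, and return the legacy users that still need proper migration.
// users: array of { gitter_name: string, mxid_localpart?: string } (assumed shape).
function fixupLowercaseUsers(users) {
  const remaining = [];
  for (const user of users) {
    if (user.mxid_localpart !== undefined) {
      continue; // new-style user, nothing to do
    }
    if (user.gitter_name === user.gitter_name.toLowerCase()) {
      // Already correctly cased: just record the field so it counts as "new".
      user.mxid_localpart = user.gitter_name;
    } else {
      remaining.push(user); // has uppercase letters; needs the real cleanup
    }
  }
  return remaining;
}
```

Run against the full user list, `remaining` would be the users with uppercase letters, matching the "has upper" count above.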
Bridge is now running code that checks when it loads a user from the DB if the case needs fixing up; if so it removes the old-cased username from all its old rooms then proceeds to create a new user account correctly.
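In outline, the check-on-load works something like the sketch below. This is not the code from the commit: `loadUser`, the Map-based `userStore` and `roomsByLocalpart` are hypothetical in-memory stand-ins, and it assumes a user needs fixing exactly when their record has no `mxid_localpart`.

```javascript
// Load a user record, migrating it to a lowercase localpart if needed.
// userStore: Map of gitter name -> record (hypothetical stand-in for the DB).
// roomsByLocalpart: Map of ghost localpart -> joined room IDs (also a stand-in).
function loadUser(gitterName, userStore, roomsByLocalpart) {
  const record = userStore.get(gitterName);
  if (!record) {
    return null; // unknown user
  }
  if (record.mxid_localpart !== undefined) {
    return record; // already correctly cased
  }

  const oldLocalpart = gitterName;
  const newLocalpart = gitterName.toLowerCase();
  if (oldLocalpart !== newLocalpart) {
    // Old-cased ghost leaves all of its old rooms (simulated by dropping them).
    roomsByLocalpart.delete(oldLocalpart);
    // The replacement ghost starts out in no rooms.
    if (!roomsByLocalpart.has(newLocalpart)) {
      roomsByLocalpart.set(newLocalpart, []);
    }
  }
  // Record the new localpart so later loads skip the migration.
  record.mxid_localpart = newLocalpart;
  return record;
}
```

Since this only fires when a user is loaded, idle users are never touched, which is why a bulk pass is still needed later.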
https://github.com/matrix-org/matrix-appservice-gitter/commit/4bdaae12397b8cf8babae992885cc8faa8841d5f
Given a week or so this should clean up most of the currently-active users. Then we can do a bulk edit of the remainder.
After running for ~24 hours, the count is down to 1229: a mere 52 users done out of 1281. I suspect it'll be a while yet at this rate.
I suspect what also might help this along is to do some membership list syncing. If we purge all the matrix-side ghosts of gitter users no longer in gitter rooms, that might clean up a few more users who we'd find are now no longer in any rooms at all. We can then remove those ones from the database. Depends on #21
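To make the idea concrete, here's a rough sketch of that sync. The room objects, the function names, and the case-insensitive name comparison are all assumptions for illustration; the real membership data would have to come from the gitter and Matrix sides of the bridge.

```javascript
// rooms: array of { roomId, ghosts: string[], gitterMembers: string[] }
// (an assumed shape). Removes ghosts whose gitter user is no longer in the
// gitter room, and returns what was purged.
function purgeDepartedGhosts(rooms) {
  const purged = [];
  for (const room of rooms) {
    // Compare names case-insensitively, since ghost case may differ from gitter's.
    const present = new Set(room.gitterMembers.map((name) => name.toLowerCase()));
    const keep = [];
    for (const ghost of room.ghosts) {
      if (present.has(ghost.toLowerCase())) {
        keep.push(ghost);
      } else {
        purged.push({ roomId: room.roomId, ghost });
      }
    }
    room.ghosts = keep;
  }
  return purged;
}

// After purging, list the users that are no longer in any room at all;
// these are the candidates for removal from the database.
function usersInNoRooms(allUsers, rooms) {
  const stillJoined = new Set();
  for (const room of rooms) {
    for (const ghost of room.ghosts) {
      stillJoined.add(ghost);
    }
  }
  return allUsers.filter((user) => !stillJoined.has(user));
}
```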
1197
1178