addons icon indicating copy to clipboard operation
addons copied to clipboard

Fix Userprofile.username charset db inconsistency

Open eviljeff opened this issue 6 years ago • 8 comments
trafficstars

Running migration 1201 resulted in the error stderr: b"ERROR 1062 (23000) at line 2: Duplicate entry 'Austin' for key 'username'\n seemingly because we are dropping the existing explicit column definition of CHARACTER SET utf8mb4 COLLATE utf8mb4_bin and replacing it with the table default.

Attempts to manually fix the data by renaming the "Austin" username to something else, resulted in a different error for the next username, and so on (I abandoned after 3 UPDATE to different records). To unblock the stage deploy I merged https://github.com/mozilla/addons-server/commit/d6cdd59175b649f02e767843f921a882bd9957e8 and cherry-picked to the https://github.com/mozilla/addons-server/tree/2019.09.19-1 tag.

We need to fix the inconsistency and reapply ALTER TABLE `users` CHANGE COLUMN `username` `username` VARCHAR (255) NOT NULL; to bring the stage&prod database to django norms.

┆Issue is synchronized with this Jira Task

eviljeff avatar Sep 18 '19 08:09 eviljeff

QA: Let's just test that this hasn't accidentally dropped existing usernames that were not dupes by loading known public collection URLs like https://addons-dev.allizom.org/en-US/firefox/collections/VictorQATester/testcollect/ and making sure they still work.

diox avatar Sep 18 '19 15:09 diox

Actually, the new migration initially didn't take on stage, it failed with: ERROR 1062 (23000) at line 7: Duplicate entry 'anonymous-83f2f3c3aefa9a786cee930e3572eba932' for key 'username'

The relevant SQL was this:

CREATE TEMPORARY TABLE `duplicate_usernames`
    SELECT LOWER(`username`) AS `username` FROM `users`
        GROUP BY LOWER(`username`) HAVING COUNT(*) > 1;
UPDATE `users`
    SET `username`=CONCAT('anonymous-', MD5(RAND()), 32)
    WHERE LOWER(`username`) != `username`
    AND LOWER(`username`) IN (SELECT `username` FROM `duplicate_usernames`);
ALTER TABLE `users` CHANGE COLUMN `username` `username` VARCHAR (255) NOT NULL;

Before running it, there were ~80k affected users on stage, twice that on prod.

We tried several things to fix it, ended up with replacing the CONCAT('anonymous-', MD5(RAND()), 32) with CONCAT('anonymous-', LOWER(HEX(RANDOM_BYTES(16))) but then the migration still had not finished after 8+ hours.

We're going to remove the migration from the tag, and after that we need to figure out what took time (is it the UPDATE or the ALTER that follows it ?). Then we've got 2 options, assuming the UPDATE is what's problematic:

  • Replace it with a different SQL that would be faster. I suspect the current version, while less likely to result in a duplicate entry than the previous one, might be exhausting entropy a little too quickly.
  • Replace it with some python code that would change the problematic usernames little by little outside of the migration process

That would still leave the ALTER part to be done though. I need to check if that's what the last deploy was stuck on or if it was stuck on the UPDATE.

In any case, a 3rd option is to drop the column entirely. That'd still be a big ALTER, and preparing that change would take time, but ultimately the column is almost useless nowadays, we only need it to support legacy profile/collection URLs, and we could do that from a limited set of usernames in a different table. See https://github.com/mozilla/addons/issues/5528

@eviljeff mentioned on IRC a fourth option I had forgotten: we could set the collation we currently use in prod via a django migration, and forget about this whole thing for now.

diox avatar Sep 19 '19 09:09 diox

(is it the UPDATE or the ALTER that follows it ?)

It's the UPDATE, and it's now finished. We don't know how much more than 7 hours it took, it was run by @bqbn manually at my request last night, and I can see now on stage that it's done:

MySQL [addons_allizom_org]> SELECT COUNT(*) FROM (SELECT LOWER(username) FROM users GROUP BY LOWER(username) HAVING COUNT(*) > 1) as lol;
+----------+
| COUNT(*) |
+----------+
|        0 |
+----------+

Stage was never redeployed once it finished, so the ALTER has not been run on stage yet, and we don't know how much time that'd take.

Edit: it took 9 hours on stage.

diox avatar Sep 19 '19 10:09 diox

summary of options:

  • add column collation via manual django migration
  • drop username column from model
  • delete usernames that conflict under default collation.

eviljeff avatar Oct 02 '19 16:10 eviljeff

And https://github.com/mozilla/addons/issues/6974 has the history of issues with option 3 (Tl;DR: we could go with the nuclear option and delete most non ascii usernames to solve the conflicts, or try a more subtle approach)

I'll look into option 1, see what it looks like, we've spent too much time on this already.

diox avatar Oct 02 '19 20:10 diox

Just make a note here that the users table has been altered concerning the username column in -dev environment, but has NOT in -stage and -prod environments.

bqbn avatar Oct 02 '19 21:10 bqbn

Do we have requests to URLs with usernames (instead of user id), and if so which ones, and how many ?

Let's look for requests matching these in the logs:

  • https://addons.mozilla.org/(\w{2,3}(?:-\w{2,6})?)/(firefox|android)/user/(.*?[^0-9].*)/.*
  • https://addons.mozilla.org/(\w{2,3}(?:-\w{2,6})?)/(firefox|android)/collections/(.*?[^0-9].*)/.*
  • https://addons.mozilla.org/api/(v\d)/accounts/account/(.*?[^0-9].*)/.*

About the regexps:

  • (\w{2,3}(?:-\w{2,6})?) matches locales, fr or en-US
  • (firefox|android) matches applications
  • (.*?[^0-9].*) matches usernames, which can contain anything, but can't be only digits
  • (v\d) matches api versions

There are technically more possibilities with the API (?user=... in some endpoints) but I don't think we care about those

diox avatar Jul 19 '22 11:07 diox

Old Jira Ticket: https://mozilla-hub.atlassian.net/browse/ADDSRV-18

KevinMind avatar May 03 '24 16:05 KevinMind