labrinth icon indicating copy to clipboard operation
labrinth copied to clipboard

More human-friendly backup codes for 2FA

Open brawaru opened this issue 1 year ago • 3 comments

Is your suggested enhancement related to a problem? Please describe.

When setting up 2FA, Modrinth provides you with a list of 6 backup codes. Typically those codes are meant to be used when you lost your authenticator, so you would likely want to just write them off on some kind of paper or download as a file, however, Modrinth backup codes are passwords. They're pretty difficult to write by hand. Not only they're 11 characters in length and alphanumeric, they also consist of uppercase and lowercase characters, and have no separators.

Here's an example of existing codes:

  • 47qiZVkXxfE
  • EbZlmqtkqmU
  • FYf7hj0Km2A
  • CU5hWULzf7K
  • KO8N5YPeFgk
  • 8f7ax3DGVVo

Describe the solution you'd like

Backup codes should remain codes, not passwords. Therefore, they should be easy to write on a piece of paper. I propose generating at least 8 codes, each code having two parts separated by a dash character, and each part consisting of alphanumeric characters in uppercase (A-Z, 0-9), although when codes are accepted, the case shouldn't matter, neither the presence of a dash separator.

Example:

  • ZU36-YVEU
  • 1T31-8173
  • K0NX-0265
  • M788-7T36
  • Q947-6970
  • J53Y-5IY7
  • RDQ0-1D95
  • Q070-4I46

These codes can easily be generated with the following JS code, which is likely portable to Rust:

function getRandomValue() {
  const arr = crypto.getRandomValues(new Uint32Array(2));

  return ((arr[0] * 0x100000) + (arr[1] >>> 12)) / 0x10_000_000_000_000;
}

const englishA = 'A'.charCodeAt(0)
const letterChance = 0.5

function backupCode(partsCount = 2, partLength = 4) {
  let code = ''
  
  partsCount = Math.max(partsCount, 0)
  partLength = Math.max(partLength, 0)

  for (let i = 0; i < partsCount; i++) {
    for (let j = 0; j < partLength; j++) {
      if (getRandomValue() > letterChance) {
        code += String.fromCharCode(englishA + Math.floor(getRandomValue() * 26))
      } else {
        code += Math.floor(getRandomValue() * 10)
      }
    }

    if ((i + 1) !== partsCount) code += '-'
  }

  return code
}
Test

{
  const standardCodeRegExp = /^[A-Z0-9]{4}-[A-Z0-9]{4}$/
  const charStats = new Map();
  
  console.log('Running test on 100,000 generations')
  for (let i = 0; i < 100_000; i++) {
    const code = backupCode()
    
    if (!standardCodeRegExp.test(code)) {
      throw new Error(`Illegal code: ${JSON.stringify(code)}`)
    }

    for (const char of code.split('')) {
      if (char === '-') continue
      charStats.set(char, (charStats.get(char) ?? 0) + 1)
    }
  }

  if (!charStats.has('A')) {
    throw new Error('A never generated, are offsets wrong?')
  }

  if (!charStats.has('Z')) {
    throw new Error('Z never generated, are offsets wrong?')
  }

  console.log('Test complete.')

  const sortedStats = [...charStats.entries()].sort(([, countA], [, countB]) => countB - countA)

  console.log('=== DISTRIBUTION ===')
  let i = 0;
  for (const [char, count] of sortedStats) {
    console.log(`${++i}. ${char} - ${count}`)
  }
}

Result:

Running test on 100,000 generations 
=== DISTRIBUTION === 
1. 0 - 40151 
2. 1 - 40054 
3. 6 - 39986 
4. 3 - 39938 
5. 5 - 39934
6. 9 - 39926 
7. 7 - 39904 
8. 4 - 39835 
9. 2 - 39812 
10. 8 - 39737 
11. R - 15771 
12. C - 15696 
13. F - 15609 
14. P - 15604 
15. D - 15546 
16. M - 15538 
17. G - 15486 
18. V - 15473 
19. K - 15452 
20. S - 15451 
21. Y - 15429 
22. H - 15418 
23. E - 15401 
24. W - 15389 
25. J - 15388 
26. O - 15387 
27. U - 15376 
28. Q - 15369 
29. I - 15359 
30. L - 15351 
31. N - 15308 
32. B - 15290 
33. X - 15242 
34. A - 15155 
35. T - 15121 
36. Z - 15114

Describe alternatives you've considered

  • Seed phrases, similar to those used in crypto currencies to encode wallet recovery code. They encode a seed using one of the dictionaries (thus they can be translatable as well), and you get a phrase like betray forest execute sock bridge build, which can be turned back into a seed by the machine. However, this is a lot to write for a single code, and we need only a few one-time use codes, so I don't think is a good idea compared to the proposal above.

Additional context

N/A

brawaru avatar Apr 20 '24 09:04 brawaru

I did an attempt at this, the biggest pain with implementing this is that the database expects a number as backup code. The code generating itself is simple.

pub fn generate_2fa_backup_code<R: rand::RngCore>(rng: &mut R) -> String {
    use rand::distributions::{Distribution, Uniform};
    let valid_backup_code_chars: Vec<char> =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ123456789".chars().collect();
    let char_dist = Uniform::from(0..valid_backup_code_chars.len());

    let mut code: Vec<char> = (0..8)
        .map(|_| valid_backup_code_chars[char_dist.sample(rng)])
        .collect();

    code.insert(4, '-');
    code.iter().collect()
}

I also believe that while we are on it, we should just remove 0 as well, since it could be mistaken as O on certain fonts.

piprett avatar Aug 12 '24 19:08 piprett

For such short codes I think removing letters would be detrimental to security, otherwise we could've used Microsoft's product key alphabet, which also avoids the possibility of getting a code like B0OB-1CEF. We probably should use monospace font that differentiates between these characters when showing the backup codes. And 8 codes should be enough to help avoiding getting stuck in case of mistyping (0 and O, Z and 7, I and 1, B and 8, etc).

brawaru avatar Aug 12 '24 21:08 brawaru

Name Alphabet Characters Maximum time to guess assuming no downtime[^1]
A-Z 0-9 ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 37 2784 years
A-Z 1-9 ABCDEFGHIJKLMNOPQRSTUVWXYZ123456789 36 2236 years
A-Z 0-9 without 0, 7, 1, 8 ABCDEFGHIJKLMNOPQRSTUVWXYZ234569 33 1114 years
Microsoft BCDFGHJKMPQRTVWXY2346789 25 120 years

[^1]: Assuming a rate limit of 300 attempts per minute, with 8 codes per person. Years = X ^ 8 / 8 / 300 / 60 / 24 / 365.

piprett avatar Aug 13 '24 07:08 piprett