
feature: element count

Open roycewilliams opened this issue 2 years ago • 1 comments

Along with the bits of entropy and the character count, it might be informative to include a compact expression of how many "elements" there were in the source list (e.g. "7777 elements").

This could help shape intuitions for the layperson.

(There may be another term that is better than "element". I led with that because some of the lists are words, some are pseudo words, some are characters, etc.)

roycewilliams avatar Aug 03 '22 00:08 roycewilliams

This is something I toyed with early on but abandoned, as I just couldn't get it working the way I wanted. Maybe I can give it another go. But instead of putting the password statistics in the generation box itself, which already feels busy to me, I'd put them in an overlay that is opened by clicking/tapping a link and dismissed by clicking/tapping again.

These are just mock-ups made in GIMP. No code to support it has been written yet.

[Mock-up images: alternate-propsed, alternate-propsed-overlay]

atoponce avatar Aug 05 '22 01:08 atoponce

I've implemented this for the "Alternate", "Cryptocurrency", "Diceware", "EFF", and "Random" generators. However, implementing it for the "Pseudowords" generator is escaping me.

In the case of passphrases, it's easy enough to count the number of unique words (although Diceware NLP passphrases are composed of adjective-noun pairs, meh). In the case of random passwords, it's easy enough to count the number of unique characters (or emoji).
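
For these list-based generators, the count and the entropy are directly related. The following is a minimal sketch, not the project's code: `describeSource` and the toy word list are hypothetical, and it assumes each element is picked uniformly and independently.

```javascript
// Hypothetical helper: count the distinct elements in a source list and
// compute the entropy of picking `count` of them uniformly, with replacement.
function describeSource(elements, count) {
  const setSize = new Set(elements).size // duplicates are counted only once
  const bitsPerElement = Math.log2(setSize)
  return {
    setSize,
    bitsPerElement,
    totalBits: count * bitsPerElement,
  }
}

// Toy list for illustration only; a real word list is far longer.
// "apple" appears twice on purpose to show that duplicates are ignored.
const toyWords = ["apple", "brick", "cloud", "delta", "apple", "eagle", "flint", "grape", "honey"]
const stats = describeSource(toyWords, 4)
console.log(`${stats.setSize} elements, ${stats.totalBits.toFixed(1)} bits for 4 picks`)
```

For a character-based generator, the same helper could be fed `[...alphabet]`, which splits a string by code point rather than by UTF-16 unit. Emoji built from several code points would need extra care.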

However, in the case of pseudowords, it's super tricky. Do I count the number of possible individual pseudoword blocks? For example, with Apple, it's built with the following requirements:

  • Each block is 6 lowercase characters in the form "cvccvc", where "c" is one of 19 consonants and "v" is one of 6 vowels.
  • A consonant at either end of a block is randomly chosen to be replaced with a random digit 0-9.
    • Exactly one digit replaces a block-end consonant, even if there are 2 or more blocks.
  • One of the remaining consonants or vowels is chosen randomly to be capitalized.
    • Exactly one character is capitalized, even if there are 2 or more blocks.

So for Apple, valid pseudoword structures could look like:

  • DVccvc
  • cvCcvD-cvccvc
  • cvcCvc-Dvccvc-cvccvc
  • cvccVc-cvccvD-cvccvc-cvccvc
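
The rules above can be sketched as follows. This is an illustration under stated assumptions, not the generator from the project: `applePassword` and `pick` are hypothetical names, the exact consonant and vowel alphabets are assumed (the thread only gives their sizes, 19 and 6), and `randomInt` comes from Node.js, so a browser build would need its own uniform random helper.

```javascript
const { randomInt } = require("node:crypto") // Node.js CSPRNG: randomInt(max) is uniform in [0, max)

// Assumed alphabets: only the counts (19 consonants, 6 vowels) come from the thread.
const CONSONANTS = "bcdfghjkmnpqrstvwxz"
const VOWELS = "aeiouy"

const pick = (chars) => chars[randomInt(chars.length)]

function applePassword(n) {
  // Build n lowercase "cvccvc" blocks as one flat array of 6n characters.
  const chars = []
  for (let i = 0; i < n; i++) {
    for (const kind of "cvccvc") {
      chars.push(pick(kind === "c" ? CONSONANTS : VOWELS))
    }
  }

  // Replace exactly one block-end consonant (2n candidates) with a digit.
  const edge = randomInt(2 * n)
  const digitPos = 6 * Math.floor(edge / 2) + (edge % 2 === 0 ? 0 : 5)
  chars[digitPos] = String(randomInt(10))

  // Capitalize exactly one of the remaining 6n - 1 letters.
  let capPos = randomInt(6 * n - 1)
  if (capPos >= digitPos) capPos++ // skip over the digit
  chars[capPos] = chars[capPos].toUpperCase()

  // Split back into blocks of 6 and join them with hyphens.
  const blocks = []
  for (let i = 0; i < n; i++) {
    blocks.push(chars.slice(6 * i, 6 * i + 6).join(""))
  }
  return blocks.join("-")
}

console.log(applePassword(3))
```

Each random choice in the sketch corresponds to one factor in the count of possible passwords: 2n digit positions, 10 digits, 6n - 1 capital positions, 19 options for each of the 4n - 1 surviving consonants, and 6 options for each of the 2n vowels.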

Apple is the extreme case in complexity. Here are the issues with the other pseudoword templates:

  • Bubble Babble:
    • "xvcvc-cvcvc-...-cvcvc-cvcvx"
    • 6 vowels, 17 consonants
    • "x" is static at both ends.
    • Structure includes a built-in checksum.
  • Daefen:
    • 6 vowels, 18 consonants
    • Complex structure of "vc", "cv", "cvv", "cvc", "vcv".
    • Guarantees exactly 16 bits per pseudoword.
  • Koremutake:
    • 128 unique syllables
  • Lepron:
    • 36 "start" consonant pairs (72 total), 36 "middle" consonant pairs (72 total), 36 "end" consonant pairs (72 total), and a complex vowel structure (15 total)
    • Structure is "start+vowel+middle+vowel+middle+vowel+end"
  • Letterblock Diceware:
    • First gets "dice" rolls from a 6x6 table of letters.
    • Then finds the highest score from the dice rolls using a weighted table of 676 English bigrams.
    • Continue until we've reached the minimum security margin.
    • Structure includes a built-in checksum
  • Munemo:
    • Encoded signed integers accurate to the exact bit.
    • Uses 100 unique syllables to encode the integer.
    • Instead of a hyphenated list of smaller pseudowords, one giant pseudoword is generated.
    • Is negative if the pseudoword starts with "xa" (not part of the 100 unique syllables)
  • Proquints:
    • "cvcvc" structure
    • 4 vowels and 16 consonants
    • Each pseudoword encodes exactly 16 bits
  • Urbit:
    • "cvccvc" structure of "prefix + suffix"
    • 256 unique 3-character prefixes and 256 unique 3-character suffixes
    • Each pseudoword encodes exactly 16 bits
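
For the fixed-structure templates in this list (Proquints, Urbit, Koremutake), the number of distinct pseudowords or syllables is simply the product of the alphabet sizes at each position. A minimal sketch, where `templateSize` and the slot letters `p` and `s` are hypothetical names and the counts are the ones quoted in the list:

```javascript
// Number of distinct strings a fixed template can produce: the product of
// the alphabet sizes at each slot of the template.
function templateSize(template, sizes) {
  return [...template].reduce((total, slot) => total * sizes[slot], 1)
}

const proquint = templateSize("cvcvc", { c: 16, v: 4 }) // 16 * 4 * 16 * 4 * 16
const urbit = templateSize("ps", { p: 256, s: 256 })    // prefix * suffix
const koremutake = templateSize("s", { s: 128 })        // a single syllable

for (const [name, size] of Object.entries({ proquint, urbit, koremutake })) {
  console.log(`${name}: ${size} possibilities, ${Math.log2(size)} bits`)
}
```

Proquints and Urbit both come out to 65,536 possibilities per pseudoword, which is 2^16 and so agrees with the "exactly 16 bits" noted for each. The templates with checksums or variable structure (Bubble Babble, Letterblock Diceware, Munemo) do not fit this simple product.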

So yeah, this is completely escaping me on how to portray this in the overlay. Open to ideas.

atoponce avatar May 21 '23 23:05 atoponce

How is entropy being calculated for these trickier cases? Naively, if entropy can be expressed, then element count must necessarily be expressible.

roycewilliams avatar May 22 '23 02:05 roycewilliams

Lots of wacky math. Ignoring a lot of the fluff, here's the math for the Apple pseudoword generator:

function generateApple() {
  /*
    See https://web.archive.org/web/20210430183515/https://twitter.com/AaronToponce/status/1131406726069084160 for full analysis.

    For n ≥ 1 blocks, the total entropy in bits is:
      log2(
        (6n - 1)      //  One lowercase alphabetic character is randomly capitalized
        * 19^(4n - 1) //  The total possible combinations of consonants
        * 6^(2n)      //  The total possible combinations of vowels
        * 10 * 2n     //  An 'edge' character is a random digit
      )

    E.G.:
      DVccvc:                      log2( 5 * 19^3  * 6^2 * 10 * 2) ~=  24.558 bits
      cvCcvD-cvccvc:               log2(11 * 19^7  * 6^4 * 10 * 4) ~=  48.857 bits
      cvcCvc-Dvccvc-cvccvc:        log2(17 * 19^11 * 6^6 * 10 * 6) ~=  72.231 bits
      cvccVc-cvccvD-cvccvc-cvccvc: log2(23 * 19^15 * 6^8 * 10 * 8) ~=  95.244 bits
      et cetera, et cetera, et cetera.
  */
  var apple = function (n) {
    return Math.floor(Math.log2((6 * n - 1) * 19 ** (4 * n - 1) * 6 ** (2 * n) * 20 * n))
  }

  const entropy = getEntropy()
  let n = 1

  while (apple(n) < entropy) {
    n++
  }
  
  // more code...
}

So 1 block has ~24.558 bits of entropy, but 2 blocks have ~48.857 bits, a difference of 24.299 bits, not 24.558. The difference between 2 blocks and 3 blocks is 23.374 bits, and between 3 blocks and 4 blocks is 23.013 bits, etc. So with Apple, each added block contributes a different amount of entropy; the difference is not constant as the number of blocks grows. That means I can't say the per-block set size is 2^24.558, as that's not correct.
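
The shrinking increments can be reproduced with a short sketch. `appleBits` is a hypothetical name; it uses the same formula as the comment in `generateApple()`, but sums logarithms instead of multiplying large powers, so it is not limited by Number overflow for large n.

```javascript
// Total Apple entropy in bits for n blocks:
// log2((6n - 1) * 19^(4n - 1) * 6^(2n) * 10 * 2n), written as a sum of logs.
function appleBits(n) {
  return Math.log2(6 * n - 1)
    + (4 * n - 1) * Math.log2(19)
    + 2 * n * Math.log2(6)
    + Math.log2(20 * n)
}

// Print the running total and how much each additional block contributed.
let previous = 0
for (let n = 1; n <= 4; n++) {
  const total = appleBits(n)
  console.log(`n=${n}: total ${total.toFixed(3)} bits, added ${(total - previous).toFixed(3)} bits`)
  previous = total
}
```

The per-block contribution keeps falling toward 4·log2(19) + 2·log2(6), roughly 22.16 bits, because the single digit and single capital are shared across the whole password rather than repeated in every block.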

The other generators have similar-but-different nuances with their pseudoword blocks. Some, however, like Daefen, Proquints, and Urbit, do have a constant factor per pseudoword. But even then, how do I say that? "2^16 syllables"? "65,536 syllables"? What about Munemo? It doesn't have a hyphenated block structure, and instead is just an encoded random number. So if you generate an 80-bit pseudoword password, do I say "2^80 numbers"? Something else?

Maybe instead of communicating a "set size", I communicate something entirely different, although I don't know what.

atoponce avatar May 22 '23 02:05 atoponce

Ah, interesting! OK, for now, I propose just not populating it at all in the cases where it's not clear how to do so. Release early and often! :D

roycewilliams avatar May 22 '23 03:05 roycewilliams

Yup. Thought about that also. I can put most of the structure in place, just in case I figure something out, but not actually display anything in the overlay. At least that's not hurting anything, and if nothing comes to fruition, I can always just remove it.

But I still have plenty of bugs to work out before this is ready anyway, so I can keep thinking on it, and maybe something will come to mind.

atoponce avatar May 22 '23 03:05 atoponce

@roycewilliams Please review this and let me know your thoughts.

atoponce avatar May 22 '23 17:05 atoponce