
Another criteria for a good passphrase word list

Open sirati opened this issue 10 months ago • 4 comments

Hi, I have read your blog article, and I have been wondering: would it work to make sure that all words in the word list have an even edit distance from each other, or at least an edit distance of two, so that any one (or sometimes more) of the following mistakes could be corrected:

  1. a random mistyped letter
  2. two letters in the word switching position
  • and maybe: homophones being considered the same word
  • plural and singular being considered the same word and either both or neither have to be removed in prefix elimination
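The first two mistakes in the list above can be checked mechanically. What follows is a minimal Python sketch, not code from the blog article or from any tool mentioned in this thread. It computes the optimal string alignment distance (a restricted Damerau-Levenshtein distance in which a substitution, insertion, deletion, or swap of two adjacent letters each costs 1) and reports the smallest distance between any two list words. The function names and sample words are made up for illustration.

```python
from itertools import combinations


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: substitutions, insertions,
    deletions, and swaps of two adjacent letters each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]


def min_pairwise_distance(words):
    """Smallest distance between any two list words (quadratic scan)."""
    return min(osa_distance(a, b) for a, b in combinations(words, 2))
```

If `min_pairwise_distance` returns 2 or more, a single mistyped letter or a single adjacent swap can never turn one list word into another, so the mistake is at least detectable. The scan is quadratic in the list size, which is acceptable for a one-off audit of a few thousand words.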

A somewhat less restrictive criterion may be:

  1. one letter replaced with a neighboring key on common keyboard layouts, as well as keys that trade places between common layouts, e.g. y and z on QWERTY vs QWERTZ (German), or q and a on QWERTY vs AZERTY (French)
  2. two neighboring letters switched
  3. a permutation of three neighboring letters where the letter that changed position would be typed with the other hand
  4. as an extension to 1: the word typed entirely with the wrong keyboard layout, e.g. typing on Dvorak as if it were QWERTY, but not allowing any further errors 1 to 3 if that would make the edit distance uneven.
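The layout mix-ups in items 1 and 4 can be tested with a simple translation table. The sketch below is only an illustration under stated assumptions: the two tables cover just the letter swaps y/z (QWERTY vs QWERTZ) and a/q, w/z (QWERTY vs AZERTY), they ignore punctuation keys and every other layout, and the function name is hypothetical.

```python
# Letter swaps between layouts; partial and illustrative only.
QWERTZ_SWAPS = str.maketrans("yz", "zy")
AZERTY_SWAPS = str.maketrans("aqwz", "qazw")


def layout_collisions(words, table):
    """Return pairs (word, typed) where typing `word` on the wrong layout
    produces a different word that is also on the list."""
    word_set = set(words)
    collisions = []
    for word in words:
        typed = word.translate(table)
        if typed != word and typed in word_set:
            collisions.append((word, typed))
    return collisions
```

For example, `layout_collisions(["zen", "yen", "table"], QWERTZ_SWAPS)` reports both `("zen", "yen")` and `("yen", "zen")`, so a list meeting this criterion would have to drop one of the two words.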

Would such a (prefix-removed) list still have enough entropy while not containing words that are too uncommon or weird?

Personally, I don't care about prefix removal that much, as a good passphrase login mechanism would always rather query an N-word passphrase. I especially think it is a flaw that your current list removes most singulars because they are prefixes of plurals. I would much rather have all plurals removed. As it is, the generated phrases read oddly because of all the plurals, and you are not quite achieving the goal of prefix removal, which is to avoid the need for a separator: in this case the English plural "s" acts as a separator for all regular plurals of prefixes. However, prefix removal could also be context dependent: if a prefix occurs, the word is redrawn.
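To make the singular/plural effect concrete, here is a small Python sketch that finds the words a prefix-removal pass would have to deal with. It is an illustration only, not how any of the tools in this thread work, and the function name is hypothetical.

```python
def prefix_words(words):
    """Return the words that are a proper prefix of another list word."""
    ordered = sorted(set(words))
    # In sorted order, every word starting with `w` sits directly after `w`,
    # so checking the immediate successor is enough.
    return [w for w, nxt in zip(ordered, ordered[1:]) if nxt.startswith(w)]
```

Called on `["table", "tables", "chair", "cat", "catalog"]`, it returns `["cat", "table"]`: removing those prefix words keeps "tables" and drops "table", which is the plural-heavy outcome described above.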

Lastly, another thought I have had: ignoring the ordering and forcing all words to be unique does not decrease entropy that much. It would be possible to accept any ordering of the correct passphrase and, when presenting it to the user, to use a GPT that is not fine-tuned and is small enough to run locally to search for the word ordering that is most likely, i.e. most memorable.

sirati avatar Feb 24 '25 13:02 sirati

Thank you for your interest!

I think you raise 3 separate issues here. (1) Preventing neighboring keys; (2) Prefix removal preferring plurals; and (3) re-ordering words for increased memorability.

Two contexts

RE (1), I've come to understand that there are 2 distinct situations or contexts in which passphrases are used.

One case is as a password to, say, a website or vault. This is a simple case where we maximize entropy and memorability. The website/vault/program has no expectation of which word list the user used to generate their passphrase. A general passphrase generator, like my Phraze, is an example.

The second case is a bit more subtle: it's where the website or program the user is typing the passphrase into can safely assume/know the word list that the user's passphrase came from. Examples of this are file-transfer tools like croc and Magic Wormhole. Here, developers of programs can (safely) use handy tricks like letting users auto-complete words after a few characters, a nice convenience for users. This is also where edit distance specifically and typo prevention generally come into play. I argue that it is only in this context that a developer can safely display a message like "Hey, you typed 'wren' -- did you mean 'when'?" (Interestingly, I see that it might be debatable how important it is to have a prefix-free wordlist in this context, even if developers don't require word separators.)
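As a rough illustration of this second context, here is a Python sketch of both tricks for a program that knows the word list. It does not reflect how croc or Magic Wormhole implement them; the function names are hypothetical, and the typo matching relies on the standard library's `difflib.get_close_matches` with its default similarity cutoff rather than on a true edit distance.

```python
import difflib


def complete(prefix, words):
    """Return the only list word starting with `prefix`, or None if the
    prefix matches zero or several words."""
    matches = [w for w in words if w.startswith(prefix)]
    return matches[0] if len(matches) == 1 else None


def suggest(typed, words):
    """Return the closest list word for a typo, or None. Only meaningful
    when the program knows which list the passphrase was drawn from."""
    if typed in words:
        return typed
    close = difflib.get_close_matches(typed, words, n=1)
    return close[0] if close else None
```

With `words = ["when", "table", "orbit"]`, `suggest("wren", words)` returns `"when"` and `complete("ta", words)` returns `"table"`. A general-purpose login form has no such list to compare against, which is why these hints belong to the second context only.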

Personally, I'm more interested in the first, more general context. As a creator of word lists, I think it's better to create more generally useful lists. But your question has clarified my thinking about these two contexts a bit.

Removing prefix words gives preference to plurals

I agree that removing prefix words tends to leave more plurals on the list. My only answer is to ask whether "tables" is any less memorable than "table", and to point out that some of my word lists have suffix words removed, or use my Schlinkert Pruning technique to make them uniquely decodable.

I have debated trying to add a feature to my word-list creating tool, Tidy, that would remove plurals, but I haven't yet.

Re-ordering words to maximize memorability

This is an interesting idea I hadn't thought of! If one were careful about the decrease in entropy such "massaging" of the resulting passphrase would cause, I think this would be safe and interesting. Reminds me of the acrostic idea, where the user can pick a word and then each word of their resulting passphrase will start with a letter of that word (this program has that option). That said, if I understand you correctly, this is more of a suggestion for a passphrase generator tool, not a word list or word list-making tool.

The idea of using local LLMs is also something I hadn't thought of. I'm not an expert or, frankly, a big fan of that technology in general, but it's an interesting idea!

> i have read your blog article

This Reddit post and subsequent discussion may be helpful too!

sts10 avatar Feb 24 '25 18:02 sts10

You understood me completely. Indeed, splitting this into separate issues would be appropriate.

You are also right that I got carried away a bit and some of my thoughts are rather not appropriate for this repository but instead useful for a pass-phrase generator and verifier tool. (As you see by the edit history haha)

I may, if I find the time, implement a Rust library and utility tool for this purpose.

Anyway, I am glad to have gotten it off my mind and to have been able to inspire a bit.

sirati avatar Feb 24 '25 21:02 sirati

The loss of entropy from allowing any ordering is real, but not so large that it couldn't be compensated for by adding another word, and I think that for 6 words, even with the loss, it is still secure enough, imho.

$$ \text{Entropy loss} = \log_2(6!) \approx 9.5\ \text{bits} $$
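The figure can be checked with a few lines of Python. This is a sketch under stated assumptions: the list size of 7776 words is a hypothetical value chosen for illustration (it is not taken from any list in this repository), the function names are made up, and the unordered case assumes all words in the phrase are distinct.

```python
import math


def ordered_bits(list_size: int, k: int) -> float:
    """Entropy of k words drawn independently, order significant."""
    return k * math.log2(list_size)


def unordered_bits(list_size: int, k: int) -> float:
    """Entropy when the k words are distinct and any order is accepted."""
    return math.log2(math.comb(list_size, k))


n, k = 7776, 6  # hypothetical list size and phrase length
loss = ordered_bits(n, k) - unordered_bits(n, k)
print(f"ordered:   {ordered_bits(n, k):.2f} bits")
print(f"unordered: {unordered_bits(n, k):.2f} bits")
print(f"loss:      {loss:.2f} bits")
```

The loss comes out marginally above $\log_2(6!)$ because forcing the words to be distinct also costs a tiny amount. For this assumed list size, a seventh word in any order still yields more entropy than six ordered words, which matches the claim that one extra word compensates for the loss.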

sirati avatar Feb 24 '25 21:02 sirati

> You are also right that I got carried away a bit and some of my thoughts are rather not appropriate for this repository but instead useful for a pass-phrase generator and verifier tool.

No worries! When making word lists, it's useful to anticipate the needs of passphrase generator creators.

> I may, if I find the time, implement a Rust library and utility tool for this purpose.

I already mentioned Tidy and Phraze, but you may also want to look over my other passphrase-related Rust projects like Word List Auditor, remote words, and one on homophones.

sts10 avatar Feb 24 '25 21:02 sts10