monero
monero copied to clipboard
Bug: it's theoretically possible (seems *very* unlikely) to be unable to restore a valid seed
How to see the bug
Take the following valid Spanish seed as an example:
seed
pluma pluma pluma pluma pluma pluma pluma pluma pluma pluma pluma pluma pluma pluma pluma pluma pluma pluma pluma pluma pluma pluma pluma pluma pluma
spend key
b4050000b4050000b4050000b4050000b4050000b4050000b4050000b4050000
It's theoretically possible (obviously extremely unlikely) for a wallet to generate the above seed.
When restoring this seed, it restores to the English seed:
seed
plus plus plus plus plus plus plus plus plus plus plus plus plus plus plus plus plus plus plus plus plus plus plus plus plus
spend key
3b0400003b0400003b0400003b0400003b0400003b0400003b0400003b040000
Why it happens
The wallet tries to infer what language the seed is in. It does this by checking the first X letters of each seed word against each language mnemonic dictionary. For English, only the first 3 letters of each seed word are checked. For Spanish, only the first 4 letters of each seed word are checked.
The first 3 letters of pluma
-> plu
is the same as plus
-> plu
.
The wallet checks if a seed is English before it checks if the seed is Spanish. When checking if the seed is English, the wallet trims each word to the first 3 letters plu
, sees plu
is in the trimmed English mnemonic dictionary, then calculates the seed's checksum using the trimmed words and sees the checksum is valid. The wallet then infers the seed's language is the first valid checksum. Thus, in this case, the checksum passes for English before the wallet can even check if the checksum is valid in Spanish, and so the wallet incorrectly thinks the seed is the English seed plus plus ... plus
.
Proposed solutions
Potentially disruptive but optimal solution
On restore, require the user specify the seed language along with the seed.
EDIT: I had previously proposed requiring the complete full seed (rather than trimmed), but as @iamamyth pointed out below, this would not solve the issue because there are duplicate full words across languages (e.g. audio
in English and Spanish).
Less disruptive solution
Only require the language specification in the event a given seed has multiple valid checksums across different languages.
This means the edge case is applied to some already likely very niche users in exceptionally rare circumstances.
Misc.
Related: #1579 https://github.com/serai-dex/serai/pull/490
You could also just score the language options based on similarity (exact match counts may suffice, though of course edit distances also work). So, for this example, even if a user typos pluma a few times, you'd compare ~20 "pluma" exact matches (score of 20) to ~0 "plus" exact matches and Spanish would win.
To clarify: While I think the probability of a wrong inference can (probably should) be easily reduced without altering the existing UX (see prior suggestion), it also seems appropriate to let users select the language. Requiring an exact seed match seems unnecessary to fix the scenario outlined here, though would also mitigate other unlikely typo situations.
I do not support fuzzy matching. Seeds should either have a valid result or be invalid. Probablistically selecting results, or yielding multiple results, will lead to unexpected/undesirable behavior.
The known solutions are:
- Banning trimmed seeds (trimmed referring to partial words, not the whitespace cleanup)
- Requiring the users select the language
- Re-selecting the word lists
I most prefer 1. It doesn't complicate the UX. The migration path for anyone with a trimmed seed would simply be to manually re-expand it before typing it in.
Probablistically selecting results, or yielding multiple results, will lead to unexpected/undesirable behavior.
I think this is a fair argument.
- Banning trimmed seeds (trimmed referring to partial words, not the whitespace cleanup)
- Requiring the users select the language
I figure requiring a user select a language when inputting a seed is a more significant UX change than requiring full words, since most people I'm guessing probably already use full words. As @iamamyth mentioned, supporting partial seed words also helps in the event someone misspells some words, though personally I think this can be smoothly and reasonably handled with error detection / word suggestion UX similar to Feather's.
Feather has both FWIW (both requires full words and also supports a language picker for the seed, though the bug is still present in Feather as well).
- Re-selecting the word lists
We need a solution that would correctly restore existing seeds, like the pluma
seed in the example above
Agreed re-selection is infeasible.
supporting partial seed words also helps in the event someone misspells some words, though personally I think this can be smoothly and reasonably handled with error detection / word suggestion UX similar to Feather's
Seems like a question of separation of concerns. At the lowest layer, exact matching would be appropriate (though note that, if any two dictionaries overlap in a single word, that would imply you need to specify a language to disambiguate). At higher layers, word suggestion could rely on a very simple probabilitstic algorithm (e.g. on the 0th word, show suggestions ordered by a fixed language probability; on the Nth word, N > 0, order by (exact match count, default language probability)
).
If any two dictionaries share a word, then not allowing trimmed seeds means a seed may have multiple outcomes and forces language-based selection.
(though note that, if any two dictionaries overlap in a single word, that would imply you need to specify a language to disambiguate)
Good spot here. I checked and there are a number of duplicate words across languages.
You are correct. This means users must specify a language, and requiring full words is not a solution. Trimmed words can still be allowed.
Looks like the optimal solution is therefore removing find_seed_language
from the code which cannot yield a valid result in all cases, and requiring the seed language as an input parameter to the words_to_bytes
functions (and to WalletImpl::recover
).
I can submit a PR for the CLI, RPC, and GUI wallets. Unless I'm missing something above, this would be a breaking change for wallets using the wallet API in the ecosystem worth communicating as well.
Worth noting: there is an additional UX thorn with requiring specifying a language. There's both an English and EnglishOld, and there are duplicate words across both (e.g. goodbye
). Users would therefore need to have the EnglishOld option available in addition to English, which is fairly confusing.
Users would therefore need to have the EnglishOld option available in addition to English, which is fairly confusing
Probably the easiest option would be to datestamp it (looks like that wordlist hasn't been available for many years, ~2014; someone could just figure out the exact year and the wallet could list it as, say, pre-2015, which most people would ignore).
Is it worth it to make wallet restore more complicated to all users for a case that is more theoretical / extremely unlikely? Monero is already more complicated compared to other coins due to the restore height – adding another selection feels like going in the wrong direction :/
English (pre-2015), English sound fine to me.
@selsta If this ever comes up IRL, users will be unable to restore their wallet and lose access to their funds. I don't believe specifying a language is a notable blocker. It's a single query which should have an obvious answer to any human.
Wallets can also try to abstract it, by decoding with all languages, and only asking the user if/when multiple valid options show up ("Is this seed Spanish or Italian?").
Probably the easiest option would be to datestamp it
This is a good solution for EnglishOld.
Wallets can also try to abstract it, by decoding with all languages, and only asking the user if/when multiple valid options show up ("Is this seed Spanish or Italian?").
I'll try this approach in GUI, CLI, RPC so it will only affect the UX for edge case seeds, unless there are objections/any more thoughts come along.
I generally agree there should be a solution in place in the rare event someone runs into the issue.
Wallets can also try to abstract it, by decoding with all languages, and only asking the user if/when multiple valid options show up ("Is this seed Spanish or Italian?").
this makes the most sense, if there are multiples then ask the user to clarify which language the seed is. If for some reason it doesn’t restore the correct wallet they can simply pick the next language asked