xous-core
vault basis entries access visibility loss
I just lost everything for no apparent reason. I had 2 PDDB bases open in vault and had been using them for weeks. Yesterday I decided to do a backup, because there is a lot I had set up and not duplicated, but it was late. When I woke up in the morning (I had not unlocked the PDDB bases in vault) the list was just empty. I rebooted and reopened the bases: same, empty... Please tell me I can recover that...
That happened just after I noticed, a week or so ago, that some passwords had crossed from one basis to another (a misunderstanding of the basis concept on my part, which bunnie cleared up).
Any suggestions, or is it definitely lost?
oof, that is distressing.
You said you can open the secret Bases. When you open them, there's no error posted?
A couple things to try:
From shellchat, try running pddb dictlist to see the list of dictionaries available, and then do pddb keylist to see the keys inside the database. So for example there should be a vault.passwords dictionary, and you'll see a bunch of keys that are hex numbers, which are the hashes of the passwords.
Do this before opening the secret basis, and then open the secret basis and see if the union view of the key set changes (basically just check the key list again). You can unlock a secret basis from shellchat using pddb basisunlock with the basis name as the argument to the command.
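To make the before/after comparison less error-prone, the two key lists can be diffed off-device. The following is only a sketch: it assumes you have copied the console output of the key listing into two text blobs with one key name per line, which may not match the exact shellchat output format. The function names and sample keys are hypothetical.

```python
def parse_keys(console_text):
    """Return the set of non-empty, stripped lines from pasted console output.

    Assumption: one key name per line. Adjust if the real output differs.
    """
    return {line.strip() for line in console_text.splitlines() if line.strip()}


def diff_keys(before_text, after_text):
    """Compare key listings taken before and after unlocking a secret basis."""
    before = parse_keys(before_text)
    after = parse_keys(after_text)
    appeared = sorted(after - before)      # keys contributed by the unlocked basis
    disappeared = sorted(before - after)   # should normally be empty
    return appeared, disappeared


# Hypothetical sample data standing in for pasted console output.
before_unlock = """
a1b2c3
d4e5f6
"""
after_unlock = """
a1b2c3
d4e5f6
0718ff
"""

appeared, disappeared = diff_keys(before_unlock, after_unlock)
print("keys that appeared after unlock:", appeared)
print("keys that disappeared after unlock:", disappeared)
```

If nothing appears after the unlock, the secret basis is contributing no keys to the union view; if keys appear here but not in vault, the problem is more likely on the vault side.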
One possibility is that the keys are there, but, they aren't being read in correctly by vault. If this is the case, there would probably be some spew on the console log complaining about deserialization errors...
You said you did a backup though, right? and you are using a fairly recent version of the tool, so it would have forced you to create the backup headers.
If you feel comfortable, you can drop it into the backalyzer and see if the keys and data are still there. The backalyzer can decrypt everything, but you do have to type your passwords into the script. If you're working with a "live" dataset, what I do is keep a microSD card that I put into a Raspberry Pi for doing the analysis, and then I take it out when I'm done and store it somewhere safe. The backalyzer will read your backup; you give it your boot PIN and then any secret Bases as arguments, and you can ask it to dump the data to the console (it'll be decrypted plaintext of everything). If there are any structural errors in the PDDB, it'll also pick them up, probably just by throwing a Python script error. The good news is there's a reasonable chance you can just brute force through the sectors to look for any data that went missing: the PDDB is structured as a journal, so there is a good chance there's an old version of a journaled page hanging out with previous data in it.
So, I think there are a bunch of things to try, but it's a bit fiddly because they are your secrets. The easiest thing to do would be to say "OK, send me the data and the passwords and I'll analyze it", but of course then I'd see a lot of private data. So, we'll have to debug this at a slightly slower pace. But, if you have a recent backup done with the latest backup tools, I think the prognosis should be pretty good.
Another option is, you can just run a backup right now, so that you have a snapshot of the journals as-is; this will reduce the chance that you lose data because some free blocks are re-used by accident. The backup is a read-only process so it should be fairly safe to run.
Again, make sure you run "prepare backup..." before running the backup. If you just run a backup without preparing it first, you will back up the ciphertext of the PDDB but no keys. The keys have to be prepared every single time you back up. The latest main prevents that from happening, and if you're running from the bleeding-edge build it also protects the backups with a checksum to ensure there are no USB errors reading it out.
FYI my internet connectivity is a bit spotty at the moment so I may not be instantly responsive for the next 24 hours or so, but with a bit of patience I think we have a few things to try. I'd say first thing is to run a backup. If you know how to use hexdump to inspect, do a hexdump -C backup.pddb and make sure that there is data at offset 0x0-0x1000. The failure mode is that you have 0xFF there, because you didn't prepare the backup before running it. The first ~1k is a backup of your root key box, plus some metadata. There will be some blank data in that sector, but it's just the unused space at the end. I'm asking you to check because I had one other user fail to prepare the backup before running, and because of that the root keys were lost and we couldn't do any analysis at all of the backup file. As long as the root keys are backed up I think we have a lot of things to try.
Sorry, I expressed myself incorrectly: I was ABOUT to do a backup (so I did NOT do a backup yet) when this happened, and I could not access some of my services since the bases were empty.
OK, I rebooted a couple of times; each time I would unlock the basis in the vault app and nothing would appear, no entries. I rebooted a couple more times, but this time I unlocked the basis from shellchat, and dictlist is NOT empty, which means there is still data. I unlocked 2 of my most important bases.
Using pddb query dict:key I can see the content, so that data is there. After I did that I switched over to vault, and the entries are back, but in a completely different basis (while I was trying to test).
So I think I can recover the most important data... at least I hope...
I did C, a bit of assembly and Python in the past, but I have never programmed in Rust, so I will start with this project so that I can TRY to contribute by debugging and better identifying what is happening (my day job sucks up all my time, so I'll be slow...).
I will rename the issue, since it seems the data is still there, somewhere, but I have no idea why vault was/is not able to display it.
I have an error on ONE of the bases now, but not on the others (I'll comment further to tell you the error; currently recovering my data first).
OK, NOW I have my data back, I'm not living in fear, and I can do experiments without being scared. I unlocked the bases I had opened through shellchat, then tried to reopen the one that displayed an error, and I get the following:
Dictionnary error accessing U2F database
Custom { kind: Other, error: "internal error" }
And the entries are empty. I've had this error in the past, but not always, and the entries would still appear.
Also, other bases have no errors but still do not show up.
I'll run the backup now and use backalyzer to start analyzing and seeing how I can recover all the bases. Thanks for being so responsive! As said earlier, I'll learn Rust basics to be able to read the vault code, as right now I am completely useless at identifying the problem, and Rust is highly unusual for someone with C and Go as last experiences.
OK phew, the data is at least still there. So now the trick is to reproduce the sequence of events that led to a basis appearing empty. My guess is the function that does the "union of all open Bases" may have errored out early and just returned an empty set.
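To illustrate the kind of failure being guessed at here (this is not the actual PDDB or vault code, which is written in Rust; every name below is hypothetical), here is a small Python sketch of how a union over several open Bases can come back empty when one Basis fails to read, next to a variant that skips the failing Basis and reports it instead.

```python
class BasisReadError(Exception):
    """Hypothetical error raised when a basis cannot be read."""


def list_keys(basis):
    """Return the keys of one basis, or raise if it is marked corrupt."""
    if basis.get("corrupt"):
        raise BasisReadError(basis["name"])
    return set(basis["keys"])


def union_fail_fast(bases):
    """Union of all keys, but any single failure discards everything."""
    keys = set()
    try:
        for basis in bases:
            keys |= list_keys(basis)
    except BasisReadError:
        return set()  # one bad basis makes every basis look empty
    return keys


def union_tolerant(bases):
    """Union of all readable keys, plus the names of bases that failed."""
    keys, failed = set(), []
    for basis in bases:
        try:
            keys |= list_keys(basis)
        except BasisReadError:
            failed.append(basis["name"])
    return keys, failed


open_bases = [
    {"name": "system", "keys": ["k1", "k2"]},
    {"name": "secret", "keys": ["k3"], "corrupt": True},
]

print(union_fail_fast(open_bases))
print(union_tolerant(open_bases))
```

If the real code behaves like the first variant, a single damaged Basis would be enough to make the whole list look empty even though the other Bases are intact, which would match the symptom reported in this thread.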
Anyways, it sounds like the error is triggered after a particular Basis is opened. So, it sounds like maybe one of the Bases got corrupted, and that is causing the error-out during the unionization of the keys. Can you describe, in generic terms, how I might be able to reproduce the sequence of events that led to the corruption of this Basis?
I promise to dig more into this later this week, tomorrow I'm on the road quite a bit so I may not have great internet but at least I can sleep a little easier knowing it wasn't a total loss.
Please make the backup now, I think it's a good bulwark against any future errors. The backups have saved my skin during dev and testing with the system in a pretty bad state, so I have pretty good confidence in them, especially now that there are extra sanity checks on the backup integrity (which are present if you're using the latest main).
OK yes, I'll describe my daily behavior using the device.
As for the PDDB, I wonder if a PDDB integrity or consistency check command would be nice. Not knowing the format, I cannot suggest anything specific, but as a user I wonder whether it is possible to recover from basis corruption.
Yes, backup now. I also need to document what you explained to me (in the other issue) about the basis "overlays", to clarify my usage patterns and the data representation based on my usage, and to avoid data transfers or misunderstandings.
I'll comment further here.
I'm still using 0.9.8.
ooooh 0.9.8, ok. You should still be able to do a backup and have it work, but the blocks won't be checksummed. Just be sure to run "prepare backup..." first. The latest backup tool will prevent you from running a backup without doing that first, but we learned the hard way that users can run a backup without preparing it in that version of the OS.
Since 0.9.8, the following PDDB bugs have been fixed:
- a journaling issue in the free space blocks (this is very possibly linked to your issue)
- a cache coherency bug syncing the PDDB (rare and hard to trigger, this fix is only in bleeding edge and still testing)
- some free space cache re-generation issues -- very unlikely because the auto-retrigger to generate the FSCB wasn't introduced until 0.9.10, I think
- checksums now protect the backups, and the backup script has basic checks to ensure the full backup image was downloaded correctly.
You should run a backup (preparing it first) with 0.9.8, but after you've run a backup and confirmed the first kilobyte of your backup image has something in it using the hexdump command, I would recommend then updating to the latest release and running another backup (saving the previous file) so you have one with checksums, in addition to your original backup. Just as a sanity check.
0.9.11 (in CI now) has some fixes to the cache coherency issue on the CPU but it's pretty rare and hard to trigger. It usually manifests as a panic, but does not result in data loss, so I don't think this is going to affect you. I stop short of saying you should grab the CI image right now because there's a lot of activity on main at this very moment and I would want to do a test of the CI state on my own device before recommending you absorb it.
here's an example of what the first few hundred bytes of your backup should look like:
$ hexdump -C backup.pddb | less
00000000 01 00 01 00 00 00 09 00 0a 00 02 00 1e 5d 25 6f |.............]%o|
00000010 01 00 00 00 00 00 09 00 0a 00 00 00 38 40 09 c8 |............8@..|
00000020 01 00 00 00 00 00 09 00 09 00 03 00 00 00 00 00 |................|
00000030 00 00 00 00 03 00 0c 00 03 00 00 00 00 00 00 00 |................|
00000040 00 00 00 00 00 00 00 00 f4 2c d6 be 83 01 00 00 |.........,......|
00000050 00 00 00 00 01 00 00 00 5c c8 58 54 ce b5 4c 00 |........\.XT..L.|
00000060 00 01 00 00 62 00 00 00 00 10 00 00 00 00 00 00 |....b...........|
00000070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000090 01 00 00 00 00 00 00 00 3c 18 19 26 d7 17 cc 48 |........<..&...H|
000000a0 4e b9 cf 7b 5f 1a 06 fc 86 91 6e d1 db 22 92 3a |N..{_.....n..".:|
000000b0 fc 11 55 bf 00 b3 ac a8 84 b5 a8 30 63 90 9b 94 |..U........0c...|
000000c0 1b b0 f0 2b 58 6f 0a 83 44 8a f1 13 f5 1b 49 75 |...+Xo..D.....Iu|
The exact bytes will differ due to encryption but the general structure should be somewhat similar.
The failure mode is if you just see FF FF FF FF or 00 00 00 00 from addresses 0-0x1000. This means the header wasn't prepared and your keys were not exported.
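For anyone who prefers a script to eyeballing the hexdump, the same sanity check can be sketched in a few lines of Python. This only tests whether the first 0x1000 bytes are entirely 0xFF or entirely 0x00, as described above; it does not validate the header contents, and the function names are made up for this example.

```python
HEADER_LEN = 0x1000  # region the thread says must contain the prepared header


def header_status(data):
    """Classify the first HEADER_LEN bytes of a backup image."""
    if len(data) < HEADER_LEN:
        return "truncated"
    region = data[:HEADER_LEN]
    if region == b"\xff" * HEADER_LEN:
        return "blank-ff"   # backup was likely not prepared before running
    if region == b"\x00" * HEADER_LEN:
        return "blank-00"   # same failure mode, zero-filled
    return "populated"      # something is there; not proof that it is valid


def check_backup(path):
    """Read the start of a backup file and classify its header region."""
    with open(path, "rb") as f:
        return header_status(f.read(HEADER_LEN))


# Synthetic examples, so the sketch runs without a real backup file.
print(header_status(b"\xff" * HEADER_LEN))
print(header_status(b"\x01\x00\x01\x00" + b"\x00" * (HEADER_LEN - 4)))
```

With a real file you would call check_backup("backup.pddb"). A "populated" result only means the region is not blank, so it is still worth comparing against the example dump above.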
hopefully you're able to run the backup, at the very least. Let me know if you need any more assistance recovering data.
Was swamped with work stuff... now doing my backup and check + xous upgrade. Sorry about that, I'll keep you updated on this issue.
OK, so I'll summarize:
- I've manually backed up all the entries I recovered (mainly TOTP entries and password entries)
- I've done a factory reset of the device using the latest stable (0.9.9, so I'm up to date now; hopefully I won't need them, and I'll make backups regularly)
- Set up root password, PIN, wifi, etc. again
- The system PDDB has entries
Comment:
- in https://github.com/betrusted-io/betrusted-wiki/wiki/Updating-Your-Device the requirements are missing pycryptodome (I'm running on macOS here)
I managed to get all my data so it's not as scary.
Now one question regarding the backup preparation: why does the BIP39 backup key have such low entropy? abandon abandon abandon..... I guess I missed something in my setup...?
Freshly reset device, first basis I create:
Right away, running 0.9.9. What am I doing wrong?
Now one question regarding the backup preparation: why does the BIP39 backup key have such low entropy? abandon abandon abandon..... I guess I missed something in my setup...?
The default eFuse key for your device is 0x0000..., which is the abandon abandon ... art phrase in BIP39.
BBRAM keys need to be burned to transform from the 0x0 key to something non-trivial. The eFuse flow is still pending. But basically, there is an additional step of device fusing that has to be done to lock that part down.
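The reason an all-zero key shows up as a run of the same word can be seen from how BIP39 maps bits to words: the entropy gets a short SHA-256 checksum appended and is then cut into 11-bit indices into the word list. The sketch below computes those indices only (it does not include the 2048-word list, and the function name is made up here); it assumes a 256-bit key, which gives a 24-word phrase.

```python
import hashlib


def bip39_word_indices(entropy):
    """Return the BIP39 word-list indices for the given entropy bytes."""
    ent_bits = len(entropy) * 8
    if ent_bits not in (128, 160, 192, 224, 256):
        raise ValueError("BIP39 entropy must be 128-256 bits in 32-bit steps")
    cs_bits = ent_bits // 32  # checksum length defined by BIP39
    checksum = hashlib.sha256(entropy).digest()[0] >> (8 - cs_bits)
    value = (int.from_bytes(entropy, "big") << cs_bits) | checksum
    n_words = (ent_bits + cs_bits) // 11
    # Slice the combined bit string into 11-bit groups, most significant first.
    return [(value >> (11 * (n_words - 1 - i))) & 0x7FF for i in range(n_words)]


# An all-zero 256-bit key, standing in for an unprogrammed device key.
indices = bip39_word_indices(bytes(32))
print(indices)
```

Every index except the last is 0, and index 0 in the English word list is "abandon". Only the final word differs, because its low bits carry the checksum. So the phrase is not a sign of a broken setup, just of a key that is still all zeros.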
Freshly reset device, first basis I create:
Right away, running 0.9.9. What am I doing wrong?
I've seen this error too, I haven't been able to trace it down yet because it's not consistently reproducible for me. Can you share with me the steps you did to cause that?
I did find a bug earlier today in the basis unionizing code, which was pushed to the bleeding edge CI branch, but it affected something in the passwords feature, not the FIDO side.
(The problem I have is I don't have enough live U2F entries to really cause anything to happen, and the synthetic cases I have don't seem to trigger it.)
Steps are:
- Factory reset
- Set up 2 wifi networks (to populate the system)
- Ran backup.py (twice; the first time it failed because I did not have pycryptodome)
- Switched to vault app
- Created a first basis: test
- Answered yes to mount it
- Retyped the basis password; it seemed mounted, then that popup came right after.
I have no entries yet; I was starting to rethink my bases based on our conversation around the overlays and my misunderstandings.
That's it.
ok great
Wait you ran the backup after you did the reset?...I hope you backed up the data before you reset things as well...
I did, manually, as I explained earlier. But then I could not remember my root password, and my passwords had all been recovered manually with pddb query commands, so I decided to just factory reset to experiment and play with bases and backups, as well as rethink how I store and back up things properly.
Does that make sense?
Ah ok, so basically, you were able to manually extract the passwords before doing a factory reset. whew.
Alright. Thank you. And thank you for reporting these issues. afaik you're the only other user who is using the Basis feature other than me, so unfortunately you're hitting edge cases that I haven't seen yet, or I failed to cover in my test benches. Thank you for your patience.
Man, thank YOU for this great initiative and work. I love the idea of being able to verify and trust my hardware a few layers below the OS. Yes, I want to use it daily, so I need to feel comfy and trusted with it.
Yes, I want to use it daily, so I need to feel comfy and trusted with it.
Me too. We'll get there, but it's a process.
I just tried the process you outlined and didn't get the error. Can you type ver xous just to let me know what version you're running?
v0.9.10-13-g709ff8f9
Hmm, I thought it was 0.9.9, since I expected the latest stable when I ran tools/legacy/factory_reset.sh -s
ah ok, 0.9.10 is the latest stable release. hmm....there were some fixes since then that could have addressed this issue which is maybe why i'm not seeing it. I'll try again loading the older version later.
OK. I went back to the version you're reporting, and confirmed I see that problem. So it was resolved in the bleeding-edge release.
I think it was fixed with this commit: https://github.com/betrusted-io/xous-core/commit/5de44a95023fbfc0c4954038f7d988841da38b87
Should I upgrade to bleeding edge?