vault basis entries access visibility loss

Open eau-u4f opened this issue 2 years ago • 71 comments

I just lost everything for no apparent reason. I had 2 PDDB bases open in vault and have been using them for weeks. Yesterday I decided to do a backup, because there is a lot I had set up and not duplicated, but it was late. I woke up this morning (I had not unlocked the PDDB bases in vault) and the list was just empty. I rebooted and reopened the bases: same, empty... Please tell me I can recover that...

That happened just after I noticed, a week or so ago, that some passwords had crossed from one basis to another... (a misunderstanding of the basis concept that bunnie cleared up).

Any suggestion or is it definitely lost?

eau-u4f avatar Nov 01 '22 08:11 eau-u4f

oof, that is distressing.

You said you can open the secret Bases. when you open them, there's no error posted?

A couple things to try:

From shellchat, try running pddb dictlist to see the list of dictionaries available, and then do pddb keylist to see the keys inside the database. So for example there should be a vault.passwords dictionary, and you'll see a bunch of keys that are a hex number that are the hash of the passwords.

Do this before opening the secret basis, and then open the secret basis and see if the union view of the key set changes (basically just check the key list again). You can unlock a secret basis from shellchat using pddb basisunlock with the basis name as the argument to the command.
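If the key listings are long, the before/after comparison can be done off-device. The snippet below is a minimal sketch that assumes you have copied the output of pddb keylist into two strings with one key per line; that format, and the helper names parse_keys and new_keys, are assumptions for illustration and not part of the Xous tooling.

```python
def parse_keys(listing):
    """Collect the non-empty, whitespace-stripped lines of a captured key listing."""
    return {line.strip() for line in listing.splitlines() if line.strip()}


def new_keys(before, after):
    """Return keys that appear only after the secret basis was unlocked."""
    return parse_keys(after) - parse_keys(before)


# Made-up key names standing in for the hex hashes shown on the device.
before = "a1b2\nc3d4\n"
after = "a1b2\nc3d4\ne5f6\n"
print(sorted(new_keys(before, after)))
```

An empty result would suggest that unlocking the basis added nothing to the union view.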

One possibility is that the keys are there, but, they aren't being read in correctly by vault. If this is the case, there would probably be some spew on the console log complaining about deserialization errors...

You said you did a backup though, right? and you are using a fairly recent version of the tool, so it would have forced you to create the backup headers.

If you feel comfortable, you can drop it into the backalyzer and see if the keys and data are still there. The backalyzer can decrypt everything, but you do have to type in your passwords to the script. If you're working with a "live" dataset what I do is I have a microSD card that i put into a raspberry Pi for doing the analysis, and then I take it out when I'm done and store it somewhere safe. The backalyzer will read your backup, and you give it as arguments your boot pin and then any secret bases and you can ask it to dump the data to the console (it'll be decrypted plaintext of everything). If there are any structural errors in the PDDB, it'll also pick them up, probably just throwing a Python script error. The good news is there's a reasonable chance you can just brute force through the sectors to look for any data that went missing -- the PDDB is structured as a journal, so, there is a good chance there's an old version of a journaled page hanging out with previous data in it.

So, I think there are a bunch of things to try, but it's a bit fiddly because they are your secrets. The easiest thing to do would be to say Ok send me the data and the passwords and I'll analyze it but of course, then I'd see a lot of private data. So, we'll have to debug this on a slightly slower pace. But, if you have a recent backup done with the latest backup tools, I think the prognosis should be pretty good.

bunnie avatar Nov 01 '22 10:11 bunnie

Another option is, you can just run a backup right now, so that you have a snapshot of the journals as-is; this will reduce the chance that you lose data because some free blocks are re-used by accident. The backup is a read-only process so it should be fairly safe to run.

Again, make sure you run "prepare backup..." before running the backup. If you just run a backup without preparing it first, you will back up the ciphertext of the PDDB but no keys. The keys have to be prepared every single time. The latest main prevents you from running a backup without preparing it, and if you're running the bleeding-edge build it also protects the backups with a checksum to ensure there are no USB errors reading it out.

bunnie avatar Nov 01 '22 10:11 bunnie

FYI my internet connectivity is a bit spotty at the moment so I may not be instantly responsive for the next 24 hours or so, but with a bit of patience I think we have a few things to try. I'd say first thing is to run a backup. If you know how to use hexdump to inspect, do a hexdump -C backup.pddb and make sure that there is data at offset 0x0-0x1000. The failure mode is that you have 0xFF there, because you didn't prepare the backup before running it. The first ~1k is a backup of your root key box, plus some metadata. There will be some blank data in that sector, but it's just the unused space at the end. I'm asking you to check because I had one other user fail to prepare the backup before running, and because of that the root keys were lost and we couldn't do any analysis at all of the backup file. As long as the root keys are backed up I think we have a lot of things to try.

bunnie avatar Nov 01 '22 10:11 bunnie

Sorry, I expressed myself incorrectly: I was ABOUT to do a backup (so I did NOT do a backup yet) when this happened, and I could not access some of my services since the bases were empty.

Ok, I have rebooted a couple of times. At first I would unlock the basis in the vault app and nothing would appear, no entries. I then rebooted a couple more times, but this time I unlocked the basis from shellchat and dictlist is NOT empty, which means there is still data. I unlocked 2 of my most important bases.

Using pddb query dict:key I can see the content, so the data is there. After I did that I switched over to vault and the entries are back, but in a completely different basis (while I was trying to test).

So I think I can recover the most important data... at least I hope...

I did C, a bit of assembly and Python in the past, but I have never programmed in Rust, so I will start with this project and TRY to contribute by debugging and better identifying what is happening (my day job sucks up all my time, so I'll be slow...)

I will rename the issue since it seems the data are still there, somewhere, but I have no idea why vault was/is not able to display.

I have an error on ONE of the bases now, but not on the others (I'll comment further to tell you the error, currently recovering my data first)

eau-u4f avatar Nov 01 '22 11:11 eau-u4f

Ok NOW I have my data back, I'm not living in fear and I can do experiments without being scared. I unlocked the basis I had opened through shellchat, then tried to reopen the one that displayed an error, and I get the following:

Dictionnary error accessing U2F database
Custom { kind: Other, error: "internal error" }

And the entries are empty. I've had this error in the past, but not always, and the entries would still appear. Also, other bases have no errors, but still do not show up. (Screenshot: IMG_7144)

eau-u4f avatar Nov 01 '22 11:11 eau-u4f

I'll run the backup now and use backalyzer to start analyzing and seeing how I can recover all the bases. Thanks for being so responsive! As said earlier, I'll learn Rust basics to be able to read the vault code, as right now I am completely useless at identifying the problem, and Rust is highly unusual for someone with C and Go as their last experiences..

eau-u4f avatar Nov 01 '22 11:11 eau-u4f

OK phew the data is at least still there. So now the trick is to reproduce the sequence of events that led to a basis appearing empty. My guess is the function that does the "union of all open Bases" may have errored out early and just returned an empty set.
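To make that guess concrete, here is a minimal sketch of the failure pattern in Python. It is not the vault or PDDB code: the function names, the (name, reader) pairs, and the use of OSError are all hypothetical stand-ins. It only shows how a union that aborts on the first error hides every entry, while a union that skips the failing basis still returns the readable ones.

```python
def union_strict(bases):
    """Union the keys of every basis; any read failure yields an empty set."""
    merged = set()
    try:
        for _name, read_keys in bases:
            merged |= read_keys()
    except OSError:
        return set()  # one bad basis hides every entry from the caller
    return merged


def union_tolerant(bases):
    """Union the keys of readable bases and report the ones that failed."""
    merged, failed = set(), []
    for name, read_keys in bases:
        try:
            merged |= read_keys()
        except OSError as err:
            failed.append((name, str(err)))
    return merged, failed


def read_system():
    return {"k1", "k2"}


def read_corrupt():
    raise OSError("internal error")


bases = [("system", read_system), ("secret", read_corrupt)]
print(union_strict(bases))
print(union_tolerant(bases))
```

With the strict version the list appears empty even though the "system" keys are intact, which would match the symptom of data being visible through pddb query but not in the vault list.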

Anyways, it sounds like the error is triggered after a particular Basis is opened. So, it sounds like maybe one of the Bases got corrupted, and that is causing the error-out during the unionization of the keys. Can you describe, in generic terms, how I might be able to reproduce the sequence of events that led to the corruption of this Basis?

I promise to dig more into this later this week, tomorrow I'm on the road quite a bit so I may not have great internet but at least I can sleep a little easier knowing it wasn't a total loss.

Please make the backup now, I think it's a good bulwark against any future errors. The backups have saved my skin during dev and testing with the system in a pretty bad state, so I have pretty good confidence in them, especially now that there are extra sanity checks on the backup integrity (which are present if you're using the latest main).

bunnie avatar Nov 01 '22 13:11 bunnie

Ok yes i'll describe my daily behavior using the device.

As for the PDDB, I wonder if an integrity or consistency check command would be nice. Not knowing the format, I cannot think of or advise on anything specific, but as a user I wonder if there is a possibility to recover from basis corruption.

Yes, backup now. I also need to document what you explained to me (in the other issue) about the basis "overlays", to clarify my usage patterns and the data representation based on my usage, and to avoid unintended transfers of data or misunderstandings..

I ll comment further here.

eau-u4f avatar Nov 01 '22 14:11 eau-u4f

I m using 0.9.8 still.

eau-u4f avatar Nov 01 '22 14:11 eau-u4f

ooooh 0.9.8 ok. You should still be able to do a backup and have it work, but the blocks won't be checksummed. Just be sure to run "prepare backup..." first. The latest backup tool will prevent you from running a backup without doing that first, but we learned the hard way that users can run a backup without preparing it in that version of the OS.

Since 0.9.8, the following PDDB bugs have been fixed:

  • a journaling issue in the free space blocks (this is very possibly linked to your issue)
  • a cache coherency bug syncing the PDDB (rare and hard to trigger, this fix is only in bleeding edge and still testing)
  • some free space cache re-generation issues -- very unlikely because the auto-retrigger to generate the FSCB wasn't introduced until 0.9.10 i think
  • checksums now to protect the backups and also the backup script has basic checks to ensure the full backup image was downloaded correctly.

You should run a backup (preparing it first) with 0.9.8, but after you've run a backup and confirmed the first kilobyte of your backup image has something in it using the hexdump command, I would recommend then updating to the latest release and running another backup (saving the previous file) so you have one with checksums, in addition to your original backup. Just as a sanity check.

0.9.11 (in CI now) has some fixes to the cache coherency issue on the CPU but it's pretty rare and hard to trigger. It usually manifests as a panic, but does not result in data loss, so I don't think this is going to affect you. I stop short of saying you should grab the CI image right now because there's a lot of activity on main at this very moment and I would want to do a test of the CI state on my own device before recommending you absorb it.

bunnie avatar Nov 01 '22 14:11 bunnie

here's an example of what the first few hundred bytes of your backup should look like :

$ hexdump -C backup.pddb | less

00000000  01 00 01 00 00 00 09 00  0a 00 02 00 1e 5d 25 6f  |.............]%o|
00000010  01 00 00 00 00 00 09 00  0a 00 00 00 38 40 09 c8  |............8@..|
00000020  01 00 00 00 00 00 09 00  09 00 03 00 00 00 00 00  |................|
00000030  00 00 00 00 03 00 0c 00  03 00 00 00 00 00 00 00  |................|
00000040  00 00 00 00 00 00 00 00  f4 2c d6 be 83 01 00 00  |.........,......|
00000050  00 00 00 00 01 00 00 00  5c c8 58 54 ce b5 4c 00  |........\.XT..L.|
00000060  00 01 00 00 62 00 00 00  00 10 00 00 00 00 00 00  |....b...........|
00000070  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000090  01 00 00 00 00 00 00 00  3c 18 19 26 d7 17 cc 48  |........<..&...H|
000000a0  4e b9 cf 7b 5f 1a 06 fc  86 91 6e d1 db 22 92 3a  |N..{_.....n..".:|
000000b0  fc 11 55 bf 00 b3 ac a8  84 b5 a8 30 63 90 9b 94  |..U........0c...|
000000c0  1b b0 f0 2b 58 6f 0a 83  44 8a f1 13 f5 1b 49 75  |...+Xo..D.....Iu|

The exact bytes will differ due to encryption but the general structure should be somewhat similar.

The failure mode is if you just see FF FF FF FF or 00 00 00 00 from addresses 0-0x1000. This means the header wasn't prepared and your keys were not exported.
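The same check can be scripted instead of eyeballing the hexdump. This is a minimal sketch, not part of the official backup tools: it only tests whether the first 0x1000 bytes are uniformly 0xFF or uniformly 0x00, as described above, and the function names are hypothetical. It does not validate the header contents or decrypt anything.

```python
from pathlib import Path

HEADER_LEN = 0x1000  # region described above as holding the key box and metadata


def header_looks_prepared(data):
    """Return False if the header region is missing, all 0xFF, or all 0x00."""
    header = data[:HEADER_LEN]
    if len(header) < HEADER_LEN:
        return False  # file too short to contain a full header region
    return set(header) not in ({0xFF}, {0x00})


def check_file(path):
    """Read a backup image from disk and apply the header check."""
    return header_looks_prepared(Path(path).read_bytes())


# Synthetic data standing in for real backup images.
blank = b"\xff" * HEADER_LEN + b"ciphertext"
prepared = b"\x01\x00\x01\x00" + b"\x00" * (HEADER_LEN - 4) + b"ciphertext"
print(header_looks_prepared(blank), header_looks_prepared(prepared))
```

For a real file, something like check_file("backup.pddb") would apply the same test. A True result only means the region is not blank; it does not prove the keys are usable.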

bunnie avatar Nov 01 '22 14:11 bunnie

hopefully you're able to run the backup, at the very least. Let me know if you need any more assistance recovering data.

bunnie avatar Nov 04 '22 08:11 bunnie

was swamped in work stuff.. now doing my backup and check + xous upgrade, sorry about that, i ll keep you updated on this issue.

eau-u4f avatar Nov 04 '22 09:11 eau-u4f

ok so i'll summarize:

  1. I've manually backup all entries I've recovered (mainly TOTP entries and password entries)
  2. I've done a factory reset of the device using the latest stable (0.9.9, so I'm up-to-date now; hopefully I won't need it again, and I'll make backups regularly)
  3. re-setup root pass, PIN, wifi, etc..
  4. system PDDB has entries..

Comment:

  • in https://github.com/betrusted-io/betrusted-wiki/wiki/Updating-Your-Device the requirements are missing pycryptodome (I'm running on macOS here)

I managed to get all my data so it's not as scary.

Now one question regarding the backup preparation: why does the BIP39 backup key have such low entropy? abandon abandon abandon..... I guess I missed something in my setup...?

eau-u4f avatar Nov 04 '22 11:11 eau-u4f

freshly reset device, first basis I create: image

Right away, running 0.9.9 what do i do wrong?

eau-u4f avatar Nov 04 '22 12:11 eau-u4f

Now one question regarding the backup preparation: why does the BIP39 backup key have such low entropy? abandon abandon abandon..... I guess I missed something in my setup...?

The default eFuse key for your device is 0x0000... which is the abandon abandon ... art in BIP 39.

BBRAM keys need to be burned to transform from the 0x0 key to something non-trivial. The eFuse flow is still pending. But basically, there is an additional step of device fusing that has to be done to lock that part down.
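For reference, the mapping from an all-zero key to that phrase can be reproduced with the standard BIP-39 encoding. The sketch below assumes a 256-bit key, which is an assumption about the backup key size, and computes only the 11-bit word indices so that it needs no wordlist file. Index 0 in the English wordlist is "abandon"; the last word differs because its low bits carry the SHA-256 checksum.

```python
import hashlib


def bip39_indices(entropy):
    """Map entropy bytes to BIP-39 word indices (11 bits per word)."""
    ent_bits = len(entropy) * 8
    if ent_bits not in (128, 160, 192, 224, 256):
        raise ValueError("BIP-39 entropy must be 128 to 256 bits in 32-bit steps")
    cs_bits = ent_bits // 32  # checksum length defined by BIP-39
    checksum = hashlib.sha256(entropy).digest()[0] >> (8 - cs_bits)
    value = (int.from_bytes(entropy, "big") << cs_bits) | checksum
    n_words = (ent_bits + cs_bits) // 11
    return [(value >> (11 * (n_words - 1 - i))) & 0x7FF for i in range(n_words)]


indices = bip39_indices(bytes(32))  # all-zero 256-bit key
print(len(indices), indices[:3], indices[-1])
```

Every index except the last is 0, which is why the phrase is a run of "abandon" followed by one different word.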

bunnie avatar Nov 04 '22 12:11 bunnie

freshly reset device, first basis I create: image

Right away, running 0.9.9 what do i do wrong?

I've seen this error too, I haven't been able to trace it down yet because it's not consistently reproducible for me. Can you share with me the steps you did to cause that?

I did find a bug earlier today in the basis unionizing code, which was pushed to the bleeding edge CI branch, but it affected something in the passwords feature, not the FIDO side.

bunnie avatar Nov 04 '22 12:11 bunnie

(The problem I have is I don't have enough live U2F entries to really cause anything to happen, and the synthetic cases I have don't seem to trigger it.)

bunnie avatar Nov 04 '22 12:11 bunnie

Steps are:

  1. Factory reset
  2. Set up 2 wifi networks (populate the system)
  3. Did my backup.py (twice; the first time it failed as I did not have pycryptodome)
  4. Switched to vault app
  5. Create a first basis: test
  6. answered yes to mount it
  7. Retyped the basis password, seems mounted then that popup came right after.

I have no entries yet, i was starting to rethink my basis based on our conversation around the overlays and my misunderstandings.

That's it.

eau-u4f avatar Nov 04 '22 12:11 eau-u4f

ok great

Wait you ran the backup after you did the reset?...I hope you backed up the data before you reset things as well...

bunnie avatar Nov 04 '22 12:11 bunnie

I did manually as i explained earlier, but then my root password was lost in my memory and my passwords were all recovered manually from pddb query commands, so i decided to just factory reset to experiment and play with basis and backup as well as rethink how i store and backup things properly

Does that make sense?

eau-u4f avatar Nov 04 '22 12:11 eau-u4f

Ah ok, so basically, you were able to manually extract the passwords before doing a factory reset. whew.

Alright. Thank you. And thank you for reporting these issues. afaik you're the only other user who is using the Basis feature other than me, so unfortunately you're hitting edge cases that I haven't seen yet, or I failed to cover in my test benches. Thank you for your patience.

bunnie avatar Nov 04 '22 12:11 bunnie

Man, thank YOU for this great initiative and work, I love the idea of being able to verify and trust my hw a few layers below the OS. Yes i want to use it daily so i need to feel comfy and trusted with it.

eau-u4f avatar Nov 04 '22 12:11 eau-u4f

Yes i want to use it daily so i need to feel comfy and trusted with it.

Me too. We'll get there, but it's a process.

I just tried the process you outlined and didn't get the error. Can you type ver xous just to let me know what version you're running?

bunnie avatar Nov 05 '22 08:11 bunnie

v0.9.10-13-g709ff8f9

image

eau-u4f avatar Nov 05 '22 08:11 eau-u4f

hmm i thought it was 0.9.9 since i expected the latest stable, as I ran tools/legacy/factory_reset.sh -s

eau-u4f avatar Nov 05 '22 08:11 eau-u4f

ah ok, 0.9.10 is the latest stable release. hmm....there were some fixes since then that could have addressed this issue which is maybe why i'm not seeing it. I'll try again loading the older version later.

bunnie avatar Nov 05 '22 09:11 bunnie

OK. I went back to the version you're reporting, and confirmed I see that problem. So it was resolved in the bleeding-edge release.

bunnie avatar Nov 06 '22 07:11 bunnie

I think it was fixed with this commit: https://github.com/betrusted-io/xous-core/commit/5de44a95023fbfc0c4954038f7d988841da38b87

bunnie avatar Nov 06 '22 07:11 bunnie

should i upgrade to bleeding edge ?

eau-u4f avatar Nov 06 '22 10:11 eau-u4f