apfs.ksy
Support for reading snapshot metadata (WIP)
Many unknown values, still investigating.
- unknown_0 = 1548
- unknown_8 = 1570
- unknown_16 = timestamp for event?
- unknown_24 = same timestamp
- unknown_32 = 185
- unknown_40 = 0x40000002 (this thing again)
- unknown_44 = 0x00000000
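On the "timestamp for event?" guess: the value that shows up for this field in the snapshot dump in this thread is 0x152C92ED860EC800. A quick way to sanity-check the guess is to treat it as nanoseconds since the Unix epoch. That interpretation is an assumption, not something confirmed here, and `decode_ns_timestamp` is just a name made up for this sketch:

```python
from datetime import datetime, timezone

# Raw value of unknown16 / unknown24 taken from the SNAPSHOT_INFO dump in this thread.
RAW = 0x152C92ED860EC800


def decode_ns_timestamp(raw):
    """Split a nanosecond count into a UTC datetime (whole seconds) and leftover ns.

    Assumes the field counts nanoseconds since 1970-01-01 UTC.
    """
    seconds, nanos = divmod(raw, 1_000_000_000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc), nanos


when, nanos = decode_ns_timestamp(RAW)
print(when.isoformat(), nanos)
```

Under that assumption the value lands on 2018-05-08 at 05:05:22 UTC, and the minutes and seconds line up with the "140522" suffix of the snapshot name, so the hour difference is presumably just the local time zone.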
So, have you figured out how to locate the snapshots? Is there an anchor you can find, with its own btree? Or are snapshots just mixed in with the main tree, but with different tags or versions?
They're present in the same omap tree alongside all the other versions, so you get, for instance, a (oid=1026,xid=67) mapped to one block and a (oid=1026,xid=81) mapped to another block.
So essentially, all my existing logic for locating even the most recent copy of objects is wrong if there are any snapshots present, so I'm glad I tried testing this. I had a TODO on the code where I'm mapping oid to block number about the version number being important, and it's more important than I thought.
I just don't know how to do the lookup efficiently without reading it all into a giant hashmap up front.
(The naïve implementation is: read all the rows, and take the highest xid which is not higher than the xid of the snapshot. How many entries can be in an omap table anyway? :))
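Written out, the naive version is short. A rough Python sketch, assuming the omap rows have already been read into `(oid, xid, block)` tuples; `lookup_naive` and the tuple layout are made up for illustration, and the sample rows are borrowed from an omap dump posted in this thread:

```python
from typing import Iterable, Optional, Tuple

OmapRow = Tuple[int, int, int]  # (oid, xid, block) -- hypothetical in-memory layout


def lookup_naive(rows: Iterable[OmapRow], oid: int, max_xid: int) -> Optional[int]:
    """Linear scan: return the block of the highest xid <= max_xid for this oid."""
    best = None  # (xid, block) of the best candidate seen so far
    for row_oid, row_xid, block in rows:
        if row_oid == oid and row_xid <= max_xid:
            if best is None or row_xid > best[0]:
                best = (row_xid, block)
    return None if best is None else best[1]


rows = [
    (1028, 67, 1561),
    (1028, 83, 1819),
    (1033, 64, 1532),
    (1033, 83, 1816),
]

print(lookup_naive(rows, 1028, 68))  # as seen from a snapshot at xid 68
print(lookup_naive(rows, 1028, 83))  # as seen from the latest state
```

This touches every row on every lookup, so it is only reasonable if the table is small or the results are cached.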
Have you had a look at the apfs-fuse code? Maybe you can learn some tricks from it.
That solves part of the confusion: omap_key doesn't actually contain the oid. The oid is in the key header, which is already parsed.
Actually, in my own model, the omap keys aren't mixed in with the file table keys, but I've been unable to figure out whether they should or should not be. What I do know is that history keys have the exact opposite structure of omap keys, meaning that under the structure currently in this ksy file, both of those would be kind=0x0.
After removing the oid from omap_key, the table looks like:
0 [NodeEntry]: (OMAP) #1028 ID v67 → Blk 1561, len 4096
1 [NodeEntry]: (OMAP) #1028 ID v83 → Blk 1819, len 4096
2 [NodeEntry]: (OMAP) #1030 ID v67 → Blk 1569, len 4096
3 [NodeEntry]: (OMAP) #1030 ID v82 → Blk 1789, len 4096
4 [NodeEntry]: (OMAP) #1031 ID v67 → Blk 1558, len 4096
5 [NodeEntry]: (OMAP) #1031 ID v83 → Blk 1821, len 4096
6 [NodeEntry]: (OMAP) #1033 ID v64 → Blk 1532, len 4096
7 [NodeEntry]: (OMAP) #1033 ID v83 → Blk 1816, len 4096
8 [NodeEntry]: (OMAP) #1034 ID v60 → Blk 1487, len 4096
9 [NodeEntry]: (OMAP) #1034 ID v83 → Blk 1825, len 4096
10 [NodeEntry]: (OMAP) #1035 ID v67 → Blk 1568, len 4096
11 [NodeEntry]: (OMAP) #1035 ID v84 → Blk 1834, len 4096
12 [NodeEntry]: (OMAP) #1037 ID v67 → Blk 1567, len 4096
13 [NodeEntry]: (OMAP) #1037 ID v84 → Blk 1828, len 4096
14 [NodeEntry]: (OMAP) #1038 ID v67 → Blk 1566, len 4096
15 [NodeEntry]: (OMAP) #1038 ID v73 → Blk 1697, len 4096
16 [NodeEntry]: (OMAP) #1041 ID v67 → Blk 1562, len 4096
17 [NodeEntry]: (OMAP) #1041 ID v83 → Blk 1820, len 4096
So all the oid values are grouped together and the xids are in numerical order, meaning you can still binary search if that helps performance.
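To make the binary search idea concrete, here is a small Python sketch over the rows dumped above. It assumes the entries are held in memory as a list sorted by `(oid, xid)`; the real structure is a B-tree, so this only shows the comparison logic, and `lookup` is a name invented for the example:

```python
from bisect import bisect_right

# (oid, xid, block) rows copied from the omap dump above, already sorted by (oid, xid).
ROWS = [
    (1028, 67, 1561), (1028, 83, 1819),
    (1030, 67, 1569), (1030, 82, 1789),
    (1031, 67, 1558), (1031, 83, 1821),
    (1033, 64, 1532), (1033, 83, 1816),
    (1034, 60, 1487), (1034, 83, 1825),
    (1035, 67, 1568), (1035, 84, 1834),
    (1037, 67, 1567), (1037, 84, 1828),
    (1038, 67, 1566), (1038, 73, 1697),
    (1041, 67, 1562), (1041, 83, 1820),
]
KEYS = [(oid, xid) for oid, xid, _ in ROWS]


def lookup(oid, max_xid):
    """Return the block for the highest xid <= max_xid of this oid, or None."""
    # bisect_right gives the index just past the last key <= (oid, max_xid).
    i = bisect_right(KEYS, (oid, max_xid))
    if i == 0:
        return None
    found_oid, _found_xid, block = ROWS[i - 1]
    # The predecessor may belong to a smaller oid, which means no match.
    return block if found_oid == oid else None


print(lookup(1034, 68))   # view from a snapshot at xid 68
print(lookup(1034, 100))  # latest version
```

The same "greatest key not exceeding `(oid, xid)`" comparison should carry over to descending the tree node by node, without loading every row first.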
The snapshot metadata table is a bit of a weird one though.
* 0 [NodeEntry]: (SNAPSHOT_INFO) #68 -> TODO "com.apple.TimeMachine.2018-05-08-140522"
  * keyOffset = 0x0 = 0
  * keyLength = 0x8 = 8
  * dataOffset = 0x5A = 90
  * dataLength = 0x5A = 90
  * keyHdr [KeyHdr]: (SNAPSHOT_INFO) #68
    * keyLow = 0x44 = 68
    * keyHigh = 0x10000000 = 268435456
    * objId = 0x44 = 68
    * kind = SNAPSHOT_INFO (0x1 = 1)
  * key [EmptyKey]
  * val [SnapshotInfoVal]: TODO "com.apple.TimeMachine.2018-05-08-140522"
    * unknown8 = 0x622 = 1570
    * unknown16 = 0x152C92ED860EC800 = 1525755922625775600
    * unknown24 = 0x152C92ED860EC800 = 1525755922625775600
    * unknown32 = 0xB9 = 185
    * unknown40 = 0x40000002 = 1073741826
    * unknown44 = 0x0 = 0
    * nameLength = 0x28 = 40
    * name = com.apple.TimeMachine.2018-05-08-140522
* 1 [NodeEntry]: (SNAPSHOT_NAME) #4563402750 "com.apple.TimeMachine.2018-05-08-140522" -> #68
  * keyOffset = 0x8 = 8
  * keyLength = 0x32 = 50
  * dataOffset = 0x62 = 98
  * dataLength = 0x8 = 8
  * keyHdr [KeyHdr]: (SNAPSHOT_NAME) #4563402750
    * keyLow = 0xFFFFFFFF = 4294967295
    * keyHigh = 0xBFFFFFFF = 3221225471
    * objId = 0x10FFFFFFE = 4563402750
    * kind = SNAPSHOT_NAME (0xB = 11)
  * key [SnapshotNameKey]: "com.apple.TimeMachine.2018-05-08-140522"
    * nameLength = 0x28 = 40
    * name = com.apple.TimeMachine.2018-05-08-140522
  * val [SnapshotNameVal]: #68
    * snapshotId = 0x44 = 68
What's weird is this:
* keyLow = 0xFFFFFFFF = 4294967295
* keyHigh = 0xBFFFFFFF = 3221225471
* objId = 0x10FFFFFFE = 4563402750
I would have thought that objId would be 0x0FFFFFFFFFFFFFFF.
If I change the calculation to this:
value: key_low | ((key_high & 0x0FFFFFFF) << 32)
Now I get -1, even though the top 4 bits wouldn't be set. :) I know JavaScript is bad at arithmetic, but I somehow thought that kaitai IDE would have to have worked around that in order to do anything correctly at all.
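For comparison, here is the same split done with arbitrary-precision integers in Python, next to an emulation of JavaScript's 32-bit bitwise rules (operands are converted to signed 32-bit and shift counts are taken modulo 32). Whether that is exactly what happens inside the IDE is an assumption; `split_key`, `to_int32` and `js_like_obj_id` are names made up for this sketch:

```python
MASK_28 = 0x0FFFFFFF


def split_key(key_low, key_high):
    """64-bit-safe split: low 60 bits are the object id, top 4 bits the kind."""
    obj_id = key_low | ((key_high & MASK_28) << 32)
    kind = key_high >> 28
    return obj_id, kind


def to_int32(x):
    """Wrap an integer to a signed 32-bit value, like JavaScript's ToInt32."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def js_like_obj_id(key_low, key_high):
    """Evaluate the same expression under JavaScript's bitwise-operator rules."""
    masked = to_int32(to_int32(key_high) & MASK_28)
    shifted = to_int32(masked << (32 % 32))  # shift count 32 wraps to 0
    return to_int32(to_int32(key_low) | shifted)


obj_id, kind = split_key(0xFFFFFFFF, 0xBFFFFFFF)
print(hex(obj_id), hex(kind))
print(js_like_obj_id(0xFFFFFFFF, 0xBFFFFFFF))
```

With real 64-bit arithmetic the result is 0x0FFFFFFFFFFFFFFF with kind 0xB, while the 32-bit emulation yields -1. The earlier 0x10FFFFFFE also happens to equal 0xFFFFFFFF + 0x0FFFFFFF, which would be consistent with the shift by 32 being a no-op in an additive version of the formula.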
More stuff that is semi-known:
- unknown_0 is a reference to some other block which is always present but which I haven't found any purpose for reading yet.
- unknown_20 if interpreted as a block number points at a HISTORY root node. I wasn't sure whether that made sense for volumes or not.
I am now working on updating my code to use the latest ksy files. Is this pull request still valid, or have later changes superseded it?
I did some refactoring, so these PRs now have conflicts, and I never tested them, but they might still be valid.