realm-core
Decryption failed - page zero has wrong checksum
How frequently does the bug occur?
Seen once
Description
A customer reported suddenly being unable to launch my app. It terminates on the first access of the database, and it turns out that the database file is broken for some reason.
It might have broken during a Realm migration, but this is uncertain. Newly created files work just fine. I might be allowed to share the db file with a developer for analysis in private, but not in public. I also tried to open it with Realm Studio and tried upgrading to Realm 10.25.1, but the file still cannot be decrypted.
Stacktrace & log output
libc++abi: terminating with uncaught exception of type realm::util::DecryptionFailed: Decryption failed
Exception backtrace:
0 Realm 0x000000010b0d349b _ZN5realm4util16DecryptionFailedC2Ev + 107
1 Realm 0x000000010b0b9987 _ZN5realm4util10AESCryptor4readEixPcm + 519
2 Realm 0x000000010b0ba63e _ZN5realm4util20EncryptedFileMapping12refresh_pageEm + 110
3 Realm 0x000000010b0bafee _ZN5realm4util20EncryptedFileMapping12read_barrierEPKvmPFmPKcE + 126
4 Realm 0x000000010ab8e250 _ZN5realm4util26do_encryption_read_barrierEPKvmPFmPKcEPNS0_20EncryptedFileMappingE + 64
5 Realm 0x000000010b0a1822 _ZN5realm11StringIndexC2EmPNS_11ArrayParentEmRKNS_13ClusterColumnERNS_9AllocatorE + 338
6 Realm 0x000000010b08a6b0 _ZN5realm5Table23refresh_index_accessorsEv + 608
7 Realm 0x000000010af533c7 _ZN5realm5Group21create_table_accessorEm + 871
8 Realm 0x000000010af53006 _ZN5realm5Group12do_get_tableEm + 102
9 Realm 0x000000010b1e6287 _ZN5realm12ObjectSchemaC2ERKNS_5GroupENS_10StringDataENS_8TableKeyE + 391
10 Realm 0x000000010b1f0194 _ZN5realm11ObjectStore17schema_from_groupERKNS_5GroupE + 132
11 Realm 0x000000010b2594bb _ZN5realm5Realm32read_schema_from_group_if_neededEv + 187
12 Realm 0x000000010b259268 _ZN5realm5RealmC2ENS0_6ConfigENS_4util8OptionalINS_9VersionIDEEENSt3__110shared_ptrINS_5_impl16RealmCoordinatorEEENS0_13MakeSharedTagE + 456
13 Realm 0x000000010b1b7c2c _ZN5realm5Realm17make_shared_realmENS0_6ConfigENS_4util8OptionalINS_9VersionIDEEENSt3__110shared_ptrINS_5_impl16RealmCoordinatorEEE + 220
14 Realm 0x000000010b1b6294 _ZN5realm5_impl16RealmCoordinator12do_get_realmENS_5Realm6ConfigERNSt3__110shared_ptrIS2_EENS_4util8OptionalINS_9VersionIDEEERNS8_17CheckedUniqueLockE + 532
15 Realm 0x000000010b1b5eaf _ZN5realm5_impl16RealmCoordinator9get_realmENS_5Realm6ConfigENS_4util8OptionalINS_9VersionIDEEE + 495
16 Realm 0x000000010b259ce7 _ZN5realm5Realm16get_shared_realmENS0_6ConfigE + 135
17 Realm 0x000000010ae4d71a +[RLMRealm realmWithConfiguration:queue:error:] + 2314
18 RealmSwift 0x00000001085c3a72 $sSo8RLMRealmC13configuration5queueABSo0A13ConfigurationC_So012OS_dispatch_C0CSgtKcfCTO + 146
19 RealmSwift 0x000000010863fc2f $s10RealmSwift0A0V5queueACSo012OS_dispatch_C0CSg_tKcfC + 127
Can you reproduce the bug?
Yes, always
Reproduction Steps
The database file seems corrupted and cannot even be opened with Realm Studio. I cannot publicly share the file due to the user's privacy, but I might be able to send it to a dev in private.
Version
10.10.0 (also tried 10.25.1)
What SDK flavour are you using?
Local Database only
Are you using encryption?
Yes, using encryption
Platform OS and version(s)
iOS 15.4.0, 15.4.1, 15.2.0, 15.2.1
Build environment
ProductName: macOS ProductVersion: 12.0.1 BuildVersion: 21A559
/Applications/Xcode.app/Contents/Developer Xcode 13.3.1 Build version 13E500a
/usr/local/bin/pod 1.10.0 Realm (10.10.0) RealmSwift (10.10.0) RealmSwift (= 10.10.0)
/bin/bash GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin21)
(not in use here)
/usr/local/bin/git git version 2.26.0
Hi @BlueCobold Can you send the Realm file to [email protected] so we can investigate? The latest version of Realm (10.25.1) contains a fix that should not let this happen again in the future.
I submitted the file in question.
@jedelbo successfully recovered the Realm file. @BlueCobold, I have sent it to you via email.
Super awesome! The customer will be very happy and so am I. I'll upgrade all app versions out there to realm 10.25.1 and hope for the issue to never return. Thanks!
Forensic report: When trying to decrypt the received file, the following showed up:
Checksum failed: 0x90000
0x90000 expected: 0x93 actual: 0x92
Checksum failed: 0x91000
Checksum failed: 0x92000
Checksum failed: 0x93000
Checksum failed: 0xa0000
Checksum failed: 0xa1000
Checksum failed: 0xa2000
Checksum failed: 0xa3000
Checksum failed: 0xa8000
Checksum failed: 0x138000
Checksum failed: 0x139000
0x13900 expected: 0xc0 actual: 0xc3
Checksum failed: 0x13a000
Checksum failed: 0x13b000
Restore old IV: 0x18c000
Restore old IV: 0x18d000
Restore old IV: 0x18e000
Restore old IV: 0x18f000
Restore old IV: 0x198000
Restore old IV: 0x199000
Restore old IV: 0x19a000
Restore old IV: 0x19b000
Restore old IV: 0x1a0000
Restore old IV: 0x1a1000
Restore old IV: 0x1a2000
Restore old IV: 0x1a3000
Restore old IV: 0x1a8000
Restore old IV: 0x1a9000
Restore old IV: 0x1aa000
Restore old IV: 0x1ab000
Restore old IV: 0x1ac000
Restore old IV: 0x1ad000
Restore old IV: 0x1ae000
Restore old IV: 0x1af000
Although there were checksum errors, the content seemed to be consistent except for the two cases where a byte value was not as expected. After changing those values back, the file was consistent.
@tgoyne does the fact that it is the first byte in a 4k block that is modified make us any wiser? And why does the checksum differ if the content apparently is ok?
Could this possibly be an out-of-bounds write somewhere? The first byte in a buffer is what gets overwritten if some other piece of code has an off-by-one error when writing to something that happens to land immediately before that buffer in memory. The HMAC and the actual page data are stored in separate blocks of memory, so corrupting one but not the other could easily happen.
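To make that failure mode concrete, here is a minimal, self-contained sketch (the buffer layout is contrived and this is not Realm code) of how an off-by-one write into a neighbouring buffer flips exactly the first byte of the buffer that follows it, matching the expected 0x93 / actual 0x92 pair in the report above:

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
    // Two buffers laid out back to back inside one allocation, standing in
    // for "some other data" followed by the start of a decrypted page.
    std::uint8_t arena[64 + 16] = {};
    std::uint8_t* other = arena;      // 64-byte buffer owned by other code
    std::uint8_t* page = arena + 64;  // first bytes of the "page" buffer

    std::memset(page, 0x93, 16);      // pretend 0x93 is the correct first byte

    // The off-by-one bug: the loop writes 65 bytes into a 64-byte buffer,
    // so the last write lands on the first byte of the neighbouring buffer.
    for (int i = 0; i <= 64; ++i)     // `<=` instead of `<` is the bug
        other[i] = 0x92;

    std::printf("page[0] expected 0x93, actual 0x%02x\n", page[0]);
    return 0;
}
```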
If that is actually the problem, I'm not sure what action we can really take. Reread all the encryption code and hope to spot something suspicious that could be writing one past the end? I think the use of MAP_ANONYMOUS for the decrypted buffers unfortunately means that ASan doesn't work for them, and it might not even be a bug in our code.
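One possible way to catch such an overrun despite that limitation would be hand-rolled guard pages around the anonymous mappings; the following is only a sketch of the general POSIX technique, not something realm-core does today:

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Map `size` bytes with an inaccessible guard page on each side, so a stray
// write just before or just after the buffer faults immediately instead of
// silently corrupting a neighbouring allocation.
void* map_with_guards(std::size_t size) {
    std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    std::size_t rounded = (size + page - 1) / page * page;
    void* raw = mmap(nullptr, rounded + 2 * page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    assert(raw != MAP_FAILED);
    char* base = static_cast<char*>(raw);
    mprotect(base, page, PROT_NONE);                  // leading guard page
    mprotect(base + page + rounded, page, PROT_NONE); // trailing guard page
    return base + page;
}

int main() {
    auto* buf = static_cast<std::uint8_t*>(map_with_guards(4096));
    buf[0] = 0x93;        // fine
    // buf[-1] = 0x92;    // would SIGSEGV on the leading guard page
    // buf[4096] = 0x92;  // would SIGSEGV on the trailing guard page
    return 0;
}
```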
The issue has returned. Again, I have a customer with a database that cannot be decrypted. Since this is on Android, I don't have a proper native stack trace and can only assume it is related to the same incorrect checksum in the native code both systems are based on. I can provide the realm-file, so you can check if it's the same problem. The customer's app version is using the latest Android-Realm implementation, which uses the same native code as Realm-Swift 10.25.1, from what I understand. No migration was involved when the realm file got corrupted.
@BlueCobold it would be nice if we could check the realm file to see whether the corruption is similar to the first one.
The customer stopped replying and stopped using my app, so I'm afraid I cannot provide the file.
@jedelbo I submitted another customer's realm file with the same symptoms to [email protected] for analysis.
Using the decrypt tool in the exec directory, I'm getting the following output:
Checksum failed: 0x0
Block never written: 0x55e000
Block never written: 0x55f000
So it looks like the first block has issues. The resulting output file is unusable. I have no idea how to get the "actual" and "expected" values that @jedelbo printed in his report, or how to correct the possibly faulty bytes to see whether the rest of the file would be operational. My customer depends heavily on his data and currently can't access it.
The ticket bot also no longer seems to flag this bug report accordingly. @leemaguire
In the meantime, I checked the decrypted content with a hex editor. Even the damaged first block contains readable strings and thus seems to be decrypted correctly. I imagine there is some header metadata which is damaged and which makes the RealmBrowser/library believe the file is still encrypted/unreadable. All other blocks after the first seem to be valid. There are a lot of blocks with readable strings and UUID tables. I assume the file can be recovered, but I have not yet gathered enough understanding of the internal data structure to make that happen myself.
I have restored the header with a reference to the top_ref and table_names_ref, but it seems the data is partly scrambled. Some objects have invalid strings which crash Realm when trying to load them. Some have fields set to null which cannot be null (object UUIDs, for example), yet they seem to be okay if I only read that column/field in sequence for the entire table. I wonder, could this be the result of a parallel Realm access which did an automatic compactionOnLaunch?
In further, deeper data analysis, I realised that some Realm object keys are huge, like '3,402,167,040,181,607,100'. How come they grew so large? Is it possible there is an issue with keys and they spill over at some point? I'm still guessing at what could be the reason for badly written pages and wrongly aligned arrays.
@BlueCobold I have been away on holiday, and did not see this until now. I can see that you have sent another file for analysis, but I am not sure which key to use for decrypting.
I thought so. I have replied via email to send you the decryption-key. Did you receive it?
To which email address should the key have been sent to? I have not received anything.
Sorry, I thought there was a forwarded-reply feature for GitHub mails. Doesn't look like it. I had sent the file and key to realm-help in my mail from 18.07., but I can send you another one, including some findings so far - including the partly restored file header.
Great. To be sure that I receive it, you can also send it to [email protected]
I sent it along with a few of my own findings. Thanks for your help.
I tried to decode the file, but up until 0x2a0 I see only something that looks like random bytes:
00000000 24 bd 76 a9 91 68 46 00 c8 8e 16 5c 07 75 51 00 |$.v..hF....\.uQ.|
00000010 b8 7d a0 47 0e e4 52 00 cc f9 d1 68 d3 f8 53 00 |.}.G..R....h..S.|
00000020 25 fa e5 41 5b 5f 55 00 ba d5 69 72 e5 60 58 00 |%..A[_U...ir.`X.|
00000030 51 86 e8 5a e8 9a 59 00 15 7c 65 32 91 92 5c 00 |Q..Z..Y..|e2..\.|
00000040 b7 72 f6 6d 43 7e 5f 00 af 65 50 ff 80 d3 5f 00 |.r.mC~_..eP..._.|
00000050 94 6f 97 53 a8 e8 5f 00 a9 ed 50 99 26 79 60 00 |.o.S.._...P.&y`.|
00000060 9a 28 ea 36 f5 71 62 00 cf 55 bf 31 07 ca 64 00 |.(.6.qb..U.1..d.|
00000070 d4 04 32 f9 c3 37 65 00 87 ec 01 5a cc fc 65 00 |..2..7e....Z..e.|
00000080 97 65 fa 62 3e af 67 00 bd 4b 71 af fb 24 6c 00 |.e.b>.g..Kq..$l.|
00000090 88 59 45 e9 f8 e5 6d 00 6a af fe 39 9c 2c 70 00 |.YE...m.j..9.,p.|
Does this match your findings?
Yes, those are exactly my results as well. After that block, it seems to be mostly valid data. That's why I manually restored the header, as I wrote in the previous mail. As I said, it contains some corrupted data entries and references, but no more of this byte junk.
The "trash" at the beginning is not actual trash, though. Check the 00 every 8 bytes. I assume it's an array of 64-bit values, maybe Realm object keys. The same data can be found at another offset in the file. For example, the entry "964C406A 5059B000" from offset 0x1D0 appears again at offset 0x991D0. Which means they are exactly 0x99000 bytes apart.
The duplicated data originally starts at 0x98EC0 as a valid array, and was then "duplicated" into the header, making the file unusable.
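For anyone who wants to verify this against their own decrypted dump, a small sketch (the file name is a placeholder and a little-endian host is assumed) that prints the damaged header region as 64-bit values -- hence the 00 high byte every 8 bytes -- and checks that the 8 bytes at 0x1D0 reappear at 0x991D0:

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <fstream>
#include <vector>

int main() {
    // "decrypted.realm" is a placeholder for the output of the decrypt tool.
    std::ifstream in("decrypted.realm", std::ios::binary);
    std::vector<char> buf((std::istreambuf_iterator<char>(in)),
                          std::istreambuf_iterator<char>());
    if (buf.size() < 0x991D0 + 8) {
        std::fprintf(stderr, "file too small\n");
        return 1;
    }

    // The "trash" header region, viewed as little-endian 64-bit integers.
    for (std::size_t off = 0; off + 8 <= 0x2a0; off += 8) {
        std::uint64_t v;
        std::memcpy(&v, buf.data() + off, 8);
        std::printf("0x%04zx: %llu\n", off, static_cast<unsigned long long>(v));
    }

    // The same 8 bytes appear at 0x1D0 and again 0x99000 bytes later.
    bool same = std::memcmp(buf.data() + 0x1D0, buf.data() + 0x991D0, 8) == 0;
    std::printf("bytes at 0x1D0 and 0x991D0 %s\n", same ? "match" : "differ");
    return 0;
}
```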
Those are great findings. I am a bit embarrassed that I did not spot the zeroes. I hope it can help us further with this issue. It is very common to have duplicated data. Whenever some part of an array is modified, a new version of the array is created by copying the whole array. I will try to see if I can find the "true" top ref.
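To illustrate that behaviour, here is a toy copy-on-write sketch (all types and names are made up for illustration; this is not Realm's actual Array/Allocator code) showing why an older copy of an array naturally stays behind in the file after a modification:

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

using ref_type = std::uint64_t;

// A toy "file": every array lives at some ref (byte offset) and is immutable
// once written.
struct ToyFile {
    std::map<ref_type, std::vector<std::int64_t>> blocks;
    ref_type next_ref = 0x1000;

    ref_type write(std::vector<std::int64_t> data) {
        ref_type ref = next_ref;
        next_ref += 0x1000; // pretend each array version gets a fresh block
        blocks[ref] = std::move(data);
        return ref;
    }
};

// Copy-on-write: modifying one element copies the whole array to a new ref.
// The old version is not erased, so its payload stays behind in the file.
ref_type set_element(ToyFile& f, ref_type ref, std::size_t i, std::int64_t v) {
    std::vector<std::int64_t> copy = f.blocks.at(ref);
    copy[i] = v;
    return f.write(std::move(copy));
}

int main() {
    ToyFile f;
    ref_type v1 = f.write({1, 2, 3, 4});
    ref_type v2 = set_element(f, v1, 0, 42);
    // Both v1 and v2 now exist with near-identical contents -- the same kind
    // of duplication you can see when scanning the real file.
    (void)v2;
    return 0;
}
```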
It is very common to have duplicated data. Whenever some part of an array is modified, a new version of the array is created by copying the whole array.
Yea, I figured that much. It makes sense from a transaction perspective.
I will try to see if I can find the "true" top ref.
That would be great.
Also, if you don't mind: I pointed out above the very large object keys of many objects (a few objects have two-digit keys which seem to be auto-increment style, so the big ones make me wonder what's going on). Is it normal for objects to have such large keys, or does that indicate a problematic way of using Realm? Can keys accidentally overflow, or does Realm auto-detect free keys during object creation when the max value is reached?
I found the following cluster-tree, related to table #10, at offset 0x1192A0: 41414141 4700000C 40870800 00000000 00000000 00000000 D80C0000 00000000 15000000 00000000 A8481700 00000000 03000000 00000000 A1520000 00000000 38650200 00000000 90600200 00000000 6950CC0E 1FBFCF7A 00000000 00000000 01000408 05000000
It contains a lot of very suspicious refs like 03000000, 05000000 or 15000000. These refs would mean they fall within the header bytes of the Realm file when they get written! This makes me worry a lot about data consistency.
What you have found here is the table top array. It contains both refs and numbers. If an entry has the LSB set (like 0x15), it is a number; you get the value by shifting down one bit, so in this case it is 10, which matches table number 10.
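A tiny sketch of that decoding rule, applied to a few of the entries quoted above:

```cpp
#include <cstdint>
#include <cstdio>

// Entries in the table top array are either refs (byte offsets, LSB clear)
// or tagged integers (LSB set, value = entry >> 1).
void describe(std::uint64_t entry) {
    if (entry & 1)
        std::printf("0x%llx: tagged integer, value %llu\n",
                    static_cast<unsigned long long>(entry),
                    static_cast<unsigned long long>(entry >> 1));
    else
        std::printf("0x%llx: ref into the file\n",
                    static_cast<unsigned long long>(entry));
}

int main() {
    describe(0x15);      // tagged integer, value 10 -> table number 10
    describe(0x1748a8);  // ref (little-endian reading of "A8481700 00000000")
    describe(0x03);      // tagged integer, value 1
    describe(0x05);      // tagged integer, value 2
    return 0;
}
```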
I am somewhat convinced that the first 24 bytes of the file should be
00000000 80 6c 51 00 00 00 00 00 f0 53 51 00 00 00 00 00 |.lQ......SQ.....|
00000010 54 2d 44 42 16 16 00 00 |T-DB....|
making the top ref 0x516c80
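For reference, here is how those 24 bytes decode under my understanding of the file-header layout (a sketch; the struct and field names below are approximations, not lifted from realm-core, and a little-endian host is assumed):

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

// Approximate header layout: two alternating top-ref slots, the "T-DB"
// mnemonic, two file-format bytes, a reserved byte, and a flags byte whose
// LSB selects which top-ref slot is current.
struct FileHeader {
    std::uint64_t top_ref[2];
    char mnemonic[4];
    std::uint8_t file_format[2];
    std::uint8_t reserved;
    std::uint8_t flags;
};

int main() {
    const std::uint8_t bytes[24] = {
        0x80, 0x6c, 0x51, 0x00, 0x00, 0x00, 0x00, 0x00,
        0xf0, 0x53, 0x51, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x54, 0x2d, 0x44, 0x42, 0x16, 0x16, 0x00, 0x00};

    FileHeader h;
    std::memcpy(&h, bytes, sizeof h); // little-endian host assumed
    int slot = h.flags & 1;
    std::printf("mnemonic %.4s, current top ref 0x%llx\n", h.mnemonic,
                static_cast<unsigned long long>(h.top_ref[slot]));
    // -> mnemonic T-DB, current top ref 0x516c80
    return 0;
}
```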
I am pretty sure that the problem is that the first 0x1000 bytes have been overwritten with a page that should have been written somewhere else. Unfortunately, a lot of refs point into this area, so recreating meaningful data here would be a major puzzle.