realm-core

Decryption failed - page zero has wrong checksum

Open BlueCobold opened this issue 2 years ago • 41 comments

How frequently does the bug occur?

Seen once

Description

A customer of my app reported suddenly being unable to launch it. The app terminates on first access to the database, and it turns out the database file is corrupted for some reason.

It might have broken during a realm migration, but this is uncertain. Newly created files work just fine. I might be allowed to share the db file with a developer for analysis in private, but not in public. I also tried opening it with Realm Studio and upgrading to Realm 10.25.1, but the file still cannot be decrypted.

Stacktrace & log output


libc++abi: terminating with uncaught exception of type realm::util::DecryptionFailed: Decryption failed
Exception backtrace:
0   Realm          0x000000010b0d349b _ZN5realm4util16DecryptionFailedC2Ev + 107
1   Realm          0x000000010b0b9987 _ZN5realm4util10AESCryptor4readEixPcm + 519
2   Realm          0x000000010b0ba63e _ZN5realm4util20EncryptedFileMapping12refresh_pageEm + 110
3   Realm          0x000000010b0bafee _ZN5realm4util20EncryptedFileMapping12read_barrierEPKvmPFmPKcE + 126
4   Realm          0x000000010ab8e250 _ZN5realm4util26do_encryption_read_barrierEPKvmPFmPKcEPNS0_20EncryptedFileMappingE + 64
5   Realm          0x000000010b0a1822 _ZN5realm11StringIndexC2EmPNS_11ArrayParentEmRKNS_13ClusterColumnERNS_9AllocatorE + 338
6   Realm          0x000000010b08a6b0 _ZN5realm5Table23refresh_index_accessorsEv + 608
7   Realm          0x000000010af533c7 _ZN5realm5Group21create_table_accessorEm + 871
8   Realm          0x000000010af53006 _ZN5realm5Group12do_get_tableEm + 102
9   Realm          0x000000010b1e6287 _ZN5realm12ObjectSchemaC2ERKNS_5GroupENS_10StringDataENS_8TableKeyE + 391
10  Realm          0x000000010b1f0194 _ZN5realm11ObjectStore17schema_from_groupERKNS_5GroupE + 132
11  Realm          0x000000010b2594bb _ZN5realm5Realm32read_schema_from_group_if_neededEv + 187
12  Realm          0x000000010b259268 _ZN5realm5RealmC2ENS0_6ConfigENS_4util8OptionalINS_9VersionIDEEENSt3__110shared_ptrINS_5_impl16RealmCoordinatorEEENS0_13MakeSharedTagE + 456
13  Realm          0x000000010b1b7c2c _ZN5realm5Realm17make_shared_realmENS0_6ConfigENS_4util8OptionalINS_9VersionIDEEENSt3__110shared_ptrINS_5_impl16RealmCoordinatorEEE + 220
14  Realm          0x000000010b1b6294 _ZN5realm5_impl16RealmCoordinator12do_get_realmENS_5Realm6ConfigERNSt3__110shared_ptrIS2_EENS_4util8OptionalINS_9VersionIDEEERNS8_17CheckedUniqueLockE + 532
15  Realm          0x000000010b1b5eaf _ZN5realm5_impl16RealmCoordinator9get_realmENS_5Realm6ConfigENS_4util8OptionalINS_9VersionIDEEE + 495
16  Realm          0x000000010b259ce7 _ZN5realm5Realm16get_shared_realmENS0_6ConfigE + 135
17  Realm          0x000000010ae4d71a +[RLMRealm realmWithConfiguration:queue:error:] + 2314
18  RealmSwift     0x00000001085c3a72 $sSo8RLMRealmC13configuration5queueABSo0A13ConfigurationC_So012OS_dispatch_C0CSgtKcfCTO + 146
19  RealmSwift     0x000000010863fc2f $s10RealmSwift0A0V5queueACSo012OS_dispatch_C0CSg_tKcfC + 127

Can you reproduce the bug?

Yes, always

Reproduction Steps

The database file seems corrupted and cannot even be opened with Realm Studio. I cannot publicly share the file due to the user's privacy, but I might be able to send it to a dev in private.

Version

10.10.0 (also tried 10.25.1)

What SDK flavour are you using?

Local Database only

Are you using encryption?

Yes, using encryption

Platform OS and version(s)

iOS 15.4.0, 15.4.1, 15.2.0, 15.2.1

Build environment

ProductName: macOS ProductVersion: 12.0.1 BuildVersion: 21A559

/Applications/Xcode.app/Contents/Developer Xcode 13.3.1 Build version 13E500a

/usr/local/bin/pod 1.10.0 Realm (10.10.0) RealmSwift (10.10.0) RealmSwift (= 10.10.0)

/bin/bash GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin21)

(not in use here)

/usr/local/bin/git git version 2.26.0

BlueCobold avatar Apr 22 '22 04:04 BlueCobold

Hi @BlueCobold Can you send the Realm file to [email protected] so we can investigate? The latest version of Realm (10.25.1) contains a fix that should not let this happen again in the future.

leemaguire avatar Apr 22 '22 11:04 leemaguire

I submitted the file in question.

BlueCobold avatar Apr 22 '22 11:04 BlueCobold

@jedelbo successfully recovered the Realm file, @BlueCobold I have sent it to you via email.

leemaguire avatar Apr 22 '22 15:04 leemaguire

Super awesome! The customer will be very happy and so am I. I'll upgrade all app versions out there to realm 10.25.1 and hope for the issue to never return. Thanks!

BlueCobold avatar Apr 22 '22 16:04 BlueCobold

Forensic report: When trying to decrypt the received file, the following showed up:

Checksum failed: 0x90000
0x90000 expected: 0x93 actual: 0x92
Checksum failed: 0x91000
Checksum failed: 0x92000
Checksum failed: 0x93000

Checksum failed: 0xa0000
Checksum failed: 0xa1000
Checksum failed: 0xa2000
Checksum failed: 0xa3000

Checksum failed: 0xa8000

Checksum failed: 0x138000
Checksum failed: 0x139000
0x13900 expected: 0xc0 actual: 0xc3
Checksum failed: 0x13a000
Checksum failed: 0x13b000

Restore old IV: 0x18c000
Restore old IV: 0x18d000
Restore old IV: 0x18e000
Restore old IV: 0x18f000

Restore old IV: 0x198000
Restore old IV: 0x199000
Restore old IV: 0x19a000
Restore old IV: 0x19b000

Restore old IV: 0x1a0000
Restore old IV: 0x1a1000
Restore old IV: 0x1a2000
Restore old IV: 0x1a3000

Restore old IV: 0x1a8000
Restore old IV: 0x1a9000
Restore old IV: 0x1aa000
Restore old IV: 0x1ab000

Restore old IV: 0x1ac000
Restore old IV: 0x1ad000
Restore old IV: 0x1ae000
Restore old IV: 0x1af000

Although there were checksum errors, the content seemed to be consistent except for the 2 cases where a byte value was not as expected. After changing those values back, the file was consistent.

jedelbo avatar Apr 25 '22 08:04 jedelbo

@tgoyne the fact that it is the first byte in a 4k block that is modified, does it make us any wiser? And why does the checksum differ if the content apparently is ok?

jedelbo avatar Apr 25 '22 08:04 jedelbo

Could possibly be an out-of-bounds write somewhere? The first byte in a buffer is the thing that'll be overwritten if some other piece of code has an off-by-one error when writing to something that happens to land immediately before that buffer in memory. The hmac and actual page data are stored in separate blocks of memory so corrupting one but not the other wouldn't be hard to have happen.

If that is actually the problem I'm not sure what action we can really take. Reread all the encryption code and hope to spot something suspicious that could be writing one past the end? I think the use of MAP_ANONYMOUS for the decrypted buffers unfortunately means that asan doesn't work for them, and it might not even be a bug in our code.

tgoyne avatar Apr 25 '22 16:04 tgoyne

The issue has returned. Again, I have a customer with a database that cannot be decrypted. Since this is on Android, I don't have a proper native stack trace and can only assume it is related to the same incorrect checksum in the native code both systems are based on. I can provide the realm-file, so you can check if it's the same problem. The customer's app version is using the latest Android-Realm implementation, which uses the same native code as Realm-Swift 10.25.1, from what I understand. No migration was involved when the realm file got corrupted.

BlueCobold avatar May 13 '22 05:05 BlueCobold

@BlueCobold it would be nice if we could check the realm file to see whether the corruption is similar to the first one.

jedelbo avatar May 19 '22 12:05 jedelbo

The customer stopped replying and stopped using my app, so I'm afraid I cannot provide the file.

BlueCobold avatar Jul 05 '22 14:07 BlueCobold

@jedelbo I submitted another customer's realm file with the same symptoms to [email protected] for analysis.

BlueCobold avatar Jul 18 '22 11:07 BlueCobold

Using the decrypt-tool in the exec directory, I'm getting the following output:

Checksum failed: 0x0
Block never written: 0x55e000
Block never written: 0x55f000

So it looks like the first block has issues. The resulting output file is unusable. I have no idea how to get the "actual" and "expected" values that @jedelbo printed in his report, or how to correct possibly faulty bytes to see if the remaining file would be operational. My customer is massively dependent on his data and currently can't access it.

The ticket bot also no longer seems to flag this bug report accordingly. @leemaguire

BlueCobold avatar Jul 23 '22 06:07 BlueCobold

In the meantime, I checked the decrypted content with a hex editor. Even the damaged first block contains readable strings and thus seems to be decrypted correctly. I imagine some header metadata is damaged, which makes the RealmBrowser/library believe the file is still encrypted or unreadable. All other blocks after the first seem to be valid. There are a lot of blocks with readable strings and UUID tables. I assume the file can be recovered, but I have not yet gathered enough understanding of the internal data structure to do that by myself.

BlueCobold avatar Jul 24 '22 07:07 BlueCobold

I have restored the header with a reference to the top_ref and table_names_ref, but it seems the data is partly scrambled. Some objects have invalid strings which crash Realm when trying to load these objects. Some have fields set to null, which cannot be null (like object-UUIDs for example), but seem to be ok, if I only read this column/field in sequence for the entire table. I wonder, can this potentially be a result of a parallel realm-access which did an automatic compactionOnLaunch?

BlueCobold avatar Aug 08 '22 14:08 BlueCobold

On deeper analysis of the data, I noticed that some realm object keys are huge, like '3,402,167,040,181,607,100'. How come they grew so large? Is it possible there's an issue with keys and they overflow at some point? I'm still guessing at what could be the reason for badly written pages and wrongly aligned arrays.

BlueCobold avatar Aug 09 '22 07:08 BlueCobold

@BlueCobold I have been away on holiday, and did not see this until now. I can see that you have sent another file for analysis, but I am not sure which key to use for decrypting.

jedelbo avatar Aug 09 '22 09:08 jedelbo

I thought so. I have replied via email to send you the decryption-key. Did you receive it?

BlueCobold avatar Aug 09 '22 09:08 BlueCobold

To which email address should the key have been sent to? I have not received anything.

jedelbo avatar Aug 09 '22 10:08 jedelbo

> To which email address should the key have been sent to? I have not received anything.

Sorry, I thought there was a forwarded-reply feature on github-mails. Doesn't look like it. I had sent the file and key to realm-help with my mail from 18.07., but I can send you another, including some findings so far, among them the partly restored file-header.

BlueCobold avatar Aug 09 '22 11:08 BlueCobold

Great. To be sure that I receive it, you can also send it to [email protected]

jedelbo avatar Aug 09 '22 12:08 jedelbo

I sent it along with a few of my own findings. Thanks for your help.

BlueCobold avatar Aug 09 '22 13:08 BlueCobold

I tried to decode the file, but up until 0x2a0 I see only something that looks like random bytes:

00000000  24 bd 76 a9 91 68 46 00  c8 8e 16 5c 07 75 51 00  |$.v..hF....\.uQ.|
00000010  b8 7d a0 47 0e e4 52 00  cc f9 d1 68 d3 f8 53 00  |.}.G..R....h..S.|
00000020  25 fa e5 41 5b 5f 55 00  ba d5 69 72 e5 60 58 00  |%..A[_U...ir.`X.|
00000030  51 86 e8 5a e8 9a 59 00  15 7c 65 32 91 92 5c 00  |Q..Z..Y..|e2..\.|
00000040  b7 72 f6 6d 43 7e 5f 00  af 65 50 ff 80 d3 5f 00  |.r.mC~_..eP..._.|
00000050  94 6f 97 53 a8 e8 5f 00  a9 ed 50 99 26 79 60 00  |.o.S.._...P.&y`.|
00000060  9a 28 ea 36 f5 71 62 00  cf 55 bf 31 07 ca 64 00  |.(.6.qb..U.1..d.|
00000070  d4 04 32 f9 c3 37 65 00  87 ec 01 5a cc fc 65 00  |..2..7e....Z..e.|
00000080  97 65 fa 62 3e af 67 00  bd 4b 71 af fb 24 6c 00  |.e.b>.g..Kq..$l.|
00000090  88 59 45 e9 f8 e5 6d 00  6a af fe 39 9c 2c 70 00  |.YE...m.j..9.,p.|

Does this match your findings?

jedelbo avatar Aug 09 '22 13:08 jedelbo

Yes, exactly my results as well. After that block, it seems to be mostly valid data. That's why I manually restored the header as I wrote in the previous mail. As I said, it contains some corrupted data entries and references, but no more of this byte junk.

The "trash" at the beginning is not actual trash, though. Check the 00 every 8 bytes. I assume it's an array of 64-bit values. Maybe realm-object-keys. The same data can be found at another offset in the file. For example the entry "964C406A 5059B000" from offset 0x1D0 appears again at offset 0x991D0. Which... means they are exactly 0x99000 bytes apart in offset.
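To make the "00 every 8 bytes" observation concrete, here is a minimal Python sketch that reads the first 32 bytes of the dump above as little-endian 64-bit integers and checks the distance between the two offsets mentioned. Treating the bytes as little-endian `uint64` values is an assumption taken from the guess above, not something confirmed by the Realm file format documentation.

```python
import struct

# First 32 bytes of the dump posted above (offsets 0x00-0x1f).
raw = bytes.fromhex(
    "24bd76a991684600" "c88e165c07755100"
    "b87da0470ee45200" "ccf9d168d3f85300"
)

# Assumption: the data is an array of little-endian unsigned 64-bit values.
values = struct.unpack("<4Q", raw)
for offset, value in zip(range(0, len(raw), 8), values):
    print(f"0x{offset:02x}: 0x{value:016x}")

# The top byte of every value is zero, which is what shows up as the
# "00 every 8 bytes" pattern in the hex dump.
top_bytes_zero = all(value >> 56 == 0 for value in values)

# Distance between the two copies of "964C406A 5059B000".
delta = 0x991D0 - 0x1D0
is_page_multiple = delta % 0x1000 == 0
print(hex(delta), is_page_multiple)
```

The distance is a whole number of 4 KiB pages, which would fit a whole page landing at the wrong offset, although the sketch alone cannot prove that.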

BlueCobold avatar Aug 09 '22 14:08 BlueCobold

The duplicated data originally starts at 0x98EC0, where it is a valid array. It is then "duplicated" into the header, making the file unusable.

BlueCobold avatar Aug 09 '22 14:08 BlueCobold

Those are great findings. I am a bit embarrassed that I did not spot the zeroes. I hope it can help us further with this issue. It is very common to have duplicated data. Whenever some part of an array is modified, a new version of the array is created by copying the whole array. I will try to see if I can find the "true" top ref.

jedelbo avatar Aug 10 '22 14:08 jedelbo

> It is very common to have duplicated data. Whenever some part of an array is modified, a new version of the array is created by copying the whole array.

Yea, I figured that much. It makes sense from a transaction perspective.

> I will try to see if I can find the "true" top ref.

That would be great.

Also, if you don't mind, I pointed out the very large object-keys for many objects above. (a few objects have two-digit-keys which seem to be auto-increment style, so the big ones make me wonder what's going on) Is it normal for objects to have such large keys or does that indicate a problematic way of using Realm? Can keys accidentally overflow or does Realm auto-detect free keys during object creation when the max value is reached?

BlueCobold avatar Aug 10 '22 14:08 BlueCobold

I found the following cluster-tree, related to table #10, at offset 0x1192A0:

41414141 4700000C
40870800 00000000
00000000 00000000
D80C0000 00000000
15000000 00000000
A8481700 00000000
03000000 00000000
A1520000 00000000
38650200 00000000
90600200 00000000
6950CC0E 1FBFCF7A
00000000 00000000
01000408 05000000

It contains a lot of very suspicious refs like 03000000, 05000000 or 15000000. If these really were refs, they would point into the header bytes of the Realm file. This makes me worry a lot about data consistency.

BlueCobold avatar Aug 10 '22 15:08 BlueCobold

What you have found here is the table top array. It contains both refs and numbers. If the entry has the LSB set (like 0x15) it is a number. You get the value by shifting down one bit so in this case it is 10, which matches table number 10.
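As a rough illustration of that rule, the following Python sketch decodes the entries of the array quoted in the previous comment. It assumes the first 8 bytes (`41414141 4700000C`) are an array header and skips them, and that each entry is a little-endian 64-bit value; the names `decode_entry` and `payload_hex` are made up for this example and are not part of any Realm API.

```python
import struct

# Entries of the array quoted above, without its first 8 bytes, which are
# assumed to be the array header. Each entry is 8 bytes.
payload_hex = """
40870800 00000000  00000000 00000000  D80C0000 00000000
15000000 00000000  A8481700 00000000  03000000 00000000
A1520000 00000000  38650200 00000000  90600200 00000000
6950CC0E 1FBFCF7A  00000000 00000000  01000408 05000000
"""
payload = bytes.fromhex("".join(payload_hex.split()))


def decode_entry(value):
    """Apply the rule described above: LSB set means number, else ref."""
    if value & 1:
        return ("number", value >> 1)  # shift down one bit to get the value
    return ("ref", value)


# Assumption: entries are little-endian unsigned 64-bit integers.
raw_values = struct.unpack(f"<{len(payload) // 8}Q", payload)
entries = [decode_entry(v) for v in raw_values]

for index, (kind, value) in enumerate(entries):
    print(f"{index:2d}: {kind:6s} 0x{value:x}")
```

Under these assumptions the entry `15000000 00000000` decodes to the number 10, and `03000000 00000000` to the number 1, so neither is a ref into the file header.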

jedelbo avatar Aug 10 '22 15:08 jedelbo

I am somewhat convinced that the first 24 bytes of the file should be

00000000  80 6c 51 00 00 00 00 00  f0 53 51 00 00 00 00 00  |.lQ......SQ.....|
00000010  54 2d 44 42 16 16 00 00                           |T-DB....|

making the top ref 0x516c80
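A small Python sketch of how those 24 bytes could be read. The field layout used here (two 64-bit top-ref slots, a 4-byte mnemonic, two file-format bytes, one reserved byte, one flags byte whose lowest bit selects the active slot) is an assumption about the Realm file header and should be checked against the realm-core source; all field names are illustrative.

```python
import struct

# The 24 header bytes proposed above.
header = bytes.fromhex(
    "806c510000000000"  # top ref slot 0 (little-endian)
    "f053510000000000"  # top ref slot 1 (little-endian)
    "542d444216160000"  # "T-DB", two format bytes, reserved, flags
)

# Assumed layout: <QQ = two uint64, 4s = mnemonic, BBBB = four single bytes.
(top_ref_0, top_ref_1, mnemonic,
 format_0, format_1, reserved, flags) = struct.unpack("<QQ4sBBBB", header)

# Assumption: bit 0 of the flags byte selects which slot is current.
selected_top_ref = top_ref_1 if flags & 1 else top_ref_0

print(mnemonic, format_0, format_1, hex(selected_top_ref))
```

With the flags byte at zero, slot 0 is selected, which agrees with the top ref of 0x516c80 stated above.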

jedelbo avatar Aug 10 '22 15:08 jedelbo

I am pretty sure that the problem is that the first 0x1000 bytes have been overwritten with a page that should have been written somewhere else. Unfortunately a lot of refs point into this area, so recreating meaningful data there would be a major puzzle.
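One way to test this hypothesis on a decrypted copy of a file is to look for another 4 KiB page with exactly the same content as page zero. The Python sketch below does that on synthetic data; `find_duplicates_of_first_page` is a hypothetical helper, not part of any Realm tool, and it only finds a match if the misplaced page also still exists unchanged at its intended location, which is not guaranteed.

```python
PAGE_SIZE = 0x1000  # 4 KiB, the block size discussed in this thread


def find_duplicates_of_first_page(data, page_size=PAGE_SIZE):
    """Return offsets of later pages whose bytes equal the first page."""
    first = data[:page_size]
    matches = []
    for offset in range(page_size, len(data) - page_size + 1, page_size):
        if data[offset:offset + page_size] == first:
            matches.append(offset)
    return matches


# Synthetic demo: four distinct pages, then page 0 is overwritten with a
# copy of page 2 to mimic a page written to the wrong offset.
pages = [bytes([i]) * PAGE_SIZE for i in range(1, 5)]
pages[0] = pages[2]
image = b"".join(pages)

duplicates = find_duplicates_of_first_page(image)
print([hex(offset) for offset in duplicates])
```

For a real file, `data` would be the bytes of the decrypted output; a hit at 0x99000 would match the offsets reported earlier in the thread.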

jedelbo avatar Aug 10 '22 16:08 jedelbo