Database Disk Image Malformed (IDBBatchAtomicVFS, FTS5)
After compiling a separate build that included the FTS5 extension, I enabled database triggers for my primary table, i.e.:

CREATE TRIGGER IF NOT EXISTS thing AFTER INSERT ON main_table
WHEN <condition>
BEGIN
  INSERT INTO fts5_table(rowid, value)
  VALUES (NEW.rowid, NEW.value);
END;
... same for update and delete
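For reference, the usual FTS5 pattern for the corresponding UPDATE and DELETE triggers on an external-content table looks like the sketch below (same placeholder names as above; the special 'delete' insert is how FTS5 removes a row from an external-content index):

CREATE TRIGGER IF NOT EXISTS thing_update AFTER UPDATE ON main_table
BEGIN
  -- Remove the old row from the index, then add the new one.
  INSERT INTO fts5_table(fts5_table, rowid, value)
  VALUES ('delete', OLD.rowid, OLD.value);
  INSERT INTO fts5_table(rowid, value)
  VALUES (NEW.rowid, NEW.value);
END;

CREATE TRIGGER IF NOT EXISTS thing_delete AFTER DELETE ON main_table
BEGIN
  INSERT INTO fts5_table(fts5_table, rowid, value)
  VALUES ('delete', OLD.rowid, OLD.value);
END;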
I have no PRAGMA settings on my database, but the table now frequently gets corrupted with a "database disk image is malformed" error. Is there a way to go about debugging this issue? It seems to happen randomly rather than under any particular load, but during development I frequently refresh the page, so it seems possible that a refresh catches the database between an insert and the trigger, leaving corrupted data behind. Otherwise, perhaps FTS5 is making some background changes.
I'm happy to try and come up with a demo project to demonstrate the issue, but because it is random it seems hard to accurately reproduce.
Prior to FTS5 and compiling my own emscripten build, this never once happened.
I found a problem while implementing the synchronous IDBMirrorVFS. When I used a 64 KB page_size for stress testing (the page_size is probably irrelevant to the problem), the database would become corrupted once its size exceeded 1 GB. After debugging, I found that xWrite never received a write request for block 16384 (at the 1 GB offset), so IndexedDB never wrote this block, which corrupted the database.
Might IDBBatchAtomicVFS have similar problems? My solution is to fill in the unwritten blocks before writing; see the links and the sketch below.
https://github.com/rhashimoto/wa-sqlite/blob/7a14d7873b561ee1b0bfd4230f72e5eb83576b59/src/examples/IDBMirrorVFS.js#L336
https://github.com/Spxg/sqlite-wasm-rs/blob/0d4b227b63d4aa857d8f73a7b8fdda265eec8080/sqlite-wasm-rs/src/vfs/relaxed_idb.rs#L77
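A minimal sketch of that fill-before-write workaround, using illustrative names rather than the actual IDBMirrorVFS code (it assumes fixed-size blocks kept in a Map keyed by byte offset):

// Illustrative sketch only, not the actual IDBMirrorVFS source.
// Before honoring a write that lands past the current end of file,
// materialize every skipped block as zeroes so the backing store ends
// up with a contiguous run of blocks.
function writeBlock(file, offset, data) {
  for (let gap = file.size; gap < offset; gap += file.blockSize) {
    if (!file.blocks.has(gap)) {
      file.blocks.set(gap, new Uint8Array(file.blockSize)); // zero-filled
    }
  }
  file.blocks.set(offset, data);
  file.size = Math.max(file.size, offset + data.byteLength);
}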
> After debugging, I found that xWrite never received a write request for block 16384 (at the 1 GB offset), so IndexedDB never wrote this block, which corrupted the database.
> Might IDBBatchAtomicVFS have similar problems?
Thanks, @Spxg, for reporting your experience and proposing a solution! I did not consider that SQLite might write beyond the file size offset, so your conjecture of the same problem in IDBBatchAtomicVFS is quite plausible.
I have tried 1GB+ IDBBatchAtomicVFS databases in the past without trouble. Are you also using the FTS5 extension? And for @abhay-agarwal, what size database do you have?
Confirming now that after downgrading to the default artifacts distributed by wa-sqlite (i.e. without FTS5), this problem went away.
Possible causes:
- My emscripten build environment is not correct. I am currently testing a build without the FTS5 extension; if the problem still reproduces, that would indicate an incorrect emscripten configuration is to blame.
- FTS5 is affected by the database-size problem above (plausible if FTS5 creates massive tables/indices). However, my databases are nowhere near 1 GB: roughly 10K rows of 1-10 KB each (some UUIDs and a blob), i.e. on the order of 10-100 MB. A large deployment could certainly reach 1 GB, though, so I will keep tabs on the issue mentioned.
Details of the emscripten build configuration:

emcc -v
emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 4.0.5 (53b38d0c6f9fce1b62c55a8012bc6477f7a42711)

yarn -v
4.0.2
> Are you also using the FTS5 extension?
I don't use FTS5, and I can reproduce it using the demo web page. Could it be that FTS5 simply makes it easier to produce blocks beyond the file size?
> Could it be that FTS5 simply makes it easier to produce blocks beyond the file size?
That would be my guess, too.
I'll try to get to this in the coming week.
Confirming that my custom setup with FTS5 enabled, but unused, has thus far not encountered the malformed error. Here's my FTS5 table creation command in case it's useful:
CREATE VIRTUAL TABLE IF NOT EXISTS search USING fts5(
  value,
  content='MainTable',
  tokenize='porter unicode61 remove_diacritics 1'
);
I'm using FTS5 with external content (content='MainTable'), so the FTS table doesn't store its own copy of the text.
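For later readers, the distinction between these modes matters for the triggers earlier in the thread. A quick hedged sketch per the FTS5 documentation (hypothetical table names):

-- External content: FTS5 reads row text from MainTable when it needs it,
-- so the index and the table must be kept in sync (e.g. via triggers).
CREATE VIRTUAL TABLE search_ext USING fts5(value, content='MainTable');

-- Truly contentless: the original text is never stored and cannot be read
-- back; only rowid lookups and MATCH queries work.
CREATE VIRTUAL TABLE search_less USING fts5(value, content='');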
Using the highly appreciated test case in https://github.com/rhashimoto/wa-sqlite/pull/259, both IDBBatchAtomicVFS and IDBMirrorVFS fail on the trunk in slightly different ways.
IDBBatchAtomicVFS:
demo-worker.js?build=asyncify&config=IDBBatchAtomicVFS:158 SQLiteError: SQL logic error
at check (sqlite-api.js:861:11)
at sqlite-api.js:708:13
at async demo-worker.js?build=asyncify&config=IDBBatchAtomicVFS:139:24
(anonymous) @ demo-worker.js?build=asyncify&config=IDBBatchAtomicVFS:158
IDBMirrorVFS:
demo-worker.js?build=asyncify&config=IDBMirrorVFS:158 SQLiteError: database disk image is malformed
at check (sqlite-api.js:861:11)
at Object.step (sqlite-api.js:734:14)
at async demo-worker.js?build=asyncify&config=IDBMirrorVFS:141:16
(anonymous) @ demo-worker.js?build=asyncify&config=IDBMirrorVFS:158
I confirm no IDBMirrorVFS failure with the fix in #259.
> both IDBBatchAtomicVFS and IDBMirrorVFS fail on the trunk in slightly different ways.
IDBBatchAtomicVFS fails on the test case because it does not allow PRAGMA page_size=65536 once the database file has been created. So that was actually a user error where a console message would have been helpful (I'll add one). If I skip that statement on the second run then it runs through without an error.
So this test case doesn't expose the bug for IDBBatchAtomicVFS because, unlike with IDBMirrorVFS, it does not cause SQLite to write beyond the file-size point. I'm not sure why - maybe it has something to do with batch atomic mode. Writing beyond the file-size point might still be the bug in IDBBatchAtomicVFS - I think it remains a strong suspect - it's just that this test case as it stands doesn't expose it.
The next thing I'll try is to disable batch atomic mode and see what happens.
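For context on what "batch atomic mode" changes, here is an assumption-laden sketch rather than the wa-sqlite source; the constant values come from sqlite3.h, but the VFS shape around them is illustrative:

// A VFS opts in by advertising the capability in xDeviceCharacteristics.
const SQLITE_IOCAP_BATCH_ATOMIC = 0x00004000;

const file = {
  xDeviceCharacteristics() {
    // Tells SQLite it can skip the rollback journal for this file...
    return SQLITE_IOCAP_BATCH_ATOMIC;
  },
  xFileControl(op) {
    // ...and instead bracket each transaction's writes with file controls:
    // SQLITE_FCNTL_BEGIN_ATOMIC_WRITE (31), then either
    // SQLITE_FCNTL_COMMIT_ATOMIC_WRITE (32) or
    // SQLITE_FCNTL_ROLLBACK_ATOMIC_WRITE (33).
  }
};

Skipping the journal changes SQLite's write pattern, which could plausibly change whether it ever writes past EOF.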
I think I understand the test case behavior now.
The test case does cause IDBBatchAtomicVFS to write past EOF. The reason I didn't think so is that my detection (and handling) of that condition was faulty - I was checking against the metadata file size in a lambda that executed after the metadata had been updated.
IDBMirrorVFS failed quickly after those post-EOF writes because the reported file size didn't match the file-size field in the SQLite file header: IDBMirrorVFS calculates the database file size by multiplying the number of pages by the page size, so when a page at EOF was never written, that calculation came up short. The @Spxg PR (now merged) fixed that problem.
IDBBatchAtomicVFS didn't fail on the test case because it tracks the file size by watching the highest byte written, so there was no file size discrepancy with the header.
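To make the two size-accounting strategies concrete, a hedged sketch with illustrative names (not the actual sources):

// IDBMirrorVFS-style accounting derives the size from the block map, so a
// never-written page at EOF shrinks the reported size and contradicts the
// file-size field in the SQLite header.
const sizeFromBlockCount = (blockCount, blockSize) => blockCount * blockSize;

// IDBBatchAtomicVFS-style accounting tracks the high-water mark of writes,
// which stays consistent with the header even when interior pages are
// skipped, because any write at or past the old EOF extends the size.
function recordWrite(state, offset, byteLength) {
  state.fileSize = Math.max(state.fileSize, offset + byteLength);
}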
IDBBatchAtomicVFS would fail if it ever attempted to read a page that had never been written. The test case does not do that, and apparently PRAGMA integrity_check doesn't either (I tried). Maybe FTS5 does for some reason?
I have a branch that fills in skipped pages for IDBBatchAtomicVFS. @abhay-agarwal, can you see if that affects the behavior you're seeing? The only changes are in IDBBatchAtomicVFS.js, so you can just grab that file if that's any easier to patch in.
Update: I'm now less optimistic that writes beyond EOF are the cause of the originally reported problem. SQLite reserves the page at 1 GB for its locking bytes, so it should never read or write that specific page. It's good to have an explanation for the 1 GB corruption, but it doesn't suggest how corruption might occur with smaller databases. That doesn't rule out such behavior by FTS5, but it feels less likely.
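The arithmetic behind that reservation, for anyone following along (the constants are SQLite's; the snippet itself is just illustration):

// SQLite never reads or writes the lock page: the page containing byte
// offset 0x40000000 (1 GiB, the PENDING_BYTE).
const PENDING_BYTE = 0x40000000;
const pageSize = 65536; // the 64 KB page_size from the stress test
// 0-based block index at that offset: 2^30 / 2^16 = 16384, exactly the
// block that xWrite never received in the original report.
console.log(PENDING_BYTE / pageSize); // 16384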
The given branch does indeed fix the problem. However, I am also not convinced that the proposed cause was the actual issue; that said, the issue is no longer happening. Our code base has been evolving considerably, so I don't have the bandwidth to run a git bisect to confirm whether this was indeed the cause.
Nevertheless, I appreciate the hard work. Feel free to mark this as closed, @rhashimoto, or to rename the issue if you want to proceed with your other investigation.