absurd-sql Database disk image is malformed error sometimes

Database disk image is malformed error sometimes

Open quolpr opened this issue 2 years ago • 15 comments

I use Microsoft edge(chrome-based browser), and it supports the Atomics & SharedBuffer, and sometimes I get database disk image is malformed error. However, I didn't check if it happens when multiple tabs are opened, and I am unsure under which conditions it is happening.

Here is screenshot from sentry:

The stack trace:

Error: database disk image is malformed
  at n.handleError(/assets/VaultDb.worker.d94830ba.js:15:55024)
  at e.step(/assets/VaultDb.worker.d94830ba.js:15:50388)
  at n.exec(/assets/VaultDb.worker.d94830ba.js:15:53846)
  at Ri.sqlExec(/assets/VaultDb.worker.d94830ba.js:15:148562)
  at Ri.getRecords(/assets/VaultDb.worker.d94830ba.js:15:148929)
  at Qo.getLinksOfNoteId(/assets/VaultDb.worker.d94830ba.js:15:127812)
  at MessagePort.n(/assets/VaultDb.worker.d94830ba.js:1:1863)

Sep 21 '21 16:09 quolpr

The curious thing — after page refresh, the DB is ok and I can access to read and write 🤔

Sep 21 '21 16:09 quolpr

I also test on Edge and I've noticed that the database doesn't persist on reload and keeps throwing the same error after the initial write. After refreshing the error goes away but the initial query to get all the rows in the table returns an empty array. I'll add a screenshot of the error soon but I don't think it's an image malformed error.

I'll try to write some example code to hopefully make it easy to reproduce.

Sep 21 '21 16:09 rshigg

@rshigg I think your issue is not connected with this. I had the same error, and here is the solution https://github.com/jlongster/absurd-sql/issues/14#issuecomment-902116794 . You need to set up the correct CORS

Sep 21 '21 19:09 quolpr

@quolpr thanks for the heads up, that was exactly it! If I come across this issue I'll update here :)

Sep 21 '21 22:09 rshigg

I think the issue is connected with this https://github.com/jlongster/absurd-sql/issues/24 . Cause after an increase of page size the frequency of such error got lower

Oct 12 '21 22:10 quolpr

I also noticed that such error happens when I have intensive writing in one tab, and intensive reading in a separate tab (everything happens at the same time) 🤔

Oct 12 '21 22:10 quolpr

It's hard to know exactly. See my comment on #24, I don't think that's a problem.

What we really need to debug these kinds of issues is detailed logging. This error will happen whenever there the bytes read back to sqlite are in an unexpected format. Which means the reads are wrong. It's a critical issue and should never happen (while the "database is locked" may happen currently)

We could also leverage property-based tests (we already have some in here) to try to simulate these situations. We need a way to record the path to this error to so we can easily debug it.

It would help to at least know which read is the one that makes this happen. Is it when it tried to upgrade a lock? Opening a db? Or just reading data for a query? etc

Oct 13 '21 16:10 jlongster

@quolpr Getting the same error intermittently - I'm 85% sure it's when we're doing large reads. Wondering if you found a solution for this. Refreshes also fix the error for us!

@jlongster First of all thanks for this library - it's been a huge lifesaver :) The queries that we're running are pretty simple selects (screenshot below). We have a debug mode that logs the query that's running and results returning. Do you have any advice on what else we could log (and where) to track this issue down?

Feb 25 '22 18:02 cathykc

@cathykc How many tabs did you have? I encountered this error only with two or more tabs.

Feb 25 '22 19:02 steida

We had two tabs open!

Feb 25 '22 23:02 cathykc

Ok, I found, at least when it happens — when IndexedDB is resizing. So, if you put 100,000 records in db, and then delete them — this error will never happen till more than 100,000 records will be created again (so absurd-sql will need to create new block in IndexedDB)

Jun 23 '22 06:06 quolpr

I was describing alternatives to the Web Locks API to someone, and when I got to the absurd-sql approach I happened on something suspicious that seems worth investigating as a cause for this bug. It's in Transaction.upgradeExclusive() which upgrades the lock from SHARED to RESERVED/EXCLUSIVE:

https://github.com/jlongster/absurd-sql/blob/ade37d94c78d0156da73e7bd540696b39269d9e6/src/indexeddb/worker.js#L95-L111

This method abandons its read lock by discarding the existing "readonly" IDBTransaction and attempts the upgrade by creating a new "readwrite" IDBTransaction. It then checks that no other connection modified the database in between those operations by verifying that certain bytes in the first block are unchanged.

Let's assume that some other connection did in fact modify the database and that first block is different. What happens?

What happens is the VFS is going to return a status code that it failed to obtain the lock. But what state is it leaving behind? The Transaction instance is going to have LOCK_TYPE.EXCLUSIVE set, it will contain a "readwrite" IDBTransaction, and it has cached the new first block data. Meanwhile, SQLite thinks the lock state is still SHARED (because the upgrade failed) and it still has the original first block data in its cache. These discrepancies make me very nervous.

For one thing, the SQLite locking model is now broken. As far as SQLite knows, it is still in SHARED mode and can continue to read from the database. But the IDBTransaction it was using to read (and counting on for consistency) is now gone. If SQLite does happen to make calls that go to a new IDBTransaction, eventually it will get very confused.

For another thing, if that Transaction instance state isn't reset and remains active, then the next attempt to upgrade the lock will succeed because it thinks it is already in the upgraded state. I don't know if this retry actually happens, but if it does then that opens the door to corruption.

Now whether or not this is in fact the cause of this particular bug, I don't think this upgrade approach of dropping read-only, acquiring read-write, and checking for changes can be made to work with the SQLite locking model. This is because (1) I don't think there is a way to preserve the SHARED lock and (2) there is no protocol to tell SQLite that not only did it not upgrade the lock, it also lost its original lock.

Anyway, I'm just posting this in case it's helpful to someone. I don't plan on pursuing it further myself (not currently an absurd-sql user). If upgrade-or-sustain semantics (as opposed to upgrade-or-unlock) can't be implemented immediately (which I believe will require another Worker and SharedArrayBuffer), patching the code to allow only non-shared access with a "readwrite" IDBTransaction should be relatively easy.

Jul 15 '22 23:07 rhashimoto

@rhashimoto good discover 👍

Jul 16 '22 18:07 quolpr

Getting this too. Used to happen occasionally but would be fixed by a refresh. Lately though, it seems to permanently corrupt the database in question (can't open it ever again without getting the error - have to delete the IDB database and make a new one). Only happens with multiple tabs.

Jul 30 '22 11:07 sampbarrow

We just ran into this with our pre-release product. While one tab is doing a lot of writes, the other reads for a while just fine, then something changes, and all reads after that point throw this error until the page is refreshed.

It would help to at least know which read is the one that makes this happen. Is it when it tried to upgrade a lock? Opening a db? Or just reading data for a query? etc

In my case, I think it's just reading data for a query. It's possible that a write operation happened, though, that triggered a bad state, and that the read queries are throwing this error after some silent corrupting action.

Aug 25 '22 16:08 mrmurphy

absurd-sql absurd-sql copied to clipboard

Database disk image is malformed error sometimes

absurd-sql
absurd-sql copied to clipboard