replicache icon indicating copy to clipboard operation
replicache copied to clipboard

"The database connection is closing." error again (10.x specific?)

Open aboodman opened this issue 2 years ago • 20 comments

Le sigh. See: https://discord.com/channels/830183651022471199/966042424273150033

Tasks:

  1. [ ] When this error happens (and the Replicache instance hasn't been closed) try to reopen the connection to the idb database. If we find the database is still there resume normal operation. If we find the database has been deleted throw a specific Error indicating this. This change will: a) automatically recover when possible and b) enable discerning between connections randomly closing (which Google search results suggests does happen) and connections closing due to idb database being deleted.
  2. [ ] If (1) proves useful, add an explicit error callback to Replicache for notifying the developer when we detect the idb database has been deleted (similar to https://doc.replicache.dev/api/classes/Replicache#onclientstatenotfound).
  3. [ ] Ensure that all api entry points query/mutator/etc check if Replicache is already closed and throw a specific Error in this case.
  4. [ ] Ensure that all background processes check if Replicache is already closed before trying to access idb (we generally do this but our probably missing a case or two).

aboodman avatar Apr 20 '22 01:04 aboodman

The overall dataset is small, so this might change, but so far this error has only occurred on Chrome 100.

tmcw avatar Apr 22 '22 12:04 tmcw

I'm intermittently see this in tests too:

▶ npm run test

> [email protected] test
> web-test-runner


src/replicache-mutation-recovery.test.ts:

 ❌ successfully recovering mutations of client with same schema version and replicache format version (failed on Webkit)
      InvalidStateError: Failed to execute 'transaction' on 'IDBDatabase': The database connection is closing.
        at readImpl (src/kv/idb-store.ts:158:28)
        at src/kv/idb-store.ts:35:26

Firefox:  |██████████████████████████████| 50/50 test files | 338 passed, 0 failed, 8 skipped
Chromium: |██████████████████████████████| 51/51 test files | 338 passed, 0 failed, 9 skipped
Webkit:   |██████████████████████████████| 50/50 test files | 337 passed, 1 failed, 8 skipped

Finished running tests in 17.2s with 1 failed tests.

@grgbkr

One way to make these go away is to prevent concurrency in the test runner -- but we do want to support concurrency so we need to better deal with exceptions during closing of Replicache. One possibility would be to pass in the AbortSignal through many more layers...

arv avatar Apr 24 '22 09:04 arv

I'm getting this error frequently in local dev. It might be related to hot-reloading. I still haven't deployed / tested a prod version, but it's a bit annoying in dev.

My Chrome version is 102.0.5005.115 (Official Build) (arm64).

koahmad avatar Jul 04 '22 21:07 koahmad

Plan to investigate this more today.

@koahmad Can you share the details of your local dev setup?

grgbkr avatar Jul 07 '22 16:07 grgbkr

Yes -- I'm using NextJS (10.2.3) and use next dev to start the app. What else would be helpful to you?

koahmad avatar Jul 07 '22 17:07 koahmad

@koahmad thanks, that is enough to help me get started on reproing.

grgbkr avatar Jul 07 '22 18:07 grgbkr

More information on this. Another of our customers reports they see this error in their error reporting from prod.

The browser breakdown is image image

Errors from one session: image

grgbkr avatar Jul 14 '22 18:07 grgbkr

This can happen in production when

  1. the idb is deleted a. via code b. via devtools c. or by the browser due to hitting size quota
  2. close is called on the replicache instance, and then apis like replicache.query, or replicache.mutate.* are used.
  3. the browser closes the idb connection for some other reason (it shouldn't but google results suggests it randomly does).

For 1 we can't recover, but we could detect and notify the developer via an error callback (similar to https://doc.replicache.dev/api/classes/Replicache#onclientstatenotfound).

For 2 we should check if already closed in query/mutate/etc, and throw a specific error.

For 3 we should try to recover by re-openning the idb conneciton.

grgbkr avatar Jul 14 '22 18:07 grgbkr

Perhaps we could collapse (1) and (3) by trying to reopen and if we find that the db is gone (😬) notifying user.

aboodman avatar Jul 14 '22 20:07 aboodman

Wanted to share an update on our current understanding of the impact of this bug when it occurs in production. If the client view is < 100MBs in size the impact will be largely invisible to end users.

When the client view is < 100MB the impact for the affected tab is:

  1. pending mutations are not persisted locally, and so if the tab is closed before pending mutations are pushed they will be lost (as mutation recovery can not recover non-persisted mutations)
  2. newly synced snapshots are not persisted, resulting in new tabs having to sync a bit more on start up

Other than the above the client will function normally. Mutators, querys, subscribe and watches all continue to work. Sync will continue to work. This is because all of these work against a memory cache that is lazyily persisted to IndexedDB. If the connection to IndexedDB is lost they all continue to work correctly against the memory cache and things just fail to persist.

If the client view is > 100MBs, there is more likely to be end user visible impact. Currently Replicache's memory cache has a size limit of 100MB (using a LRU eviction strategy). If the IndexedDB connection is lost and there is a memory cache miss during execution of a Replicache operation (mutator, query, subscribe, watch, push or pull), then that operation will fail.

grgbkr avatar Aug 09 '22 17:08 grgbkr

Ok I think we now have a solid plan for addressing this:

  1. When this error happens (and the Replicache instance hasn't been closed) try to reopen the connection to the idb database. If we find the database is still there resume normal operation. If we find the database has been deleted throw a specific Error indicating this. This change will: a) automatically recover when possible and b) enable discerning between connections randomly closing (which Google search results suggests does happen) and connections closing due to idb database being deleted.
  2. If (1) proves useful, add an explicit error callback to Replicache for notifying the developer when we detect the idb database has been deleted (similar to https://doc.replicache.dev/api/classes/Replicache#onclientstatenotfound).
  3. Ensure that all api entry points query/mutator/etc check if Replicache is already closed and throw a specific Error in this case.
  4. Ensure that all background processes check if Replicache is already closed before trying to access idb (we generally do this but are probably missing a case or two).

We will do (1) this week, and (2) shortly after.

We should hold off on (3) and (4) until after https://github.com/rocicorp/replicache/issues/1009. The full offline support change involves a lot of significant refactors, and I worry if we do this before offline there is a non-trivial chance of those refactors regressing it. Also (3) and (4) won't improve anything for end users, they just provide nicer error reporting for developers.

grgbkr avatar Aug 09 '22 18:08 grgbkr

I think eventually there should also be a (5) which is a bigger project:

  • some sort of first-class feature around managing quota and eviction.

aboodman avatar Aug 09 '22 18:08 aboodman

I think eventually there should also be a (5) which is a bigger project:

  • some sort of first-class feature around managing quota and eviction.

Agree, though I'd track that feature as a separate issue.

grgbkr avatar Aug 09 '22 19:08 grgbkr

Adding some relevant bug reports about unexpected idb connection closes from other projects: https://bugs.webkit.org/show_bug.cgi?id=197050 https://bugs.chromium.org/p/chromium/issues/detail?id=1085724 https://github.com/jensarps/IDBWrapper/issues/80 https://github.com/firebase/firebase-js-sdk/issues/1533

grgbkr avatar Aug 09 '22 19:08 grgbkr

Update: There's a new beta release of Replicache: 11.2.1 that contains code to help us understand what's happening here. If you are seeing this error, please install:

npm add [email protected]

Then deploy to production. You should see one of two things happen:

  1. The error goes away. In that case, the rumors are true: Chrome will sometimes mysteriously close connections to IDB and just reopening it fixes it 🤷‍♂️.
  2. The error changes to Replicache IndexedDB not found: <name>. In that case, the databases are getting deleted -- perhaps due to quota or some other reason.

Please let us know what happens. Thanks!

aboodman avatar Aug 19 '22 21:08 aboodman

So, it went away, it seemed, but it's back - still seeing these issues cropping up in development.

CleanShot 2022-10-07 at 10 23 48@2x

tmcw avatar Oct 07 '22 14:10 tmcw

What version of Replicache?

aboodman avatar Oct 07 '22 19:10 aboodman

At currently:

=> Found "[email protected]"

I'm still pretty interested in a version of Replicache that doesn't use IDB. I really think the costs outweigh the benefits of the local-storage approach for my usecase, and the browser implementations of IDB are apparently really flawed.

tmcw avatar Oct 10 '22 18:10 tmcw

@grgbkr said he had some ideas why there is still a possible case of this.

arv avatar Oct 20 '22 20:10 arv

Some good new on this. We found a way to easily reproduce this by using replicache in a codesandbox. The super aggressive HMR makes it easy to reproduce. Now that we can repro, we can more reliably debug and fix.

grgbkr avatar Nov 14 '22 17:11 grgbkr