application-services Delete corrupted places databases, to allow the app to recover

(Splitting out a places-specific version of https://github.com/mozilla/application-services/issues/2628, which I've made specific to logins).

If something causes a user's places database to become corrupt, they are going to have a very bad time until the app is uninstalled. We really should do something.

Simplist solutions would be to delete the corrupt database and starting again. More involved would be to have a backup strategy similar to desktop.

Per https://github.com/mozilla/application-services/issues/2628#issuecomment-589305179, product decision has been made that the risk of data loss is offset by the potential for leaving users struggling long with a broken user experience.

I think this could be a good-first- or good-second-issue if we provide a bit more detail. Specifically things like:

Do we want to restrict this to errors when opening the file, or to more general schema-related errors that might be encountered at runtime?
Do we want to keep a backup of the database file just in case, for future recovery/debugging?

┆Issue is synchronized with this Jira Task ┆epic: Triage ┆sprintEndDate: 2021-09-17

Mar 18 '20 05:03 rfk

We are seeing these in the wild - eg, https://sentry.prod.mozaws.net/operations/fenix/issues/6957530/, but I'm confident I've seen others before.

May 04 '20 08:05 mhammond

"Database or disk is full" could mean other things, no? I'm not sure "full disk? delete the database" is a good policy on resource constrained systems like mobile.

Note that there's an open rusqlite issue to give better error information at least on open failures (SQLITE_CANTOPEN an SQLITE_IOERR): https://github.com/rusqlite/rusqlite/issues/458.

May 04 '20 17:05 thomcc

"Database or disk is full"

Oops, yes, I must have copied the wrong issue URL :) But there are certainly sentry issues for "image is malformed"

May 04 '20 22:05 mhammond

➤ Janet Dragojevic commented:

Ben Dean-Kawamura is this still a valid bug or did your recent work resolve this?

Jun 24 '21 00:06 data-sync-user

➤ Ben Dean-Kawamura commented:

This isn’t implemented yet. I think the plan should be:

Implement https://mozilla-hub.atlassian.net/browse/SYNC-2260 ( https://mozilla-hub.atlassian.net/browse/SYNC-2260|smart-link ) to handle this in general using the open_database module.
Update the places component to use the open_database code

Jun 24 '21 15:06 data-sync-user

We're seeing a ton of this in Sentry - https://sentry.prod.mozaws.net/operations/firefox/issues/9616768/events/0420a9b6ab3b4a6b8ef0615245d5e73e/ - would be great to have a recovery path.

Also, I'm assuming it's been verified that we aren't doing anything that could be causing db corruption..?

Sep 02 '21 23:09 grigoryk

Yeah - #4214 isn't trivial, and updating places to open_database.rs isn't either - but they can be done mostly in parallel. I agree we should get to this sooner rather than later.

Sep 03 '21 09:09 mhammond

➤ Mark Hammond commented:

We’ve landed code that should databases we detect as corrupt at open time. We still need a plan for handling corruption as it is being used.

Sep 16 '21 22:09 data-sync-user

Looks like this is still common in release, ref: https://sentry.prod.mozaws.net/operations/firefox/issues/9612438/?query=is%3Aunresolved%20Database

Should #4639 be prioritized? The count is high

Feb 14 '22 20:02 tarikeshaq

We now delete files found to be corrupt at open time and we have #4639 to track deleting them found to be corrupt while operating on them so there's no need for this any more.

Sep 06 '23 18:09 mhammond