application-services icon indicating copy to clipboard operation
application-services copied to clipboard

Error during PlacesHistoryStorage$deleteEverything: Query returned no rows

Open grigoryk opened this issue 3 years ago • 10 comments

Low-ish volume on Nightly. https://sentry.prod.mozaws.net/operations/firefox-nightly/issues/17840159/?query=is%3Aunresolved%20level%3Afatal%20InternalPanic

mozilla.appservices.places.uniffi.InternalPanic: Unexpected error: Error executing SQL: Query returned no rows
    at mozilla.appservices.places.uniffi.PlacesException$ErrorHandler.read(places.kt:3)
    at mozilla.appservices.places.uniffi.PlacesException$ErrorHandler$lift$1.invoke(places.kt:2)
    at mozilla.appservices.places.uniffi.PlacesException$ErrorHandler$lift$1.invoke(places.kt:1)
    at mozilla.appservices.places.uniffi.PlacesKt.liftFromRustBuffer(places.kt:2)
    at mozilla.appservices.places.uniffi.PlacesException$ErrorHandler.lift(places.kt:2)
    at mozilla.appservices.places.uniffi.PlacesException$ErrorHandler.lift(places.kt:1)
    at mozilla.appservices.places.uniffi.PlacesConnection.deleteEverythingHistory(places.kt:16)
    at mozilla.appservices.places.PlacesWriterConnection.deleteEverything(PlacesConnection.kt:3)
    at mozilla.components.browser.storage.sync.PlacesHistoryStorage$deleteEverything$2.invokeSuspend(PlacesHistoryStorage.kt:2)
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:3)
    at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:18)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
    at java.lang.Thread.run(Thread.java:764)

┆Issue is synchronized with this Jira Task ┆sprintEndDate: 2022-03-04

grigoryk avatar Feb 22 '22 20:02 grigoryk

Currently top 14 crash for Nightly.

https://crash-stats.mozilla.org/signature/?product=Fenix&signature=mozilla.appservices.places.uniffi.PlacesException%24InternalPanic%3A%20at%20mozilla.appservices.places.uniffi.PlacesException%24ErrorHandler.read%28places.kt%3A3%29&version=99.0a1&date=%3C2022-02-22T20%3A57%3A44%2B00%3A00&date=%3E%3D2022-02-15T20%3A57%3A44%2B00%3A00

grigoryk avatar Feb 22 '22 20:02 grigoryk

Any ideas on a repro here? I kinda assumed it must be when there is nothing to delete, but I tried hitting the "bin" icon in the history screen multiple times and it worked. I also added the following patch and the test still passes, so I'm not sure how to move forward here.

diff --git components/places/src/storage/history.rs components/places/src/storage/history.rs
index 073ab262..33960a90 100644
--- components/places/src/storage/history.rs
+++ components/places/src/storage/history.rs
@@ -2562,20 +2562,24 @@ mod tests {
     #[test]
     fn test_delete_everything() {
         use crate::storage::bookmarks::{
             self, BookmarkPosition, BookmarkRootGuid, InsertableBookmark,
         };
         use url::Url;
         let _ = env_logger::try_init();
         let mut conn = PlacesDb::open_in_memory(ConnectionType::ReadWrite).unwrap();
         let start = Timestamp::now();

+        // make sure delete_everything works with an empty DB.
+        delete_everything(&conn).expect("Should delete everything");
+        delete_everything(&conn).expect("Should delete everything twice!");
+

mhammond avatar Feb 23 '22 02:02 mhammond

Related question: are the current errors still reporting Unexpected error: Error executing SQL: Query returned no rows? I'm not seeing that in the crash reports, but it also is listing them as redacted. Maybe I'm not seeing the full context or maybe this is a different error than the original one.

bendk avatar Feb 23 '22 21:02 bendk

FWIW, this is a rusqlite specific error that means we are calling a function, like query_row that doesn't return an Option<> - ie, it's to help with type safety. This still doesn't help me identify where we are doing it as part of delete_everything() though.

Maybe I'm not seeing the full context or maybe this is a different error than the original one.

Good question! I've the same issue and asked on slack how to get access.

crash-stats also says it's mostly happening soon after startup, so it sounds unlikely users would start the browser then immediately delete all their history? It's also unclear why this is crashing - a panic isn't ideal, but doesn't cause a crash, and places is now using parking-lot, so shouldn't even cause poison errors.

tl;dr, I don't see how we can action this yet :(

mhammond avatar Mar 02 '22 05:03 mhammond

Digging in here a little, the only place this error could be popping up is happening when we calculate a new frecency after wiping. The trail as I followed it is: delete_everything() -> wipe_local_in_tx -> [frecency stack] -> FrecencyComputation::new which calls this: https://github.com/mozilla/application-services/blob/62bef6a0d49f8ab17b969e8ce482ac12c53f0987/components/places/src/frecency.rs#L153

Since as Mark mentioned, we don't actually call query_row in delete_everything which seems the only other candidate for the Query returned no rows error. I'll see if I can find a way to specifically trigger that piece of code but wanted to brain dump here for now.

skhamis avatar Mar 22 '22 22:03 skhamis

Yeah, thanks Sammy - I missed this place where we explicitly return that error, and while that function does seem likely, it's still possible that it's being raised by rusqlite itself.

Best I can tell though, it should still be "impossible" for FrecencyComputation::new to ever return that value in practice - it is being passed a rowId, which it got after the transaction has started from the DB itself.

I wonder if this is a hiesenbug - eg, you could imagine a corrupt DB causing this. I wonder if we need a better yardstick for trying to work out when a bug is worth digging deeper into - a bit like some of the other bugs, the fact it is a "top crasher" might just mean we should stop crashing on db error :)

mhammond avatar Mar 29 '22 03:03 mhammond

Hmmm Looks like reason this is a crash is https://github.com/mozilla-mobile/android-components/blob/62fe384048d6c4048c0ddd60befa1158774b3589/components/browser/storage-sync/src/main/java/mozilla/components/browser/storage/sync/PlacesHistoryStorage.kt#L159 doesn't do the "handlePlacesExceptions" dance to catch exceptions

Although it's a low frequency, it's a crasher so I'll put up a patch in a-c to handle the exception. Maybe there is a layer of abstraction missing.

Also wanted to explicitly note that although the error's name is "InternalPanic", there is no panic, it's the same problem we solved in #4941

tarikeshaq avatar May 05 '22 19:05 tarikeshaq

The exception handling for the crash landed in https://github.com/mozilla-mobile/android-components/pull/12112 I'll keep this ticket open since we didn't find the root cause the error is thrown, but I'll change the title to reflect that it shouldn't crash anymore

tarikeshaq avatar May 09 '22 18:05 tarikeshaq

I looked into this one a bit more today. The error is still happening, but it was hard for me to track it down in sentry. Eventually I found this issue where some of the events are from this error. I think @mhammond is right that this shouldn't happen because we're in a transaction and we've already read the rows from the table. However, I do have one guess at what's happening:

  • We open the DB using SQLITE_OPEN_NO_MUTEX
  • According to the SQLite docs, this is only safe as long as no database connections are shared between threads
  • I don't think there's anything stopping Kotlin from sharing a PlacesConnection between 2 threads and if one deletes rows while the other is in the middle of the transaction, then maybe that could cause this error.

I added some code in #4979 to try to track what's happening using Sentry breadcrumbs and also use a fingerprint to properly group the events into a single issue. Maybe that will let us figure out what's happening here.

bendk avatar Jun 01 '22 18:06 bendk

  • I don't think there's anything stopping Kotlin from sharing a PlacesConnection between 2 threads

I think the mutex around our PlacesDb should prevent different threads concurrently using a connection. It is true that the same connection might end up being used on different threads at different times though, and it's not immediately clear from those docs if that's a problem, but I suspect not as we have used that same basic model from day-one in places.

(I'm on board with instrumenting things to get to the bottom of this, but don't see how this specific theory could account for it though)

mhammond avatar Jun 02 '22 01:06 mhammond

Moved to bugzilla: https://bugzilla.mozilla.org/show_bug.cgi?id=1866352

Change performed by the Move to Bugzilla add-on.

mhammond avatar Nov 23 '23 22:11 mhammond