planetary-ios
StartError - database corruption
Sometimes go-ssb does not seem to shut down cleanly, and when it restarts it cannot open its database. It looks like this for the user:
This is printed in the logs:
ts="2022-02-01 17:10:35.9673960 (UTC)" level=error event="bot init failed" err="BotInit: failed to make sbot instance: sbot: failed to open rootlog: failed to open log: offset2: integrity error: data file size difference -4042"
If you have experienced this error here are the steps you can take to recover your profile.
I traced this error message to the checkJournal() function in log.go. It sounds like the underlying margaret log was not closed cleanly. What isn't clear to me is whether go-ssb is capable of recovering from this type of failure.
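For triage, this failure can be recognized from the message text alone. Here is a minimal sketch in Go, assuming the message keeps the format shown in the log line above. `parseIntegrityError` is a hypothetical helper for illustration, not part of go-ssb or margaret.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// integrityRe matches the tail of the error message seen in the log line.
// Assumption: the wording "integrity error: data file size difference N" is stable.
var integrityRe = regexp.MustCompile(`integrity error: data file size difference (-?\d+)`)

// parseIntegrityError (hypothetical helper) reports whether msg contains the
// integrity error and, if so, returns the size difference that was logged.
func parseIntegrityError(msg string) (int64, bool) {
	m := integrityRe.FindStringSubmatch(msg)
	if m == nil {
		return 0, false
	}
	n, err := strconv.ParseInt(m[1], 10, 64)
	if err != nil {
		return 0, false
	}
	return n, true
}

func main() {
	msg := "BotInit: failed to make sbot instance: sbot: failed to open rootlog: " +
		"failed to open log: offset2: integrity error: data file size difference -4042"
	if diff, ok := parseIntegrityError(msg); ok {
		fmt.Println("corrupted log, size difference:", diff)
	}
}
```

Matching on the error string is brittle, so this would only be a stopgap for deciding when to offer a recovery path in the UI.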
Sebastian experienced this issue today and I was able to get his logs and database. The error presented differently in the UI but the underlying error from the GoBot is the same. The database is large so I won't upload it to GitHub, but I can provide it upon request. Here are the logs, and here is the Bugsnag issue.
I'm marking this as blocked by #226. If this is still an issue we are seeing on the latest version of go-ssb, then I think we should have @boreq investigate at that point.
There is some new code that fixes similar database errors in go-ssb:
https://github.com/cryptoscope/ssb/blob/master/cmd/go-sbot/main.go#L264
This code also exists in the older version of go-ssb but we don't seem to be using it.
Edit: actually we seem to call similar code from Swift.
I just experienced this again on an old device I installed Planetary on years ago. Given that several people have seen this when launching Planetary after not using it for several months, I'm starting to think this will be a problem for every Planetary user who fires up an identity they haven't touched for some time. My guess is that our current code is not interacting with an older database format correctly, as there is a lot of code scattered around for dealing with migrations.
This is probably something we want to prioritize before doing any marketing pushes that get old users to reopen Planetary. Even if we just give them the option to delete their database and resync from the network that would be better than what we have now, which is an infinite loop of pressing "Start Over" or "Try Again". CC @rabble @setch-l
Also, I tried calling a function I found called fsckAndRepair(), which seemed really promising, but it failed without a useful error message.
No update on this but we have prioritized #340 and #622 to make it easier to recover from these errors.
I am closing this as scuttlego uses badger and not margaret.