configurable EC/fountain codes in separate segment track for gracefully surviving buggy hardware
Regardless of how reliable the core of the database is, when targeting mass deployments on phones and browsers, the database will be corrupted by bitrot. It would be cool to be able to gracefully recover and heal when this happens, for those who want to opt into bitrot protection.
So the purpose of Erasure Coding / fountain codes is to recover data when a segment has been lost. If a segment has been corrupted, this can be detected (but not corrected) using CRCs.
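To make the detection/recovery split concrete, here is a minimal sketch assuming the simplest possible erasure code: one XOR parity segment per stripe of equally sized data segments, plus a CRC32 stored for each segment. A segment whose CRC no longer matches is treated exactly like a lost one, so CRC detection combined with a single parity segment can heal one bad segment per stripe. The helper names (`encode_stripe`, `recover`) are hypothetical and not an existing API; a real implementation would swap in Reed-Solomon or a fountain code to survive more than one loss.

```python
from __future__ import annotations

import zlib
from functools import reduce

# Hypothetical helpers for illustration only; single XOR parity stands in
# for a stronger erasure code such as Reed-Solomon or a fountain code.

def xor_bytes(a: bytes, b: bytes) -> bytes:
    # Bytewise XOR of two equally sized segments.
    return bytes(x ^ y for x, y in zip(a, b))

def encode_stripe(segments: list[bytes]) -> tuple[bytes, list[int]]:
    """Compute one XOR parity segment plus a CRC32 per data segment."""
    if len({len(s) for s in segments}) != 1:
        raise ValueError("segments must be padded to the same length")
    parity = reduce(xor_bytes, segments)
    crcs = [zlib.crc32(s) for s in segments]
    return parity, crcs

def recover(segments: list[bytes | None], crcs: list[int], parity: bytes) -> list[bytes]:
    """Treat missing or CRC-failing segments as erasures and rebuild one of them."""
    bad = [i for i, s in enumerate(segments)
           if s is None or zlib.crc32(s) != crcs[i]]
    if not bad:
        return list(segments)
    if len(bad) > 1:
        raise ValueError(f"{len(bad)} segments lost; one parity segment repairs only one")
    lost = bad[0]
    survivors = [s for i, s in enumerate(segments) if i != lost]
    # XOR of the parity with every surviving segment yields the lost segment.
    rebuilt = reduce(xor_bytes, survivors, parity)
    if zlib.crc32(rebuilt) != crcs[lost]:
        raise ValueError("parity segment or stored CRC is itself corrupted")
    repaired = list(segments)
    repaired[lost] = rebuilt
    return repaired
```

For example, with `data = [b"segment-A", b"segment-B", b"segment-C"]`, flipping one byte in the middle segment makes its CRC fail, and `recover` rebuilds it from the parity and the two intact segments. Losing two segments in the same stripe raises instead, which is where a code with more parity would be needed.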
If you write data to a piece of flash in a phone and the HW has corrupted a few segments, then it's up to the HW/firmware/OS/FS to repair it. Even if one of them does, the HW is likely on its way out, so there's a good chance this will be a losing proposition.
If you do EC on all the data in software (mobile chips don't generally have specialized instructions for EC), your battery is not going to be happy, since it's computationally intensive. You will also incur non-trivial latency from reassembly.
Finally, it doesn't address the most common cause of losing data on phones (citation needed): losing/damaging the phone. If you lose the device then the data is gone.
Perhaps a better approach is to focus on remote replication.
@ehiggs I totally agree with all of the challenges you mention. This issue came about after a conversation with a Firefox storage engineer, who mentioned that corruption was a challenging issue for large-scale deployments of their various storage engines. It made me wonder whether the trade-offs involved would be worthwhile for some (opt-in) class of use cases. I have a few local things where I'd like to know quickly if my disk is rotting, but they don't have a replication story yet. Feedback and alert-friendliness are part of this story too.
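For the "know quickly if my disk is rotting" case, detection alone already helps, even without any parity. A minimal sketch of a scrub pass, assuming each segment can be read back as a hypothetical `(segment_id, payload, stored_crc32)` tuple and that `alert` is whatever notification hook the application provides (both are assumptions, not an existing API):

```python
from __future__ import annotations

import zlib
from typing import Callable, Iterable

# Hypothetical scrubber: the segment tuple layout and the alert hook are
# illustrative assumptions.

def scrub(segments: Iterable[tuple[int, bytes, int]],
          alert: Callable[[list[int]], None]) -> list[int]:
    """Re-verify every segment's CRC32 and alert once with all mismatching ids."""
    rotten = [seg_id for seg_id, payload, stored in segments
              if zlib.crc32(payload) != stored]
    if rotten:
        alert(rotten)
    return rotten
```

Running this periodically in the background turns silent bitrot into an early, actionable signal, whether or not a repair mechanism exists.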
I also believe replication is a higher priority than this, although this might be a bit easier to write, depending on how much locality avoidance is necessary. There may even be a relatively clean abstraction for using replication locally as a mechanism for achieving this (but paying a space price).
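A rough way to compare the space price of local replication against an erasure code, assuming an MDS code such as Reed-Solomon (fountain codes like LT or Raptor need slightly more than k symbols to decode, so these numbers are a best case for them). The `Protection` tuple and function names are illustrative only; `copies`, `k`, and `m` are the kind of knobs an opt-in configuration might expose.

```python
from fractions import Fraction
from typing import NamedTuple

class Protection(NamedTuple):
    space_overhead: Fraction   # extra bytes stored per byte of user data
    tolerated_losses: int      # segments that can be lost without losing data
    repair_reads: int          # segments read to rebuild one lost segment

def local_replication(copies: int) -> Protection:
    # Every segment is stored `copies` times; any single surviving copy suffices.
    return Protection(Fraction(copies - 1), copies - 1, 1)

def mds_erasure_code(k: int, m: int) -> Protection:
    # A (k, m) MDS code such as Reed-Solomon adds m parity segments per k data
    # segments; any k of the k + m segments reconstruct the stripe.
    return Protection(Fraction(m, k), m, k)
```

For example, three local copies cost 200% extra space to survive two lost segments, while a (10, 4) code costs 40% extra and survives any four. The catch is that rebuilding a lost segment reads ten segments instead of one, which is where the CPU and reassembly-latency concerns come from.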