Create a plan for persistent storage upgrades
Upgrading PostgreSQL releases over an existing installation has been painful for years due to:
- machine-specific representations stored on disk (big-/little-endian, floating point)
- the inability of new PostgreSQL releases to read the older on-disk formats
- extension upgrades
pg_upgrade alleviates some of these pain points, which matters because rewriting the entire database (dump and restore) is often untenable for ever-larger data sets.
To ensure that Project:M36 doesn't hit this same hurdle, we should plan on making upgrades to binary releases as smooth as possible.
Considerations:
- minimize downtime
- minimize steps (automate away the upgrade process)
- don't rewrite the database by default
- do allow the user to choose to rewrite the database in case the user wishes to continue to use the older release (pg_upgrade's hard link or copy dichotomy)
- live, in-place upgrades (?)
Currently, I am thinking that we should choose a platform-agnostic storage format such as Protocol Buffers, with plenty of metadata in the header of each file. More research into the options is required.
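To make the "metadata in the header of each file" idea concrete, here is a minimal sketch of a self-describing file with a fixed byte order. Everything in it is an assumption for illustration: the `M36F` magic bytes, the field layout, and the function names are hypothetical and not part of Project:M36. It is written in Python only to stay language-neutral; the real implementation would be in Haskell and might well use Protocol Buffers for the metadata.

```python
import struct
from typing import Tuple

# Hypothetical layout (an assumption, not Project:M36's actual format):
#   4-byte magic | u16 format version | u32 metadata length | metadata | payload
MAGIC = b"M36F"
# ">" forces big-endian with no native padding, so the bytes are identical
# on every platform regardless of the host's endianness.
HEADER = struct.Struct(">4sHI")


def encode_file(format_version: int, metadata: bytes, payload: bytes) -> bytes:
    """Prefix the payload with a fixed header and a metadata blob."""
    return HEADER.pack(MAGIC, format_version, len(metadata)) + metadata + payload


def decode_file(blob: bytes) -> Tuple[int, bytes, bytes]:
    """Return (format_version, metadata, payload) or raise ValueError."""
    if len(blob) < HEADER.size:
        raise ValueError("file too short to contain a header")
    magic, format_version, meta_len = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("not a recognized data file")
    meta_end = HEADER.size + meta_len
    if meta_end > len(blob):
        raise ValueError("truncated metadata")
    return format_version, blob[HEADER.size:meta_end], blob[meta_end:]
```

Because the format version is read before anything else, a newer release can decide how to interpret the rest of the file without guessing.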
The upgrade process is simplified because Project:M36 only operates in write-once-read-many (WORM) fashion, so a large relation could potentially be backed by files from different release versions. No rewrite or munging of old data should be required.
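A rough sketch of how one relation could be read from files written by different releases: each file carries a format version, and the reader picks a decoder per file. The two formats, the decoder names, and the `(format_version, raw_bytes)` file model are all hypothetical, chosen only to illustrate the dispatch; Python is used here purely as a language-neutral notation.

```python
import json
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Row = List[str]


def decode_v1(raw: bytes) -> Iterator[Row]:
    # Hypothetical v1 format: one row per line, fields separated by "|".
    for line in raw.decode("utf-8").splitlines():
        yield line.split("|")


def decode_v2(raw: bytes) -> Iterator[Row]:
    # Hypothetical v2 format: one JSON array per line.
    for line in raw.decode("utf-8").splitlines():
        yield json.loads(line)


# A newer release keeps the older decoders around instead of rewriting old files.
DECODERS: Dict[int, Callable[[bytes], Iterator[Row]]] = {1: decode_v1, 2: decode_v2}


def read_relation(files: Iterable[Tuple[int, bytes]]) -> Iterator[Row]:
    """Yield rows from every backing file, whatever release wrote it."""
    for format_version, raw in files:
        try:
            decoder = DECODERS[format_version]
        except KeyError:
            raise ValueError(f"no decoder for format version {format_version}") from None
        yield from decoder(raw)
```

For example, `list(read_relation([(1, b"1|alice"), (2, b'["2", "bob"]')]))` combines rows from both files without touching the older one. The cost of this design is that every release must keep read support for all earlier formats.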
Regarding live upgrades, it is difficult to imagine how this would be possible in Haskell without language-level hot code upgrade features such as those available in Erlang. Instead, since Project:M36 uses WORM files, it should be possible to "pause" the older version, which can still operate in read-only mode, while the new version continues to write files. Still, some proxy would be required to flip the switch on the ports for incoming clients.
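The handoff could be modelled roughly as follows. This is a toy, in-memory sketch under heavy assumptions: `Instance`, `Proxy`, and `cut_over` are hypothetical names, a shared list stands in for the WORM files on disk, and real network and process concerns are ignored. It only illustrates the ordering: make the old version read-only first, then route clients to the new one.

```python
from typing import List, Tuple


class Instance:
    """One running database release sharing a store of write-once files."""

    def __init__(self, release: str, store: List[Tuple[str, str]]) -> None:
        self.release = release
        self.store = store  # stand-in for the WORM files on disk
        self.read_only = False

    def read_all(self) -> List[str]:
        return [data for _, data in self.store]

    def write(self, data: str) -> None:
        if self.read_only:
            raise PermissionError(f"release {self.release} is paused (read-only)")
        # Files are only ever appended, never modified in place.
        self.store.append((self.release, data))


class Proxy:
    """Routes client requests to whichever instance is currently active."""

    def __init__(self, backend: Instance) -> None:
        self.backend = backend

    def cut_over(self, new_backend: Instance) -> Instance:
        old = self.backend
        old.read_only = True        # pause writes on the old release first
        self.backend = new_backend  # then flip incoming clients to the new one
        return old

    def write(self, data: str) -> None:
        self.backend.write(data)

    def read_all(self) -> List[str]:
        return self.backend.read_all()
```

After `cut_over`, the old instance can keep serving reads until its clients drain, while all new files are written by the new release.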
Simon Marlow has done work on hot swapping for Haxl. It might be worth looking into it.
@3noch, I found a reference to "hot-code swapping" on slide 78 of the Haxl presentation, but the open-source implementation doesn't include anything related to it. The slide claims that Marlow had to modify GHC's built-in linker, which seems very invasive.
Do you have any additional links or information?
Sadly no, but I wonder if Marlow plans on adding the changes to GHC itself. Perhaps you could ping ghc-devs about it.