
Epic: upgrade to postgres 15

Open stepashka opened this issue 2 years ago • 4 comments

DoD: Neon supports both pg15 and pg14 compute nodes. Users may migrate their data manually using 'pg_dump/pg_restore' if they want to.
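
As a rough illustration of the manual migration path, the sketch below builds the two commands a user would run. The connection strings and the dump file name are placeholders, not real endpoints, and the helper names are made up for this example. Only standard `pg_dump` and `pg_restore` flags are used.

```python
from typing import List


def dump_command(source_url: str, dump_file: str) -> List[str]:
    """Build a pg_dump invocation for the existing pg14 project.

    -Fc writes a custom-format archive, which pg_restore can read.
    """
    return ["pg_dump", "-Fc", "-d", source_url, "-f", dump_file]


def restore_command(target_url: str, dump_file: str) -> List[str]:
    """Build a pg_restore invocation for the new pg15 project.

    --no-owner skips restoring the original object ownership, which is
    usually what you want when the roles differ between projects.
    """
    return ["pg_restore", "--no-owner", "-d", target_url, dump_file]


if __name__ == "__main__":
    # Placeholder connection strings; substitute your own.
    print(" ".join(dump_command("postgres://user@old-host/db", "db.dump")))
    print(" ".join(restore_command("postgres://user@new-host/db", "db.dump")))
```

Each list can be passed to `subprocess.run(cmd, check=True)`. When moving between major versions, the PostgreSQL documentation recommends running the `pg_dump` that ships with the newer release.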

For simplicity, we assume that each project (timeline) always uses only one postgres version. Pageservers and safekeepers should support both versions simultaneously. The pageserver should choose the wal-redo binary for each timeline depending on its version. The compute binary is chosen depending on the project's pg version.

@stepashka , @hlinnaka , @kelvich , How are we going to offer this feature to users? Will it be an option at project start? Or will all new projects use pg15, with pg14 support kept only for existing projects?
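
The per-timeline selection could look roughly like the sketch below. It is not the pageserver's actual code: the `Timeline` type, the `v14`/`v15` directory layout, and the function names are all assumptions made for illustration. The only idea taken from the description above is that the version is fixed per timeline and decides which binaries are used.

```python
from dataclasses import dataclass
from pathlib import Path

# Assumed set of supported major versions (pg14 and pg15, per this epic).
SUPPORTED_PG_VERSIONS = (14, 15)


@dataclass(frozen=True)
class Timeline:
    timeline_id: str
    pg_version: int  # fixed for the lifetime of the timeline


def pg_bin_dir(install_root: Path, pg_version: int) -> Path:
    """Return the bin directory for one postgres major version.

    Assumes a hypothetical layout of <install_root>/v<major>/bin.
    """
    if pg_version not in SUPPORTED_PG_VERSIONS:
        raise ValueError(f"unsupported postgres version: {pg_version}")
    return install_root / f"v{pg_version}" / "bin"


def wal_redo_binary(install_root: Path, timeline: Timeline) -> Path:
    """Pick the postgres binary used for WAL redo on this timeline."""
    return pg_bin_dir(install_root, timeline.pg_version) / "postgres"
```

Rejecting unknown versions up front keeps a pg15 timeline from ever being replayed with pg14 binaries, or the other way round.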

Storage side issues:

  • rebase neon postgres changes to pg15 (#2240)
  • move neon extension to the neon repo
  • rename main branch in vendor/postgres to neon_14_stable
  • build neon with both neon_14_stable and neon_15_stable vendor/postgres branches.
  • support working with both versions in pageserver

Console/deploy side issues:

  • update deploy scripts to support both postgres versions
  • allow users to choose which postgres version to use. Add UI support

stepashka avatar Jun 22 '22 09:06 stepashka

We need to inform all clients in advance. Their apps may be incompatible with postgres 15.

The work that @bojanserafimov and I are doing in the tenant migration epic partially covers the script part.

lubennikovaav avatar Jul 05 '22 14:07 lubennikovaav

Current state:

  • Neon rebased to postgres 15 passes all the tests: https://github.com/neondatabase/neon/pull/2240
  • With Heikki's refactoring PRs we can build a pageserver that supports both postgres versions simultaneously: https://github.com/neondatabase/neon/pull/2250 (ready for merge) and https://github.com/neondatabase/neon/pull/2161 (needs some polishing)

The remaining parts:

  • actually choose the postgres version in the pageserver to handle WAL and spin up the correct wal-redo binary (WIP)
  • update CI scripts to build and deploy all needed binaries (WIP)
  • update UI and console to give user the choice of the version (?)
  • proxy/pooler changes (?)

All these things depend on our product decision.

Options are:

  • Mandatory upgrade of all users to pg15. With this solution we don't have to maintain 2 versions simultaneously after migration. Requires storage migration. All WAL history will be lost after the upgrade. I don't think that's a good choice. PG15 is not even officially released yet, which means that some libraries may not support it, and so on. I bet that at least some users want to use the stable and familiar v14.

  • Spin up all new clusters on pg15. Maintain pg14 only for already existing projects. Write good user documentation about upgrading with pg_dump/pg_restore, so that users can do it themselves. This won't require any work on the UI side, since we already show the postgres version on the dashboard page. Same concern about cutting-edge technology. IMHO, pg15 is a bit too new.

  • Allow users to choose postgres version on project creation. That would require one more field in the 'PROJECT CREATION' dialog. The pageserver works with both postgres versions, but each project uses just one pg version, defined at project creation. We are explicit about postgres version.

  • Support seamless upgrade and work with old WAL from the new pg version. This is cool, since we don't lose any history at upgrade, but it is technically more complicated than the other solutions. We still need to know which version of compute to spin up. Same questions about UI support as in the other solutions.

lubennikovaav avatar Aug 18 '22 09:08 lubennikovaav

The current release branch of PG15 may still see feature reverts and/or catalog updates, so we should release PG15 to customers only after RC0 is out (otherwise we might need to dump+restore to upgrade from 15beta to 15.0, which is less than optimal).

MMeent avatar Aug 18 '22 09:08 MMeent

I don't think we can release PG15 beta 4 to customers, as there have been catalog updates since that was stamped. RC1 is probably going to be the first upstream tag we can supply to customers without a significant chance of losing customer data.

MMeent avatar Sep 07 '22 09:09 MMeent