
Epic: implement storage format backward compatibility test

LizardWizzard opened this issue · 9 comments

The test should fail if one changes the storage format in a backward-incompatible way. One option is to generate some data in the current format, put it somewhere (e.g., S3), and load it during a test pipeline run.

We need this because otherwise such changes might slip through unnoticed and, at a minimum, break staging.

LizardWizzard · May 02 '22

Initial work on this from @MMeent: https://github.com/neondatabase/neon/tree/feature/test-format-versions

IIUC, the code is white-box style: the test code has very intimate knowledge of the exact WAL record kinds that are handled in the pageserver. The idea is to run the tests with one version, store the generated files somewhere, and then run a different set of tests with a new version of the binaries to check that it can read the files generated by the old version.

hlinnaka · Jul 01 '22

Thinking about this some more, let me try to first define the goal more clearly:

Goal

Have a test, run automatically in CI, that fails if the new binary cannot read all layer files generated by all old binaries that have been deployed to production at any time.

The thing we want to avoid is that we deploy a new binary to production and it fails to read old data correctly. We want to catch that earlier, before the new binary reaches production.

How do we get there?

We need to:

  1. Have a set of old layer files to test with
  2. Run some tests against those files, to verify that they are still readable

The 'test-format-versions' branch achieves 1. with Rust code that calls the pageserver functions to generate layer files. It achieves 2. with more Rust code that calls more pageserver functions to read the files.

I think that approach has the benefit that the set of layer files used is very small, so I expect the test to run very quickly, and the test files can easily be just included in the git repository. However, I'm worried that we will easily miss updating the tests if new code is added to handle different kinds of WAL records. I'm also worried that this will be laborious to expand to cover things like branching, L0 and L1 layer files, a mix of different versions, etc.

As an alternative, or in addition to that, I think it would be good to do more black-box style testing. Start with an SQL script that generates a set of layer files. Run that with one set of binaries, and store the resulting files somewhere. Then launch a new binary against those files, and run a bunch of SQL scripts to verify that all the data is readable. We can use code coverage to verify that all the interesting codepaths are covered.
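
To make the flow concrete, here is a minimal sketch of such a black-box test, assuming old and new binaries under hypothetical paths, a compute listening on a hypothetical port, and generate_data.sql / check_data.sql scripts that we would write ourselves; none of these names come from the actual test suite.

```python
# Black-box compatibility flow, sketched. OLD_BIN/NEW_BIN, the port, the
# data directory, and the SQL scripts are all hypothetical stand-ins.
import shutil
import subprocess
from pathlib import Path

OLD_BIN = Path("/opt/neon-old/bin")    # binaries from the last release (assumed)
NEW_BIN = Path("/opt/neon-new/bin")    # binaries built from this PR (assumed)
DATADIR = Path("/tmp/compat/datadir")  # pageserver data directory (assumed)
SNAPSHOT = Path("/tmp/compat/snapshot")

def run_sql(bindir: Path, script: str) -> bytes:
    """Run an SQL script with psql against the compute (hypothetical port)."""
    return subprocess.run(
        [str(bindir / "psql"), "-p", "55432", "-f", script],
        check=True, capture_output=True,
    ).stdout

# Phase 1: with the OLD binaries running, generate layer files, record the
# expected query results, and snapshot the data directory.
run_sql(OLD_BIN, "generate_data.sql")
expected = run_sql(OLD_BIN, "check_data.sql")
shutil.copytree(DATADIR, SNAPSHOT)

# Phase 2: restart everything with the NEW binaries over the snapshot and
# verify that all the stored data is still readable and correct.
shutil.rmtree(DATADIR)
shutil.copytree(SNAPSHOT, DATADIR)
actual = run_sql(NEW_BIN, "check_data.sql")
assert actual == expected, "new binaries misread layer files written by old ones"
```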

@bayandin, any thoughts on this? What would be the best approach for this?

hlinnaka · Jul 01 '22

My approach so far has been a unit-testing approach: I assume that if we can extract all records from a file correctly, then we can use that file.

What you seem to propose is an integration-testing approach, which would also cover the indexing of local files and the compatibility of storing a whole Repository.

That doesn't seem impossible, but it uses a much broader interpretation of "storage format" than I'd previously used.

MMeent · Jul 01 '22

@bayandin, any thoughts on this? What would be the best approach for this?

I'm probably still not familiar enough with the code, so I can't really suggest the best way to unit-test it right now.

We need to:

  1. Have a set of old layer files to test with
  2. Run some tests against those files, to verify that they are still readable

This makes sense.

The 'test-format-versions' branch achieves 1. and I think that approach has the benefit that the set of layer files used is very small, so I expect the test to run very quickly, and the test files can easily be just included in the git repository.

I just want to point out that the current example of a layer in that branch is ~2 MB, so I'd guess we won't want to have a lot of such files in the repo (unless we use something like git-lfs).

I'm worried that we will easily miss updating the tests if new code is added to handle different kinds of WAL records.

Is there an enum of WAL record types? If so, we can ensure (by testing it) that we have an example record for each type.
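
For illustration, a sketch of such a coverage test, assuming a hypothetical record_kinds_in() parser and a KNOWN_KINDS set kept in sync with the Rust-side enum (the variant names below are just examples, not a complete list):

```python
# Coverage check, sketched: fail if a WAL record kind handled by the
# pageserver has no example in the stored test files. KNOWN_KINDS and
# record_kinds_in() are hypothetical.
from pathlib import Path

KNOWN_KINDS = {          # assumption: mirrors the Rust enum, one entry per kind
    "XLOG_HEAP_INSERT",
    "XLOG_HEAP_UPDATE",
    "XLOG_HEAP_DELETE",
    # ...
}

def record_kinds_in(path: Path) -> set[str]:
    """Hypothetical helper: parse a stored file, return the record kinds seen."""
    raise NotImplementedError

def test_every_record_kind_has_an_example():
    seen: set[str] = set()
    for f in Path("test_data/compat").glob("*"):
        seen |= record_kinds_in(f)
    missing = KNOWN_KINDS - seen
    assert not missing, f"no example record for: {sorted(missing)}"
```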

As an alternative, or in addition to that, I think it would be good to do more black-box style testing.

I love this idea!

Start with an SQL script that generates a set of layer files. Run that with one set of binaries, and store the resulting files somewhere.

I think we can run this script after each release (or maybe only after releases that change WAL handling) and collect more examples (stored in S3). Then in PRs / before a release, we can check that all the previously collected data is readable and all codepaths are covered.

bayandin · Jul 04 '22

I think we can run this script after each release (or maybe only after releases that change WAL handling) and collect more examples (stored in S3).

+1. The important versions to test against are the ones currently deployed to staging and prod. So during a release, we can create a database, fill it with pgbench or pg_regress data, and export a logical dump (sorted pg_dump output) together with the raw data files. Then on each PR, we can check that the current storage format can read that data and that Postgres running on top of it produces the same logical dump.
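
A rough sketch of that release-time capture step, assuming pgbench and pg_dump are on PATH and using a hypothetical connection string; the S3 upload is left as a comment since the bucket layout would be up to us:

```python
# Release-time capture, sketched: fill a database, take a sorted logical
# dump, and record its hash. Connection string and paths are hypothetical.
import hashlib
import subprocess

CONN = "postgres://user@localhost:55432/compat_db"   # assumed compute endpoint

# Fill the database with a known, reproducible workload.
subprocess.run(["pgbench", "-i", "-s", "10", CONN], check=True)

# Take a logical dump and sort its lines so that later comparisons don't
# depend on row order, which pg_dump does not guarantee.
dump = subprocess.run(["pg_dump", CONN], check=True, capture_output=True).stdout
sorted_dump = b"\n".join(sorted(dump.splitlines()))

with open("dump.sorted", "wb") as f:
    f.write(sorted_dump)
print("sorted dump md5:", hashlib.md5(sorted_dump).hexdigest())

# Next (in CI): upload dump.sorted and the raw pageserver data directory to
# S3, keyed by the release version, e.g. via `aws s3 cp`.
```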

kelvich · Jul 12 '22

We've chatted with @MMeent about a plan here (a rough sketch in code follows the list):

  • manually upload some pageserver data directory with one tenant (plus a pg_dump of it) to S3
  • with the python test fixtures, create a pageserver; swap its data directory with the one from S3; and start Postgres on top of it
  • run pg_dump | sort | md5 on the new compute and compare it to the old pg_dump's hash
  • and/or run pg_basebackup against the new pageserver
  • try running amcheck on all amcheck-able objects
  • upload such a pageserver data directory on each deploy
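
A sketch of how that PR-side check could look, assuming the snapshot (data directory plus the stored sorted dump) was already downloaded from S3; start_pageserver/start_compute stand in for the real Python fixtures, and the connection string is hypothetical:

```python
# PR-side verification, sketched: start the new pageserver over the old data
# directory, compare a fresh sorted pg_dump with the stored one, then run
# amcheck over all btree indexes. Fixture calls are hypothetical stand-ins.
import hashlib
import subprocess
from pathlib import Path

SNAPSHOT = Path("snapshot")                          # fetched from S3 earlier
CONN = "postgres://user@localhost:55432/compat_db"   # assumed compute endpoint

def sorted_dump_md5(conn: str) -> str:
    dump = subprocess.run(["pg_dump", conn], check=True, capture_output=True).stdout
    return hashlib.md5(b"\n".join(sorted(dump.splitlines()))).hexdigest()

# start_pageserver(SNAPSHOT / "pageserver_datadir")  # hypothetical fixtures
# start_compute()

# 1. The logical contents must match what was dumped when the snapshot was made.
stored_md5 = hashlib.md5((SNAPSHOT / "dump.sorted").read_bytes()).hexdigest()
assert sorted_dump_md5(CONN) == stored_md5, "data changed or is unreadable"

# 2. Run amcheck on every btree index reachable through the new pageserver.
subprocess.run(
    ["psql", "-c", "CREATE EXTENSION IF NOT EXISTS amcheck",
     "-c", "SELECT bt_index_check(c.oid) FROM pg_class c"
           " JOIN pg_am am ON am.oid = c.relam"
           " WHERE c.relkind = 'i' AND am.amname = 'btree'",
     CONN],
    check=True,
)
```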

kelvich · Jul 12 '22

pg_dump of it

Most probably I'm missing something, but did you mean the original dir as created by the pageserver? I don't get why we need pg_dump here. Can pg_dump output match even if the on-disk format has changed?

LizardWizzard · Jul 13 '22

I was talking about pg_dump to make sure that we've read all the data properly. Just starting the pageserver over the old data directory is not enough -- you won't read all the data that way. So I can think of the following:

  • We keep the pageserver data directory along with the output of pg_dumpall on S3
  • To check that we can still read this data directory, we start the new pageserver on top of it, start a compute, run pg_dumpall, and compare the output with the stored dump. pg_dumpall does not guarantee the order of records, so we may need to sort the output files before comparing them (see the sketch below).
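
Spelling out that sorting step, a minimal sketch (dumps_match is a hypothetical helper):

```python
# Order-insensitive dump comparison: pg_dumpall does not guarantee row
# order, so sort both outputs line by line before comparing.
def dumps_match(old_dump: bytes, new_dump: bytes) -> bool:
    return sorted(old_dump.splitlines()) == sorted(new_dump.splitlines())
```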

kelvich · Jul 13 '22

Gotcha, I'd read it as us uploading only the pg_dump, not the original datadir.

LizardWizzard · Jul 13 '22

@bayandin, is this on your plate?

shanyp · Jul 19 '23

From @bojanserafimov: https://github.com/neondatabase/neon/issues/1896

Run the pageserver with a small checkpoint distance, then generate and store some tenant data inside the GitHub repo (< 100KB zipped). Then write a test that loads the data and answers some queries. The goal is to notice backward-incompatible changes.

If there's a need for larger datasets, we can think about external storage. But for now, 100KB is not that much: layered_repository.rs is about 100KB, and so is Cargo.lock.
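
A sketch of how such an in-repo test could look, assuming the tenant data (generated with a small checkpoint distance so the files stay tiny) is committed as a zip archive; the archive path, fixture call, and query are all hypothetical:

```python
# In-repo variant, sketched: unpack committed tenant data and query it with
# the current binaries. All names below are hypothetical.
import tempfile
import zipfile
from pathlib import Path

ARCHIVE = Path("test_data/compat_tenant.zip")   # < 100KB, committed to git

def test_old_tenant_data_still_readable():
    with tempfile.TemporaryDirectory() as tmp:
        zipfile.ZipFile(ARCHIVE).extractall(tmp)
        # start_pageserver(Path(tmp))            # hypothetical fixture call
        # assert query("SELECT count(*) FROM t") == EXPECTED_ROWS
```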

Closing https://github.com/neondatabase/neon/issues/1896 as a duplicate.

shanyp · Jul 19 '23

@bayandin, is this on your plate?

Hmm, I thought it was done.

@stepashka, is there anything else we need to implement?

bayandin · Jul 19 '23

is there anything else we need to implement?

Ah, sorry, I hadn't noticed that it was reopened back in October and moved to Done in November (the GitHub app doesn't show some events and dates).

I consider this issue done. We have backward- and forward-compatibility tests. Please create a separate issue/PR if we need to add anything else in this area.

bayandin · Jul 19 '23