rabbitmq-server

Add opt-in initial check run

Open • SimonUnge opened this issue 4 months ago • 1 comment

Proposed Changes

This PR introduces a new optional feature to help detect potential data loss scenarios during RabbitMQ node startup. The feature adds a verify_initial_run configuration option that defaults to false. When enabled, nodes will create a marker file called node_initialized.marker on their first startup and use this to verify data consistency on subsequent restarts.

The implementation adds a new boot step, initial_run_check, which runs after the recovery step but before the existing empty_db_check step.
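RabbitMQ orders its boot steps from requires/enables declarations rather than from a fixed list. The sketch below models that loosely to show why the new step lands between recovery and empty_db_check. The dependency edges are an assumption chosen to reproduce the ordering described above, not the PR's actual step declarations.

```python
from graphlib import TopologicalSorter

# Hypothetical, simplified boot-step declarations. "requires" lists steps
# that must run before this one; "enables" lists steps that must run after it.
# The edges are assumptions that reproduce the ordering described in the PR.
BOOT_STEPS = {
    "recovery": {"requires": [], "enables": []},
    "initial_run_check": {"requires": ["recovery"], "enables": ["empty_db_check"]},
    "empty_db_check": {"requires": ["recovery"], "enables": []},
}


def boot_order(steps):
    """Return an execution order that honours every requires/enables edge."""
    # graphlib expects a mapping of node -> set of predecessors.
    graph = {name: set() for name in steps}
    for name, deps in steps.items():
        graph[name].update(deps["requires"])  # required steps run first
        for later in deps["enables"]:
            graph.setdefault(later, set()).add(name)  # this step runs first
    return list(TopologicalSorter(graph).static_order())


order = boot_order(BOOT_STEPS)
print(order)  # ['recovery', 'initial_run_check', 'empty_db_check']
```

Declaring the step with both a "requires" and an "enables" edge pins it between its neighbours without touching the declarations of the existing steps.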

If the marker file exists but the database tables are empty, this indicates a potential data loss scenario, such as corruption, an accidental database reset, or a split-brain recovery issue. In these cases the node fails to start with the specific error cluster_already_initialized_but_tables_empty, giving operators a clear indication that manual intervention may be required, rather than silently starting with no data.
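The decision logic described above can be modelled as a small state check. This is a minimal sketch, assuming the marker lives in a caller-supplied data directory and that "tables are empty" is passed in as a boolean; the function, exception class, and return strings are hypothetical names for illustration, not the PR's Erlang API. The PR does not say whether the marker is written before or after default data is inserted, so the sketch simply writes it the first time the check runs with the feature enabled.

```python
import os
import tempfile

MARKER_NAME = "node_initialized.marker"


class InitialRunCheckError(Exception):
    """Hypothetical stand-in for the PR's boot failure."""


def initial_run_check(data_dir, verify_initial_run, tables_empty):
    """Hypothetical model of the opt-in check; returns a status string."""
    if not verify_initial_run:
        return "skipped"  # option off (the default): behave as before
    marker = os.path.join(data_dir, MARKER_NAME)
    if not os.path.exists(marker):
        # First startup with the option on: record that the node initialised.
        with open(marker, "w"):
            pass
        return "marker_created"
    if tables_empty:
        # Marker present but no data: refuse to boot instead of starting empty.
        raise InitialRunCheckError("cluster_already_initialized_but_tables_empty")
    return "ok"


with tempfile.TemporaryDirectory() as d:
    print(initial_run_check(d, True, tables_empty=True))   # marker_created
    print(initial_run_check(d, True, tables_empty=False))  # ok
    try:
        initial_run_check(d, True, tables_empty=True)      # simulated data loss
    except InitialRunCheckError as e:
        print("refusing to start:", e)
```

The third call simulates a restart after the tables were wiped: because the marker survived, the check fails loudly instead of letting the empty-database path repopulate default data.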

Types of Changes

What types of changes does your code introduce to this project? Put an x in the boxes that apply.

  • [ ] Bug fix (non-breaking change which fixes issue #NNNN)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause an observable behavior change in existing systems)
  • [ ] Documentation improvements (corrections, new content, etc.)
  • [ ] Cosmetic change (whitespace, formatting, etc.)
  • [ ] Build system and/or CI

Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask on the mailing list. We're here to help! This is simply a reminder of what we are going to look for before merging your code.

  • [x] I have read the CONTRIBUTING.md document
  • [x] I have signed the CA (see https://cla.pivotal.io/sign/rabbitmq)
  • [x] I have added tests that prove my fix is effective or that my feature works
  • [x] All tests pass locally with my changes
  • [x] If relevant, I have added necessary documentation to https://github.com/rabbitmq/rabbitmq-website
  • [x] If relevant, I have added this change to the first version(s) in release-notes that I expect to introduce it

Further Comments

In the testing section, I fairly aggressively delete schema files to force rabbit_table:needs_default_data() to return true. Is there a more elegant way to do this?

SimonUnge • Jun 16 '25 18:06