Draft Proposal: Write down specifications which will be used for acceptance tests

Open • siavashs opened this issue 1 month ago • 1 comment

Alertmanager lacks any form of "official specification". Its behaviours fall into the following categories:

  • documented in the docs
  • tested in unit tests
  • tested in acceptance tests
  • not tested

In some cases it is hard to understand the motivation behind a specific piece of logic, since it is not well documented and the commit messages lack context. Such behaviours fall into a bucket where we cannot tell whether they are incidental or intentional, and therefore whether they can be dropped or must be supported.

One example is the aggregation group timer, which resets to zero when an old alert arrives: https://github.com/prometheus/alertmanager/blob/80d0265e16874ab0faf7c4de83cd8e33ac03f23e/dispatch/dispatch.go#L499-L501 (this logic was introduced before clustering). Should such logic be kept or removed?

Proposal

Start writing down specifications which can then be used to generate acceptance tests. Each component of Alertmanager will have a specification which it should satisfy. The application and the cluster will also have specifications. The specifications can evolve over time to support more features, or to deprecate and drop unused or incidental ones.

There are different solutions to achieve this, but one good example is https://cucumber.io/, which also supports Go via https://github.com/cucumber/godog

siavashs, Nov 14 '25 15:11

Interesting, cucumber seems pretty neat.

I also wonder if we should actually try to write a TLA+ spec for the notification algorithm... But that's going to be a pretty big task.

Spaceman1701, Nov 26 '25 16:11