Fix wrong backup check for SQL Server Always On availability groups
This is a recreation of https://github.com/Checkmk/checkmk/pull/518
Bug reports
Operating system name and version: SQL Always on High availability Groups on SQL Server 2019, Windows Server 2019
Detailed steps to reproduce the bug:
- Create a Backup of a High Available Database on Server1
- Move the Database to Server2
- You get a backup Error because no actual backup is found
Proposed changes
What is the expected behavior? The Agent should return the Backup status, even if it is not the primary Server.
What is the observed behavior? The Agent does not return the Backup if it is not the Primary Replica.
If it's not obvious from the above: In what way does your patch change the current behavior? The Primary Server check is removed.
Is this a new problem? What made you submit this PR (new firmware, new device, changed device behavior)? The move of the database.
Hi @marcohald,
first of all thanks for re-opening and the contribution.
When I have a look at the history of mssql.vbs, I can find this werk: https://checkmk.com/werk/4394:
In previous versions the backups of availability group cluster slave hosts (where no backup is executed) were
handled as last backup age.
Now we exclude those backups from the monitoring and only handle the backups of the primary replica.
So your change basically reverts this (very old) werk. I cannot really tell at the moment if we would introduce a regression then - do you have an opinion about it?
We discussed the PR internally and decided to reject it. Here is the reason for our decision: when the database is moved from server 1 to server 2, server 2 becomes the primary replica, and Checkmk shows an error because no backup is found there. The database was moved but the backup remains on server 1, so the status is valid (there is no backup yet on the new primary replica). Once the first backup has run on server 2, the issue should be gone.
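To make the two behaviours under discussion concrete, here is a minimal Python sketch. It is not the logic of mssql.vbs: the `Replica` class, the `backup_lines` function, and the output strings are hypothetical and only model the primary-replica filter described in the werk and the effect of removing it as this PR proposes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    """Hypothetical model of one availability group replica."""
    server: str
    is_primary: bool
    # Age of the last backup recorded on this server, None if there is none.
    last_backup_age_s: Optional[int]


def backup_lines(replicas: List[Replica], primary_only: bool = True) -> List[str]:
    """Return illustrative status lines (not the real agent output format).

    primary_only=True models the behaviour described in the werk:
    secondary replicas are skipped. primary_only=False models the
    change proposed in this PR: every replica is reported.
    """
    lines = []
    for replica in replicas:
        if primary_only and not replica.is_primary:
            continue
        if replica.last_backup_age_s is None:
            lines.append(f"{replica.server} no backup found")
        else:
            lines.append(f"{replica.server} backup age {replica.last_backup_age_s}s")
    return lines


# State after the move: the backup was taken on Server1,
# and Server2 is now the primary replica without a backup of its own.
after_move = [
    Replica("Server1", is_primary=False, last_backup_age_s=3600),
    Replica("Server2", is_primary=True, last_backup_age_s=None),
]

print(backup_lines(after_move))                      # primary replica only
print(backup_lines(after_move, primary_only=False))  # all replicas
```

With the filter in place, only Server2 is reported and it has no backup, which is the state the rejection describes as valid until the first backup runs on the new primary replica.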