freeipa-healthcheck
Intermittent replication errors when running ipa-healthcheck
Issue
Intermittent replication errors when running ipa-healthcheck. Running ipa-healthcheck every x minutes produces unreliable ReplicationCheck results. From what I've read on https://access.redhat.com/solutions/359683, getting a "replica is busy" error is considered "normal". This makes it difficult to monitor for actual replication errors.
Actual behaviour
{
    "source": "ipahealthcheck.ds.replication",
    "check": "ReplicationCheck",
    "result": "ERROR",
    "uuid": "94548c4b-ca49-4f8a-bd2e-1953fba9f767",
    "when": "20230103141508Z",
    "duration": "0.304435",
    "kw": {
        "key": "DSREPLLE0003",
        "items": [
            "Replication",
            "Agreement"
        ],
        "msg": "The replication agreement (ipa-2.test.io-to-ipa-3.test.io) under \"dc=test,dc=io\" is not in synchronization.\nStatus message: error (1) can't acquire busy replica (unable to acquire replica: the replica is currently being updated by another supplier.)"
    }
}
An error similar to the one above can occur intermittently on every FreeIPA server in a 3-node cluster. Most of the time there are no replication errors.
Expected behavior
It should not report an error. A warning would be more suitable.
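Until the severity changes upstream, a monitoring pipeline could post-process the results itself. The following is a minimal sketch, assuming the JSON output is a list of result objects shaped like the example above; the function name `downgrade_busy_replica` is hypothetical, and matching on the "can't acquire busy replica" substring from the status message is an assumption, not an official classification.

```python
import json
from typing import Any, Dict, List

# Substring copied from the status message in the example result. Treating it
# as "transient" is an assumption made by this sketch, not by ipa-healthcheck.
BUSY_REPLICA_MARKER = "can't acquire busy replica"


def downgrade_busy_replica(results: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return a copy of the results with busy-replica ERRORs relabelled WARNING."""
    adjusted = []
    for result in results:
        result = dict(result)  # shallow copy so the caller's data is not modified
        msg = result.get("kw", {}).get("msg", "")
        if (
            result.get("source") == "ipahealthcheck.ds.replication"
            and result.get("check") == "ReplicationCheck"
            and result.get("result") == "ERROR"
            and BUSY_REPLICA_MARKER in msg
        ):
            result["result"] = "WARNING"
        adjusted.append(result)
    return adjusted


if __name__ == "__main__":
    sample = json.loads("""[
      {"source": "ipahealthcheck.ds.replication", "check": "ReplicationCheck",
       "result": "ERROR",
       "kw": {"key": "DSREPLLE0003",
              "msg": "Status message: error (1) can't acquire busy replica"}}
    ]""")
    print(downgrade_busy_replica(sample)[0]["result"])  # WARNING
```

Any other DSREPLLE0003 message (a genuine loss of synchronization) is left as ERROR, so only the transient case is relabelled.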
Version/Release/Distribution
Rocky Linux 8.6
Source: ipa-healthcheck-0.7-14.module+el8.7.0+1075+05db0c1d.src.rpm (latest available)
FreeIPA: 4.9
This check is provided by 389 itself. I suppose we could consider reducing the severity to WARNING, but I'd leave that call to them. @mreynolds389 what do you think?
Well, it is a transient error: replication is just busy at that moment. If you run it again in a few seconds it will probably pass. On the 389 side we already set it to "medium" severity.
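Since re-running a few seconds later usually clears the error, one option is to retry only when the sole failure is a busy replica. This is a minimal sketch under stated assumptions: the runner invokes `ipa-healthcheck --source ipahealthcheck.ds.replication --check ReplicationCheck --output-type json` (verify these options with `ipa-healthcheck --help` on your version), the retry count and delay are arbitrary, and all function names are hypothetical.

```python
import json
import subprocess
import time
from typing import Any, Callable, Dict, List

Results = List[Dict[str, Any]]

# Assumed marker for the transient "busy replica" status message.
BUSY_REPLICA_MARKER = "can't acquire busy replica"


def run_replication_check() -> Results:
    """Run only the replication check (assumes ipa-healthcheck is on PATH)."""
    proc = subprocess.run(
        ["ipa-healthcheck", "--source", "ipahealthcheck.ds.replication",
         "--check", "ReplicationCheck", "--output-type", "json"],
        stdout=subprocess.PIPE, universal_newlines=True,
    )
    # Parse stdout regardless of the exit status, since a failing check
    # may return non-zero; treat empty output as "no results".
    return json.loads(proc.stdout or "[]")


def is_busy_only(results: Results) -> bool:
    """True if there is at least one ERROR and every ERROR is a busy replica."""
    errors = [r for r in results if r.get("result") == "ERROR"]
    return bool(errors) and all(
        BUSY_REPLICA_MARKER in r.get("kw", {}).get("msg", "") for r in errors
    )


def check_with_retry(runner: Callable[[], Results] = run_replication_check,
                     attempts: int = 3, delay: float = 10.0) -> Results:
    """Re-run the check while it only reports a busy replica; return the last run."""
    results = runner()
    for _ in range(attempts - 1):
        if not is_busy_only(results):
            break
        time.sleep(delay)
        results = runner()
    return results
```

Any other error is returned immediately, and a busy replica that persists across every attempt is still reported.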
Thanks both for replying!
Yes, it's a transient error. We run ipahealthcheck_exporter, which basically scrapes ipa-healthcheck logs every 5 minutes. Can you suggest an alternative way of verifying replication health?
@mreynolds389 you mentioned you set it to "medium" severity; could I ask how?
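With a scrape every 5 minutes, another way to tolerate a transient busy replica is to alert only when the check fails on several consecutive runs. The class below is a hypothetical sketch of that debouncing idea, independent of ipahealthcheck_exporter's actual metrics; if the data ends up in Prometheus, the `for` clause of an alerting rule gives a comparable effect.

```python
class ConsecutiveFailureGate:
    """Fire an alert only after `threshold` consecutive failing runs."""

    def __init__(self, threshold: int = 3) -> None:
        if threshold < 1:
            raise ValueError("threshold must be >= 1")
        self.threshold = threshold
        self.streak = 0  # number of consecutive failing runs seen so far

    def observe(self, failed: bool) -> bool:
        """Record one run's outcome; return True when the alert should fire."""
        self.streak = self.streak + 1 if failed else 0
        return self.streak >= self.threshold
```

With `threshold=3` and 5-minute scrapes, a replication error has to persist across three scrapes (roughly 10 to 15 minutes) before anyone is paged, while a single busy-replica result resets on the next clean run.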
Well, IPA uses DS's lib389 library for the DS healthchecks. IPA does not use DS's healthcheck severity level; it is ignored because there are basically two tools that were merged.
@rcritten Since IPA does not use DS's healthcheck severity level, could this check's severity level be lowered to WARNING in IPA?
healthcheck doesn't ignore the DS severity. It converts it. See https://github.com/freeipa/freeipa-healthcheck/issues/283#issuecomment-2111803800
"medium" from DS is converted into an ipa-healthcheck ERROR severity.
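A conversion of this kind can be sketched as a simple lookup. Only the "medium" to ERROR mapping is stated in this thread; the "high" and fallback branches, the `Severity` enum, and the function name are assumptions for illustration and may not match freeipa-healthcheck's actual code.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Hypothetical stand-in for ipa-healthcheck's result levels.
    SUCCESS = 0
    WARNING = 1
    ERROR = 2
    CRITICAL = 3


def convert_ds_severity(ds_severity: str) -> Severity:
    """Map a 389 DS (lib389) severity string to an ipa-healthcheck level.

    "medium" -> ERROR comes from the discussion; the rest is assumed.
    """
    mapping = {"high": Severity.CRITICAL, "medium": Severity.ERROR}
    return mapping.get(ds_severity.lower(), Severity.WARNING)
```

Because the lookup depends only on the DS severity string and not on which check produced it, reporting ReplicationCheck differently would require a check-specific special case in this otherwise generic path.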
Thanks for clarifying. Do we want to set this specific check's severity to WARNING, bypassing the conversion? As mentioned, it is a transient error, but it still triggers an ERROR severity.
I suppose it's possible, but it would be an ugly one-off. healthcheck has a rather thin wrapper that calls the 389 checks and then reformats the return value. It's very generic code, so it would be invasive to add a test for a specific check.
I looked at the code and assumed as much, and I tend to agree. For now we exclude this specific check, since we can't really "trust" the ERROR it triggers.