Request group hashes recalculation in `Agent-groups send full` cluster task
| Wazuh version | Component |
|---|---|
| 4.6+ | Wazuh cluster |
> [!NOTE]
> The branch [fix/23422-groups-hash](https://github.com/wazuh/wazuh/issues/23422#issuecomment-2111249434) should be used as the base branch.
Description
We found the reason (or at least one of the reasons) behind the constant triggering of the `Agent-groups send full` task that was happening in some environments:
2024/02/11 16:28:12 INFO: [Worker wazuh-manager-worker-2] [Agent-groups send full] Finished in 2.770s. Updated 11 chunks.
2024/02/11 16:28:14 INFO: [Worker wazuh-manager-worker-7] [Agent-groups send full] Finished in 2.263s. Updated 11 chunks.
2024/02/11 16:28:20 INFO: [Worker wazuh-manager-worker-14] [Agent-groups send full] Finished in 1.231s. Updated 11 chunks.
2024/02/11 16:28:21 INFO: [Worker wazuh-manager-worker-4] [Agent-groups send full] Finished in 1.962s. Updated 11 chunks.
As explained in https://github.com/wazuh/wazuh/issues/23422, if the `group_hash` column of any agent becomes empty after a problem during group assignment, the global group hash can never be correct. As a result, the global hashes of the master and the worker always differed, creating an infinite `Agent-groups send full` loop:
2024/05/14 23:02:06 DEBUG: [Worker worker01-node] [Agent-groups recv] The checksum of master (4839dba4cbbb040c70be6cdcd0a61e28279ec796) and worker (be5508d938177c1c25938fdfaa328ba4bb021037) are different.
2024/05/14 23:03:06 DEBUG: [Worker worker01-node] [Agent-groups recv] The checksum of master (4839dba4cbbb040c70be6cdcd0a61e28279ec796) and worker (be5508d938177c1c25938fdfaa328ba4bb021037) are different.
2024/05/14 23:03:16 DEBUG: [Worker worker01-node] [Agent-groups recv] The checksum of master (4839dba4cbbb040c70be6cdcd0a61e28279ec796) and worker (be5508d938177c1c25938fdfaa328ba4bb021037) are different.
2024/05/14 23:03:26 DEBUG: [Worker worker01-node] [Agent-groups recv] The checksum of master (4839dba4cbbb040c70be6cdcd0a61e28279ec796) and worker (be5508d938177c1c25938fdfaa328ba4bb021037) are different.
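The mismatch described above can be illustrated with a simplified model. The sketch below is not wazuh-db's actual implementation: both `agent_group_hash` and `global_group_hash` are hypothetical helpers, and the hashing scheme (a truncated SHA-256 per agent, SHA-1 over the non-empty per-agent hashes) is an assumption made only for illustration. It shows why a single empty `group_hash` on one node keeps the two global hashes apart until the per-agent hash is recalculated.

```python
import hashlib
from typing import Optional


def agent_group_hash(groups: list[str]) -> Optional[str]:
    """Hypothetical per-agent hash: first 8 hex chars of SHA-256 over the group list."""
    if not groups:
        return None
    return hashlib.sha256(",".join(groups).encode()).hexdigest()[:8]


def global_group_hash(agent_hashes: dict[int, Optional[str]]) -> Optional[str]:
    """Hypothetical global hash: SHA-1 over the non-empty per-agent hashes, by agent ID."""
    present = [h for _, h in sorted(agent_hashes.items()) if h]
    if not present:
        return None
    return hashlib.sha1("".join(present).encode()).hexdigest()


# Both nodes hold the same group assignments.
groups = {1: ["default"], 2: ["default", "web"]}
worker = {agent_id: agent_group_hash(g) for agent_id, g in groups.items()}

# On the master, the group_hash of agent 2 was lost (assumed failure scenario).
master = dict(worker)
master[2] = None
mismatch = global_group_hash(master) != global_group_hash(worker)

# Recalculating the per-agent hashes from the stored groups repairs the global hash.
master = {agent_id: agent_group_hash(g) for agent_id, g in groups.items()}
repaired = global_group_hash(master) == global_group_hash(worker)
```

In this model, resending the group data alone does not help, because the stored groups were already identical; only recomputing the missing per-agent hash makes the global hashes converge.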
The only way to recalculate the hash for the affected agent(s) was to modify (assign or unassign) a group, but the core team is now adding a new wazuh-db command to force recalculating said hash:
global recalculate-agent-group-hashes
To do
We need to modify the `Agent-groups send/recv full` task in both the master and the workers so that, during its execution, the command to recalculate the group hashes (`global recalculate-agent-group-hashes`) is sent to wazuh-db.
The full sync task does not use or compare hashes. However, because it is intended to run only exceptionally, when the master and worker hashes have differed 5 times in a row, it is the best place to recalculate this information.
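The "5 times in a row" trigger can be sketched as a small counter. This is only an illustration under stated assumptions: `ChecksumTracker` and `MAX_MISMATCHES` are hypothetical names, and resetting the counter after a match or after requesting the full sync is assumed behaviour, not taken from the Wazuh code.

```python
from typing import Optional

MAX_MISMATCHES = 5  # threshold mentioned in the description


class ChecksumTracker:
    """Counts consecutive master/worker checksum mismatches (hypothetical helper)."""

    def __init__(self, limit: int = MAX_MISMATCHES) -> None:
        self.limit = limit
        self.failures = 0

    def compare(self, master: Optional[str], worker: Optional[str]) -> bool:
        """Return True when a full synchronization should be requested."""
        if master == worker:
            self.failures = 0  # a match breaks the streak
            return False
        self.failures += 1
        if self.failures >= self.limit:
            self.failures = 0  # start counting again after the request
            return True
        return False


tracker = ChecksumTracker()
# A worker with no hash yet is compared against the master's hash five times.
results = [
    tracker.compare("0908f8f03645258f0ba4143db85fa5bab4d9f929", None)
    for _ in range(5)
]
```

With these assumptions, only the fifth consecutive mismatch asks for the full synchronization, which is the point where the hash recalculation would run.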
`AsyncWazuhDBConnection().run_wdb_command` can be used to send the command.
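A minimal sketch of the proposed change follows. Only the method name `run_wdb_command` and the command string come from this issue. `FakeWazuhDBConnection`, `send_full_agent_groups` and `fake_get_chunks` are hypothetical stand-ins so the example runs without a Wazuh installation, and the real method's exact signature and return value are assumptions.

```python
import asyncio

RECALCULATE_COMMAND = "global recalculate-agent-group-hashes"


class FakeWazuhDBConnection:
    """Stand-in for AsyncWazuhDBConnection so the sketch runs without Wazuh."""

    def __init__(self) -> None:
        self.sent: list[str] = []

    async def run_wdb_command(self, command: str) -> str:
        # The real connection would forward the command to wazuh-db.
        self.sent.append(command)
        return "ok"


async def send_full_agent_groups(wdb_conn, get_chunks):
    """Hypothetical 'send full' step: recalculate hashes, then collect the chunks."""
    await wdb_conn.run_wdb_command(RECALCULATE_COMMAND)
    return await get_chunks()


async def fake_get_chunks() -> list[str]:
    # Placeholder for the agent-groups data that would be read from wazuh-db.
    return ['{"id": 1, "groups": ["default"]}']


conn = FakeWazuhDBConnection()
chunks = asyncio.run(send_full_agent_groups(conn, fake_get_chunks))
```

Issuing the recalculation before the chunks are collected means the data sent to the worker is read after the hashes have been refreshed.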
Checks
The following elements have been updated or reviewed (should also be checked if no modification is required):
- [x] Tests (unit tests, API integration tests).
- [x] Changelog.
- [x] Documentation.
- [x] Integration test mapping (using `api/test/integration/mapping/_test_mapping.py`).
Issue Update
The methods suggested in the issue's description were modified to recalculate the group hashes. The following scenarios were reproduced to manually test the changes:
Newly added worker
Added a new worker to a cluster with a master and a single worker. The worker requires a full `agent-groups` synchronization due to its initial state.
New worker logs
2024/05/15 11:29:55 INFO: [Worker node02] [Agent-groups recv] Starting.
2024/05/15 11:29:55 DEBUG: [Worker node02] [Agent-groups recv] 1/1 chunks updated in wazuh-db in 0.001s.
2024/05/15 11:29:55 DEBUG: [Worker node02] [Agent-groups recv] Obtained 1 chunks of data in 0.001s.
2024/05/15 11:29:55 DEBUG: [Worker node02] [Agent-groups recv] The checksum of master (0908f8f03645258f0ba4143db85fa5bab4d9f929) and worker (None) are different.
2024/05/15 11:29:55 DEBUG: [Worker node02] [Agent-groups recv] Checksum comparison failed (1/5).
2024/05/15 11:29:55 INFO: [Worker node02] [Agent-groups recv] Finished in 0.016s. Updated 1 chunks.
2024/05/15 11:30:05 INFO: [Worker node02] [Agent-groups recv] Starting.
2024/05/15 11:30:05 DEBUG: [Worker node02] [Agent-groups recv] 1/1 chunks updated in wazuh-db in 0.000s.
2024/05/15 11:30:05 DEBUG: [Worker node02] [Agent-groups recv] Obtained 1 chunks of data in 0.001s.
2024/05/15 11:30:05 DEBUG: [Worker node02] [Agent-groups recv] The checksum of master (0908f8f03645258f0ba4143db85fa5bab4d9f929) and worker (None) are different.
2024/05/15 11:30:05 DEBUG: [Worker node02] [Agent-groups recv] Checksum comparison failed (2/5).
2024/05/15 11:30:05 INFO: [Worker node02] [Agent-groups recv] Finished in 0.007s. Updated 1 chunks.
2024/05/15 11:30:15 INFO: [Worker node02] [Agent-groups recv] Starting.
2024/05/15 11:30:15 DEBUG: [Worker node02] [Agent-groups recv] 1/1 chunks updated in wazuh-db in 0.000s.
2024/05/15 11:30:15 DEBUG: [Worker node02] [Agent-groups recv] Obtained 1 chunks of data in 0.001s.
2024/05/15 11:30:15 DEBUG: [Worker node02] [Agent-groups recv] The checksum of master (0908f8f03645258f0ba4143db85fa5bab4d9f929) and worker (None) are different.
2024/05/15 11:30:15 DEBUG: [Worker node02] [Agent-groups recv] Checksum comparison failed (3/5).
2024/05/15 11:30:15 INFO: [Worker node02] [Agent-groups recv] Finished in 0.007s. Updated 1 chunks.
2024/05/15 11:30:25 INFO: [Worker node02] [Agent-groups recv] Starting.
2024/05/15 11:30:25 DEBUG: [Worker node02] [Agent-groups recv] 1/1 chunks updated in wazuh-db in 0.000s.
2024/05/15 11:30:25 DEBUG: [Worker node02] [Agent-groups recv] Obtained 1 chunks of data in 0.000s.
2024/05/15 11:30:25 DEBUG: [Worker node02] [Agent-groups recv] The checksum of master (0908f8f03645258f0ba4143db85fa5bab4d9f929) and worker (None) are different.
2024/05/15 11:30:25 DEBUG: [Worker node02] [Agent-groups recv] Checksum comparison failed (4/5).
2024/05/15 11:30:25 INFO: [Worker node02] [Agent-groups recv] Finished in 0.005s. Updated 1 chunks.
2024/05/15 11:30:35 INFO: [Worker node02] [Agent-groups recv] Starting.
2024/05/15 11:30:35 DEBUG: [Worker node02] [Agent-groups recv] 1/1 chunks updated in wazuh-db in 0.000s.
2024/05/15 11:30:35 DEBUG: [Worker node02] [Agent-groups recv] Obtained 1 chunks of data in 0.001s.
2024/05/15 11:30:35 DEBUG: [Worker node02] [Agent-groups recv] The checksum of master (0908f8f03645258f0ba4143db85fa5bab4d9f929) and worker (None) are different.
2024/05/15 11:30:35 DEBUG: [Worker node02] [Agent-groups recv] Checksum comparison failed (5/5).
2024/05/15 11:30:35 INFO: [Worker node02] [Agent-groups recv] Sent request to obtain all agent-groups information from the master node.
2024/05/15 11:30:35 INFO: [Worker node02] [Agent-groups recv] Finished in 0.008s. Updated 1 chunks.
2024/05/15 11:30:35 INFO: [Worker node02] [Agent-groups recv full] Starting.
2024/05/15 11:30:35 DEBUG: [Worker node02] [Agent-groups recv full] 1/1 chunks updated in wazuh-db in 0.001s.
2024/05/15 11:30:35 INFO: [Worker node02] [Agent-groups recv full] Finished in 0.005s. Updated 1 chunks.
2024/05/15 11:30:45 INFO: [Worker node02] [Agent-groups recv] Starting.
2024/05/15 11:30:45 DEBUG: [Worker node02] [Agent-groups recv] 1/1 chunks updated in wazuh-db in 0.000s.
2024/05/15 11:30:45 DEBUG: [Worker node02] [Agent-groups recv] Obtained 1 chunks of data in 0.000s.
2024/05/15 11:30:45 DEBUG: [Worker node02] [Agent-groups recv] The checksum of both databases match.
Master logs
2024/05/15 13:30:35 INFO: [Worker node02] [Agent-groups send full] Starting.
2024/05/15 13:30:35 DEBUG: [Worker 43f8c0434638] [Agent-groups send full] Recalculating agent-group hash.
2024/05/15 13:30:35 DEBUG: [Worker node02] [Agent-groups send full] Obtained 1 chunks of data in 0.001s.
2024/05/15 13:30:35 DEBUG: [Worker node02] [Agent-groups send full] Sending chunks.
2024/05/15 13:30:35 INFO: [Worker node02] [Agent-groups send full] Finished in 7200.011s. Updated 1 chunks.
Manually setting the `group_hash` to NULL
Used a Python script to directly send a command to WDB, setting the `group_hash` to NULL in the Master.
root@master:/# python3 wdb-query.py "global sql select id,group_hash,group_sync_status from agent"
[
{
"id": 0,
"group_sync_status": "synced"
},
{
"id": 1,
"group_hash": "de62f85e",
"group_sync_status": "synced"
}
]
root@master:/# python3 wdb-query.py "global sql update agent set group_hash=Null where id=1"
[]
root@master:/# python3 wdb-query.py "global sql select id,group_hash,group_sync_status from agent"
[
{
"id": 0,
"group_sync_status": "synced"
},
{
"id": 1,
"group_sync_status": "synced"
}
]
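The `wdb-query.py` helper used above is not included in the issue. The following is a rough sketch of what such a script could look like, not the script that was actually used. It assumes that wazuh-db listens on the Unix socket `/var/ossec/queue/db/wdb`, that each message is prefixed with a 4-byte little-endian length, and that responses start with a status word such as `ok` or `err`. The names `frame`, `parse_response`, `recv_exact` and `query_wdb` are hypothetical.

```python
import socket
import struct

WDB_SOCKET = "/var/ossec/queue/db/wdb"  # assumed default socket path


def frame(query: str) -> bytes:
    """Prefix the query with its length as a 4-byte little-endian integer (assumed framing)."""
    payload = query.encode()
    return struct.pack("<I", len(payload)) + payload


def parse_response(raw: bytes) -> tuple[str, str]:
    """Split a response into its status word and the remaining payload."""
    status, _, payload = raw.decode(errors="ignore").partition(" ")
    return status, payload


def recv_exact(sock: socket.socket, size: int) -> bytes:
    """Read exactly `size` bytes from the socket."""
    buf = b""
    while len(buf) < size:
        chunk = sock.recv(size - len(buf))
        if not chunk:
            raise ConnectionError("wazuh-db closed the connection")
        buf += chunk
    return buf


def query_wdb(query: str, socket_path: str = WDB_SOCKET) -> tuple[str, str]:
    """Send one query to wazuh-db and return (status, payload)."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(frame(query))
        (size,) = struct.unpack("<I", recv_exact(sock, 4))
        return parse_response(recv_exact(sock, size))
```

On a manager node, `query_wdb("global sql select id,group_hash,group_sync_status from agent")` would be the equivalent of the first command shown above, provided the assumptions about the socket and framing hold.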
Worker logs
2024/05/15 14:26:41 INFO: [Worker 43f8c0434638] [Agent-groups recv] Starting.
2024/05/15 14:26:41 DEBUG: [Worker 43f8c0434638] [Agent-groups recv] 1/1 chunks updated in wazuh-db in 0.000s.
2024/05/15 14:26:41 DEBUG: [Worker 43f8c0434638] [Agent-groups recv] Obtained 1 chunks of data in 0.001s.
2024/05/15 14:26:41 DEBUG: [Worker 43f8c0434638] [Agent-groups recv] The checksum of master (None) and worker (0908f8f03645258f0ba4143db85fa5bab4d9f929) are different.
2024/05/15 14:26:41 DEBUG: [Worker 43f8c0434638] [Agent-groups recv] Checksum comparison failed (1/5).
2024/05/15 14:26:41 INFO: [Worker 43f8c0434638] [Agent-groups recv] Finished in 0.011s. Updated 1 chunks.
2024/05/15 14:26:51 INFO: [Worker 43f8c0434638] [Agent-groups recv] Starting.
2024/05/15 14:26:51 DEBUG: [Worker 43f8c0434638] [Agent-groups recv] 1/1 chunks updated in wazuh-db in 0.000s.
2024/05/15 14:26:51 DEBUG: [Worker 43f8c0434638] [Agent-groups recv] Obtained 1 chunks of data in 0.001s.
2024/05/15 14:26:51 DEBUG: [Worker 43f8c0434638] [Agent-groups recv] The checksum of master (None) and worker (0908f8f03645258f0ba4143db85fa5bab4d9f929) are different.
2024/05/15 14:26:51 DEBUG: [Worker 43f8c0434638] [Agent-groups recv] Checksum comparison failed (2/5).
2024/05/15 14:26:51 INFO: [Worker 43f8c0434638] [Agent-groups recv] Finished in 0.006s. Updated 1 chunks.
2024/05/15 14:27:01 INFO: [Worker 43f8c0434638] [Agent-groups recv] Starting.
2024/05/15 14:27:01 DEBUG: [Worker 43f8c0434638] [Agent-groups recv] 1/1 chunks updated in wazuh-db in 0.000s.
2024/05/15 14:27:01 DEBUG: [Worker 43f8c0434638] [Agent-groups recv] Obtained 1 chunks of data in 0.001s.
2024/05/15 14:27:01 DEBUG: [Worker 43f8c0434638] [Agent-groups recv] The checksum of master (None) and worker (0908f8f03645258f0ba4143db85fa5bab4d9f929) are different.
2024/05/15 14:27:01 DEBUG: [Worker 43f8c0434638] [Agent-groups recv] Checksum comparison failed (3/5).
2024/05/15 14:27:01 INFO: [Worker 43f8c0434638] [Agent-groups recv] Finished in 0.006s. Updated 1 chunks.
2024/05/15 14:27:11 INFO: [Worker 43f8c0434638] [Agent-groups recv] Starting.
2024/05/15 14:27:11 DEBUG: [Worker 43f8c0434638] [Agent-groups recv] 1/1 chunks updated in wazuh-db in 0.000s.
2024/05/15 14:27:11 DEBUG: [Worker 43f8c0434638] [Agent-groups recv] Obtained 1 chunks of data in 0.001s.
2024/05/15 14:27:11 DEBUG: [Worker 43f8c0434638] [Agent-groups recv] The checksum of master (None) and worker (0908f8f03645258f0ba4143db85fa5bab4d9f929) are different.
2024/05/15 14:27:11 DEBUG: [Worker 43f8c0434638] [Agent-groups recv] Checksum comparison failed (4/5).
2024/05/15 14:27:11 INFO: [Worker 43f8c0434638] [Agent-groups recv] Finished in 0.007s. Updated 1 chunks.
2024/05/15 14:27:21 INFO: [Worker 43f8c0434638] [Agent-groups recv] Starting.
2024/05/15 14:27:21 DEBUG: [Worker 43f8c0434638] [Agent-groups recv] 1/1 chunks updated in wazuh-db in 0.001s.
2024/05/15 14:27:21 DEBUG: [Worker 43f8c0434638] [Agent-groups recv] Obtained 1 chunks of data in 0.001s.
2024/05/15 14:27:21 DEBUG: [Worker 43f8c0434638] [Agent-groups recv] The checksum of master (None) and worker (0908f8f03645258f0ba4143db85fa5bab4d9f929) are different.
2024/05/15 14:27:21 DEBUG: [Worker 43f8c0434638] [Agent-groups recv] Checksum comparison failed (5/5).
2024/05/15 14:27:21 INFO: [Worker 43f8c0434638] [Agent-groups recv] Sent request to obtain all agent-groups information from the master node.
2024/05/15 14:27:21 INFO: [Worker 43f8c0434638] [Agent-groups recv] Finished in 0.009s. Updated 1 chunks.
2024/05/15 14:27:21 INFO: [Worker 43f8c0434638] [Agent-groups recv full] Starting.
2024/05/15 14:27:21 DEBUG: [Worker 43f8c0434638] [Agent-groups recv full] 1/1 chunks updated in wazuh-db in 0.001s.
2024/05/15 14:27:21 INFO: [Worker 43f8c0434638] [Agent-groups recv full] Finished in 0.005s. Updated 1 chunks.
2024/05/15 14:27:31 INFO: [Worker 43f8c0434638] [Agent-groups recv] Starting.
2024/05/15 14:27:31 DEBUG: [Worker 43f8c0434638] [Agent-groups recv] 1/1 chunks updated in wazuh-db in 0.001s.
2024/05/15 14:27:31 DEBUG: [Worker 43f8c0434638] [Agent-groups recv] Obtained 1 chunks of data in 0.002s.
2024/05/15 14:27:31 DEBUG: [Worker 43f8c0434638] [Agent-groups recv] The checksum of both databases match.
2024/05/15 14:27:31 INFO: [Worker 43f8c0434638] [Agent-groups recv] Finished in 0.008s. Updated 1 chunks.
Master logs
2024/05/15 14:27:21 INFO: [Worker 43f8c0434638] [Agent-groups send full] Starting.
2024/05/15 14:27:21 DEBUG: [Worker 43f8c0434638] [Agent-groups send full] Recalculating agent-group hash.
2024/05/15 14:27:21 DEBUG: [Worker 43f8c0434638] [Agent-groups send full] Obtained 1 chunks of data in 0.001s.
2024/05/15 14:27:21 DEBUG: [Worker 43f8c0434638] [Agent-groups send full] Sending chunks.
2024/05/15 14:27:21 INFO: [Worker 43f8c0434638] [Agent-groups send full] Finished in 7200.017s. Updated 1 chunks.
After the `Agent-groups full` task execution, the `group_hash` gets back to the expected value:
root@master:/# python3 wdb-query.py "global sql select id,group_hash,group_sync_status from agent"
[
{
"id": 0,
"group_sync_status": "synced"
},
{
"id": 1,
"group_hash": "de62f85e",
"group_sync_status": "synced"
}
]
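The before/after comparison can also be automated. The sketch below parses query output in the format shown above and lists the agents whose `group_hash` is missing. `agents_missing_group_hash` is a hypothetical helper, and skipping agent 0 is an assumption based on its row never carrying a `group_hash` in these outputs.

```python
import json


def agents_missing_group_hash(rows: list[dict]) -> list[int]:
    """Return the IDs of agents (ID > 0) whose row has no non-empty group_hash."""
    return [row["id"] for row in rows if row["id"] > 0 and not row.get("group_hash")]


# Output captured right after group_hash was set to NULL.
before = json.loads(
    '[{"id": 0, "group_sync_status": "synced"},'
    ' {"id": 1, "group_sync_status": "synced"}]'
)

# Output captured after the full synchronization task ran.
after = json.loads(
    '[{"id": 0, "group_sync_status": "synced"},'
    ' {"id": 1, "group_hash": "de62f85e", "group_sync_status": "synced"}]'
)

missing_before = agents_missing_group_hash(before)
missing_after = agents_missing_group_hash(after)
```

An agent that belongs to no group might legitimately have no hash, so a non-empty result is a hint to investigate rather than proof of a problem.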
Manually setting the agent-groups information and recalculating at the same time
Used the Python script to simultaneously overwrite the agent's group information and recalculate the corresponding hash in the Worker:
root@43f8c0434638:/# /var/ossec/framework/python/bin/python3 wdb.py 'global set-agent-groups {"mode": "override", "sync_status": "synced", "data":[{"id":1,"groups":["NewGroup_1","NewGroup_2"]}]}'
['ok']
root@43f8c0434638:/# python3 wdb-query.py "global recalculate-agent-group-hashes"
ok
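When scripting this scenario, the `set-agent-groups` command string can be built from a dictionary instead of being typed by hand. The payload fields below are copied from the command above; `build_set_agent_groups` itself is a hypothetical helper and performs no validation of what wazuh-db accepts.

```python
import json


def build_set_agent_groups(agent_id, groups, mode="override", sync_status="synced"):
    """Build a 'global set-agent-groups' command for a single agent (hypothetical helper)."""
    payload = {
        "mode": mode,
        "sync_status": sync_status,
        "data": [{"id": agent_id, "groups": list(groups)}],
    }
    return f"global set-agent-groups {json.dumps(payload)}"


command = build_set_agent_groups(1, ["NewGroup_1", "NewGroup_2"])
```

Serializing with `json.dumps` avoids quoting mistakes when group names are generated programmatically.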
Worker logs
2024/05/15 17:26:25 INFO: [Worker 43f8c0434638] [Agent-groups recv] Starting.
2024/05/15 17:26:25 DEBUG: [Worker 43f8c0434638] [Agent-groups recv] 1/1 chunks updated in wazuh-db in 0.001s.
2024/05/15 17:26:25 DEBUG: [Worker 43f8c0434638] [Agent-groups recv] Obtained 1 chunks of data in 0.001s.
2024/05/15 17:26:25 DEBUG: [Worker 43f8c0434638] [Agent-groups recv] The checksum of master (0908f8f03645258f0ba4143db85fa5bab4d9f929) and worker (d3eb723ee681659728363e65eeaf34f49a4778f1) are different.
2024/05/15 17:26:25 DEBUG: [Worker 43f8c0434638] [Agent-groups recv] Checksum comparison failed (1/5).
2024/05/15 17:26:25 INFO: [Worker 43f8c0434638] [Agent-groups recv] Finished in 0.007s. Updated 1 chunks.
2024/05/15 17:26:35 INFO: [Worker 43f8c0434638] [Agent-groups recv] Starting.
2024/05/15 17:26:35 DEBUG: [Worker 43f8c0434638] [Agent-groups recv] 1/1 chunks updated in wazuh-db in 0.001s.
2024/05/15 17:26:35 DEBUG: [Worker 43f8c0434638] [Agent-groups recv] Obtained 1 chunks of data in 0.001s.
2024/05/15 17:26:35 DEBUG: [Worker 43f8c0434638] [Agent-groups recv] The checksum of master (0908f8f03645258f0ba4143db85fa5bab4d9f929) and worker (d3eb723ee681659728363e65eeaf34f49a4778f1) are different.
2024/05/15 17:26:35 DEBUG: [Worker 43f8c0434638] [Agent-groups recv] Checksum comparison failed (2/5).
2024/05/15 17:26:35 INFO: [Worker 43f8c0434638] [Agent-groups recv] Finished in 0.008s. Updated 1 chunks.
2024/05/15 17:26:45 INFO: [Worker 43f8c0434638] [Agent-groups recv] Starting.
2024/05/15 17:26:45 DEBUG: [Worker 43f8c0434638] [Agent-groups recv] 1/1 chunks updated in wazuh-db in 0.000s.
2024/05/15 17:26:45 DEBUG: [Worker 43f8c0434638] [Agent-groups recv] Obtained 1 chunks of data in 0.001s.
2024/05/15 17:26:45 DEBUG: [Worker 43f8c0434638] [Agent-groups recv] The checksum of master (0908f8f03645258f0ba4143db85fa5bab4d9f929) and worker (d3eb723ee681659728363e65eeaf34f49a4778f1) are different.
2024/05/15 17:26:45 DEBUG: [Worker 43f8c0434638] [Agent-groups recv] Checksum comparison failed (3/5).
2024/05/15 17:26:45 INFO: [Worker 43f8c0434638] [Agent-groups recv] Finished in 0.005s. Updated 1 chunks.
2024/05/15 17:26:55 INFO: [Worker 43f8c0434638] [Agent-groups recv] Starting.
2024/05/15 17:26:55 DEBUG: [Worker 43f8c0434638] [Agent-groups recv] 1/1 chunks updated in wazuh-db in 0.001s.
2024/05/15 17:26:55 DEBUG: [Worker 43f8c0434638] [Agent-groups recv] Obtained 1 chunks of data in 0.000s.
2024/05/15 17:26:55 DEBUG: [Worker 43f8c0434638] [Agent-groups recv] The checksum of master (0908f8f03645258f0ba4143db85fa5bab4d9f929) and worker (d3eb723ee681659728363e65eeaf34f49a4778f1) are different.
2024/05/15 17:26:55 DEBUG: [Worker 43f8c0434638] [Agent-groups recv] Checksum comparison failed (4/5).
2024/05/15 17:26:55 INFO: [Worker 43f8c0434638] [Agent-groups recv] Finished in 0.005s. Updated 1 chunks.
2024/05/15 17:27:05 INFO: [Worker 43f8c0434638] [Agent-groups recv] Starting.
2024/05/15 17:27:05 DEBUG: [Worker 43f8c0434638] [Agent-groups recv] 1/1 chunks updated in wazuh-db in 0.001s.
2024/05/15 17:27:05 DEBUG: [Worker 43f8c0434638] [Agent-groups recv] Obtained 1 chunks of data in 0.000s.
2024/05/15 17:27:05 DEBUG: [Worker 43f8c0434638] [Agent-groups recv] The checksum of master (0908f8f03645258f0ba4143db85fa5bab4d9f929) and worker (d3eb723ee681659728363e65eeaf34f49a4778f1) are different.
2024/05/15 17:27:05 DEBUG: [Worker 43f8c0434638] [Agent-groups recv] Checksum comparison failed (5/5).
2024/05/15 17:27:05 INFO: [Worker 43f8c0434638] [Agent-groups recv] Sent request to obtain all agent-groups information from the master node.
2024/05/15 17:27:05 INFO: [Worker 43f8c0434638] [Agent-groups recv] Finished in 0.005s. Updated 1 chunks.
2024/05/15 17:27:05 INFO: [Worker 43f8c0434638] [Agent-groups recv full] Starting.
2024/05/15 17:27:05 DEBUG: [Worker 43f8c0434638] [Agent-groups recv full] 1/1 chunks updated in wazuh-db in 0.000s.
2024/05/15 17:27:05 INFO: [Worker 43f8c0434638] [Agent-groups recv full] Finished in 0.002s. Updated 1 chunks.
2024/05/15 17:27:15 INFO: [Worker 43f8c0434638] [Agent-groups recv] Starting.
2024/05/15 17:27:15 DEBUG: [Worker 43f8c0434638] [Agent-groups recv] 1/1 chunks updated in wazuh-db in 0.000s.
2024/05/15 17:27:15 DEBUG: [Worker 43f8c0434638] [Agent-groups recv] Obtained 1 chunks of data in 0.001s.
2024/05/15 17:27:15 DEBUG: [Worker 43f8c0434638] [Agent-groups recv] The checksum of both databases match.
2024/05/15 17:27:15 INFO: [Worker 43f8c0434638] [Agent-groups recv] Finished in 0.007s. Updated 1 chunks.
Master logs
2024/05/15 17:27:05 INFO: [Worker 43f8c0434638] [Agent-groups send full] Starting.
2024/05/15 17:27:05 DEBUG: [Worker 43f8c0434638] [Agent-groups send full] Recalculating agent-group hash.
2024/05/15 17:27:05 DEBUG: [Worker 43f8c0434638] [Agent-groups send full] Obtained 1 chunks of data in 0.000s.
2024/05/15 17:27:05 DEBUG: [Worker 43f8c0434638] [Agent-groups send full] Sending chunks.
2024/05/15 17:27:05 INFO: [Worker 43f8c0434638] [Agent-groups send full] Finished in 7200.005s. Updated 1 chunks.
root@43f8c0434638:/# python3 wdb-query.py "global sql select id,group_hash,group_sync_status from agent"
[
{
"id": 0,
"group_sync_status": "synced"
},
{
"id": 1,
"group_hash": "37a8eec1",
"group_sync_status": "synced"
}
]