anax
Bug: NMP statuses not removed from the exchange after node no longer matches
Describe the bug.
No response
Describe the steps to reproduce the behavior.
No response
Expected behavior.
No response
Screenshots.
No response
Operating Environment
any
Additional Information
No response
@MaxMcAdam - Tried to verify the defect today on hzn version 913.
root@tbsK3agent1:~/444-agentfiles# hzn version
Horizon CLI version: 2.30.0-913
Horizon Agent version: 2.30.0-913
Node being used is a k3s edge cluster agent: tbsk3agent
We were using this nmp file:
cat tbs-cert-upgrade.json
{
  "label": "tbs test in kmb-org",
  "description": "tbs cert upgrade",
  "enabled": true,
  "constraints": [
    "openhorizon.example == \"operator\""
  ],
  "start": "now",
  "startWindow": 0,
  "agentUpgradePolicy": {
    "manifest": "cert102",
    "allowDowngrade": false
  }
}
Karen ran an upgrade on the tbsk3agent node and checked the nmp status. The node was already at the latest level, so no action was required, and the nmp status reflects that at this point:
hzn ex nmp status tbsupgrade -u root/root:glkPRbwwFbvGZThtlnHJOKgVMJMOax
{
  "kmb-org/tbsK3agent1": "no action required"
}
hzn eventlog list from the node:
New node management policy status created for policy kmb-org/tbsupgrade.",
"2022-06-23 19:04:46: Node management status for kmb-org/tbsupgrade changed to download started.",
"2022-06-23 19:04:46: Node management status for kmb-org/tbsupgrade changed to no action required.",
We then removed the node property openhorizon.example=operator from the tbsk3agent node. The active agreement was taken down; see the eventlog below:
2022-06-23 19:12:45: Node policy updated with the Exchange copy: map[deployment:map[properties:
So - at this point, the nmp for this node no longer matches. The nmp status should be removed.
We checked the nmp with dryrun to confirm the nmp no longer matched the node:
root@kmbt21:~# hzn ex nmp add -f tbs-cert-upgrade.json tbsupgrade --dry-run --applies-to
[]
This cmd still shows that the nmp status is present for the node (we waited approx 20 mins, checking periodically):
root@kmbt21:~# hzn ex nmp status tbsupgrade
{
  "kmb-org/tbsK3agent1": "no action required"
}
I then added the node property openhorizon.example==operator back into the node properties via the UI.
"2022-06-23 19:34:36: Node policy updated with the Exchange copy: map[constraints:
So - now the nmp policy matches the node again - as shown below by the nmp add --dry-run cmd:
hzn ex nmp add -f tbs-cert-upgrade.json tbsupgrade --dry-run --applies-to
[
  "kmb-org/tbsK3agent1"
]
And now the nmp status has been removed:
root@kmbt21:~# hzn ex nmp status tbsupgrade
Error: Status for NMP tbsupgrade not found in org kmb-org
It seems the removal of the nmp status is not keyed on the correct event. The status should be removed when the existing agreement is torn down and the node properties no longer match the nmp.
It appears nmp status is not getting cleared until the node comes back up and forms a new agreement.
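To make the expected behavior concrete, here is a minimal Python sketch of it. This is a simplified model and not anax's implementation: the function names `nmp_matches` and `prune_statuses` are hypothetical, and only constraints of the form `name == "value"` are handled, which is enough for the nmp file used above.

```python
import re

# Hypothetical, simplified model: only `name == "value"` constraints are supported.
_CONSTRAINT = re.compile(r'^\s*([\w.\-]+)\s*==\s*"?([^"]*?)"?\s*$')


def nmp_matches(constraints, node_properties):
    """Return True if every constraint is satisfied by the node's properties."""
    for constraint in constraints:
        m = _CONSTRAINT.match(constraint)
        if m is None:
            raise ValueError(f"unsupported constraint: {constraint!r}")
        name, value = m.groups()
        if name not in node_properties or str(node_properties[name]) != value:
            return False
    return True


def prune_statuses(statuses, constraints, nodes):
    """Keep only the statuses of nodes that still match the nmp constraints."""
    return {
        node_id: status
        for node_id, status in statuses.items()
        if nmp_matches(constraints, nodes.get(node_id, {}))
    }


constraints = ['openhorizon.example == "operator"']
statuses = {"kmb-org/tbsK3agent1": "no action required"}

# Property still set on the node: the status is kept.
kept = prune_statuses(
    statuses, constraints, {"kmb-org/tbsK3agent1": {"openhorizon.example": "operator"}}
)
# Property removed from the node: the status is expected to go away.
pruned = prune_statuses(statuses, constraints, {"kmb-org/tbsK3agent1": {}})
print(kept)
print(pruned)
```

In this model the status disappears as soon as the property is removed, without waiting for the node to form a new agreement.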
Karen is familiar with this setup and still has the node connected to her org, in case you need more info or another recreate.
@tbsloan it takes a minute or two to take effect. Can you make just one change, wait for the result, and see if it is correct?
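A small polling helper can make the "change one thing and wait" check repeatable. The sketch below is only an illustration: `wait_for_status_removal` and `fetch_statuses` are hypothetical names, and `fetch_statuses` is assumed to be any callable that returns the current node-to-status mapping (for example, a wrapper that parses the output of `hzn ex nmp status tbsupgrade` and returns an empty mapping when no status is found).

```python
import time


def wait_for_status_removal(fetch_statuses, node_id, timeout=300.0, interval=15.0,
                            sleep=time.sleep, clock=time.monotonic):
    """Poll until node_id is absent from the status mapping or the timeout expires.

    Returns True if the status was removed, False if it was still present
    when the timeout was reached.
    """
    deadline = clock() + timeout
    while True:
        if node_id not in fetch_statuses():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Demo with a fake fetcher: the status is present on the first poll, gone on the second.
responses = iter([{"kmb-org/tbsK3agent1": "no action required"}, {}])
removed = wait_for_status_removal(
    lambda: next(responses), "kmb-org/tbsK3agent1", sleep=lambda _: None
)
print(removed)
```

The `sleep` and `clock` parameters exist only so the helper can be exercised without real waiting; in normal use the defaults apply.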
@linggao @MaxMcAdam Karen and I retested this nmp status removal scenario again on Wed.
We found that removal of the nmp status for a node appears to be triggered when 'management' properties are defined, such as in the nmp below:
{ "management": { "properties": [ { "name": "tbsnode", "value": "manageme" } ] } }
Using the nmp above, we saw that the nmp status was removed for that node after a short time, once the nmp policy no longer matched for that node.
The original issue we noticed still exists, if a higher-level node property is defined in the nmp, like in this nmp policy file:
{
  "label": "tbs test in kmb-org",
  "description": "tbs cert upgrade",
  "enabled": true,
  "constraints": [
    "openhorizon.example == \"operator\""
  ],
  "start": "now",
  "startWindow": 0,
  "agentUpgradePolicy": {
    "manifest": "cert102",
    "allowDowngrade": false
  }
}
Using the nmp above (which does not contain explicit 'management' properties), the nmp status is still not cleared once the nmp no longer matches the target node. This is likely working as designed and is probably OK for now.
For future function in this area: if an nmp matches a node on the --applies-to dry-run test, it would be nice if the nmp status could be removed for that node once a node property is removed and the nmp no longer matches that node.