Agent Reset should also reset Orb agent and not just backends
When requesting an agent reset, the Orb agent should request a fresh set of policies from the Orb control plane as part of the reset and before resetting and refreshing the backends.
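The ordering requested above can be sketched roughly as follows. This is a minimal illustration, not the Orb agent's actual code: the `Policy`, `ControlPlane`, `Backend`, and `Agent` types and all of their methods are hypothetical names introduced only for this example.

```go
package main

import "fmt"

// Policy is a hypothetical, simplified stand-in for an Orb policy.
type Policy struct {
	ID   string
	Name string
}

// ControlPlane is a hypothetical interface for whatever returns the
// policies currently assigned to this agent.
type ControlPlane interface {
	FetchPolicies() ([]Policy, error)
}

// Backend is a hypothetical interface for a backend such as pktvisor.
type Backend interface {
	Reset() error
	Apply(p Policy) error
}

// Agent holds the locally cached policies and the backends it manages.
type Agent struct {
	cp       ControlPlane
	backends []Backend
	policies map[string]Policy
}

// Reset refreshes the agent's own state first, then resets the backends.
func (a *Agent) Reset() error {
	// 1. Ask the control plane for the current policy set.
	fresh, err := a.cp.FetchPolicies()
	if err != nil {
		return fmt.Errorf("fetch policies: %w", err)
	}
	// 2. Replace (not merge) the local cache so deleted policies vanish.
	a.policies = make(map[string]Policy, len(fresh))
	for _, p := range fresh {
		a.policies[p.ID] = p
	}
	// 3. Only now reset each backend and push the fresh policies to it.
	for _, b := range a.backends {
		if err := b.Reset(); err != nil {
			return fmt.Errorf("reset backend: %w", err)
		}
		for _, p := range fresh {
			if err := b.Apply(p); err != nil {
				return fmt.Errorf("apply policy %s: %w", p.ID, err)
			}
		}
	}
	return nil
}

// staticControlPlane and recordingBackend are test doubles for the demo.
type staticControlPlane struct{ policies []Policy }

func (s staticControlPlane) FetchPolicies() ([]Policy, error) {
	return s.policies, nil
}

type recordingBackend struct{ applied []string }

func (r *recordingBackend) Reset() error {
	r.applied = nil
	return nil
}

func (r *recordingBackend) Apply(p Policy) error {
	r.applied = append(r.applied, p.ID)
	return nil
}

func main() {
	// The backend and the agent cache both start with a stale policy.
	be := &recordingBackend{applied: []string{"deleted-policy"}}
	agent := &Agent{
		cp:       staticControlPlane{policies: []Policy{{ID: "p1", Name: "dns"}}},
		backends: []Backend{be},
		policies: map[string]Policy{"deleted-policy": {ID: "deleted-policy"}},
	}
	if err := agent.Reset(); err != nil {
		fmt.Println("reset failed:", err)
		return
	}
	fmt.Println("policies on backend after reset:", be.applied)
}
```

The key design choice in this sketch is that the local cache is replaced rather than merged, so anything deleted on the control plane cannot be pushed to the backend again.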
A bug has been observed where a Dataset was deleted but the Orb agent still maintained that information and pushed it to pktvisor, even after an agent reset (observed on NS1 Orb agent 0.17.0-develop-77c9dd4 with ns1-pktvisor 4.2.0-develop-ec9d0ed).
I also observed this when a policy subscribed to a group was applied to the agent only after I restarted the container via Docker; the reset alone was not enough to reset the configuration. FWIW, the agent was online and healthy, and no error was displayed.
PR #1664
@lpegoraro I tried with new agent:
Agent Version: 0.19.0-develop-ec8fc03 Backend Version: 4.2.0-develop-df47d63
I'm still facing an issue with the reset behavior. The problem was noticed after sending multiple reset requests.
On the first agent, the group loses its connection with the agent, yet the policy remains linked. I waited a while and tried to reset again, but the expected behavior was still not restored; everything went back to normal only when I restarted the container.
Full logs:
_orb-agent-int-testXqjde_logs.txt
On the second agent, the policy first failed and the group lost its connection with the agent. The policy then came online, but the group was still lost. After a while I reset the agent; the group did not come back, and the policy status became unknown.
Full logs:
Steps to reproduce:
- Apply this policy to an agent (this policy will fail because exclude_noerror exists but is not a boolean).
  {
    "handlers": {
      "modules": {
        "handler_dns_1": {
          "filter": {
            "exclude_noerror": "",
            "only_qname_suffix": [
              ""
            ]
          },
          "type": "dns"
        }
      }
    },
    "input": {
      "input_type": "pcap",
      "tap": "default_pcap"
    },
    "kind": "collection"
  }
- Remove (delete) the policy from Orb (this can be done through the API or the Orb UI)
- Check the agent view page or the agent backends (related to #1342)
- Reset the agent remotely
Expected result:
- The policy must be detached from the agent's last heartbeat because the policy no longer exists.
Current result:
- The policy remains attached.
Full logs:
@lpegoraro For this problem, where the policy with an error remains attached to the agent even after a restart: I think the issue is that we reapply all the policies in the policy repo on the agent, and this happens before the agent restart process finishes, even though that process will fetch all the policies we need to apply anyway. We can delete all policies from the policy repo by changing this to true: https://github.com/ns1labs/orb/blob/d21fa1ae01829d8a5d68c7a35e8f7f7476eabd27/agent/agent.go#L212
then remove this block: https://github.com/ns1labs/orb/blob/d21fa1ae01829d8a5d68c7a35e8f7f7476eabd27/agent/agent.go#L220
and then wait for the policies from the control plane.
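To illustrate the difference this proposal makes, here is a small self-contained sketch. It does not reproduce the code at the linked lines of agent.go; `policyRepo`, `restart`, and the `clearRepo` flag are hypothetical names standing in for the policy repo and the boolean mentioned above.

```go
package main

import (
	"fmt"
	"sort"
)

// policyRepo is a hypothetical in-memory stand-in for the agent's policy repo.
type policyRepo struct {
	policies map[string]string // policy ID -> policy name
}

func newPolicyRepo() *policyRepo {
	return &policyRepo{policies: make(map[string]string)}
}

// Put stores or overwrites a policy.
func (r *policyRepo) Put(id, name string) { r.policies[id] = name }

// Clear drops every cached policy.
func (r *policyRepo) Clear() { r.policies = make(map[string]string) }

// IDs returns the cached policy IDs in sorted order.
func (r *policyRepo) IDs() []string {
	ids := make([]string, 0, len(r.policies))
	for id := range r.policies {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

// restart models the two behaviors discussed in this thread.
// With clearRepo == false, whatever is cached survives the restart and is
// combined with what the control plane sends (the behavior suspected above).
// With clearRepo == true, the repo is emptied first, so only policies sent
// by the control plane remain (the proposed behavior).
func restart(repo *policyRepo, fromControlPlane map[string]string, clearRepo bool) []string {
	if clearRepo {
		repo.Clear()
	}
	for id, name := range fromControlPlane {
		repo.Put(id, name)
	}
	return repo.IDs()
}

func main() {
	// The control plane only knows about p1; the other policy was deleted.
	controlPlane := map[string]string{"p1": "dns-policy"}

	stale := newPolicyRepo()
	stale.Put("deleted-policy", "broken-dns-policy")
	fmt.Println("without clearing:", restart(stale, controlPlane, false))

	cleared := newPolicyRepo()
	cleared.Put("deleted-policy", "broken-dns-policy")
	fmt.Println("with clearing:   ", restart(cleared, controlPlane, true))
}
```

In this model the deleted policy survives the restart only when the repo is not cleared, which matches the "policy keeps attached" result reported in the reproduction steps.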