
Agent Reset should also reset Orb agent and not just backends

Open rboucher-me opened this issue 2 years ago • 0 comments

When requesting an agent reset, the Orb agent should request a fresh set of policies from the Orb control plane as part of the reset and before resetting and refreshing the backends.

A bug has been observed where a Dataset was deleted but the Orb agent still maintained that information and pushed it to pktvisor, even after an agent reset (observed on NS1 Orb agent 0.17.0-develop-77c9dd4 with ns1-pktvisor 4.2.0-develop-ec9d0ed).
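The ordering requested above can be sketched in Go as follows. This is only an illustration under stated assumptions: `Policy`, `ControlPlane`, `Backend`, `Agent`, and the fake implementations are hypothetical names invented for this sketch, not the Orb agent's actual types. The point is that the reset fetches the current policy set first, replaces the local cache with it, and only then resets and refreshes the backends.

```go
package main

import "fmt"

// Policy is a hypothetical, simplified stand-in for an Orb policy.
type Policy struct{ ID, Name string }

// ControlPlane is a hypothetical interface to the source of truth for policies.
type ControlPlane interface {
	FetchPolicies() ([]Policy, error)
}

// Backend is a hypothetical backend (for example pktvisor) that receives policies.
type Backend interface {
	Reset() error
	Apply(p Policy) error
}

// Agent holds a local policy cache that may contain stale entries.
type Agent struct {
	cp       ControlPlane
	backends []Backend
	cache    map[string]Policy
}

// Reset fetches a fresh policy set, replaces the local cache with it,
// and only then resets each backend and applies the fresh policies.
func (a *Agent) Reset() error {
	fresh, err := a.cp.FetchPolicies()
	if err != nil {
		return fmt.Errorf("fetch policies: %w", err)
	}
	// Replace (not merge) the cache so deleted policies cannot survive.
	a.cache = make(map[string]Policy, len(fresh))
	for _, p := range fresh {
		a.cache[p.ID] = p
	}
	for _, b := range a.backends {
		if err := b.Reset(); err != nil {
			return fmt.Errorf("reset backend: %w", err)
		}
		for _, p := range fresh {
			if err := b.Apply(p); err != nil {
				return fmt.Errorf("apply policy %s: %w", p.ID, err)
			}
		}
	}
	return nil
}

// fakeControlPlane and fakeBackend are in-memory test doubles.
type fakeControlPlane struct{ policies []Policy }

func (f *fakeControlPlane) FetchPolicies() ([]Policy, error) { return f.policies, nil }

type fakeBackend struct{ applied []string }

func (f *fakeBackend) Reset() error { f.applied = nil; return nil }

func (f *fakeBackend) Apply(p Policy) error {
	f.applied = append(f.applied, p.ID)
	return nil
}

func main() {
	cp := &fakeControlPlane{policies: []Policy{{ID: "p1", Name: "dns-current"}}}
	be := &fakeBackend{applied: []string{"deleted-policy"}}
	agent := &Agent{
		cp:       cp,
		backends: []Backend{be},
		cache:    map[string]Policy{"deleted-policy": {ID: "deleted-policy"}},
	}
	if err := agent.Reset(); err != nil {
		fmt.Println("reset failed:", err)
		return
	}
	fmt.Println("backend policies after reset:", be.applied)
}
```

Fetching before touching the cache means a failed fetch leaves the agent's existing state untouched. Whether that is the desired failure behavior for Orb is a design question this sketch does not settle.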

rboucher-me avatar Jul 19 '22 20:07 rboucher-me

I also observed this: a policy subscribed to a group was only applied to the agent after I restarted the container via Docker. The reset was not enough to reset the configuration. FWIW, the agent was online and no error was displayed (it was a healthy agent).

manrodrigues avatar Aug 12 '22 16:08 manrodrigues

PR #1664

lpegoraro avatar Aug 15 '22 21:08 lpegoraro

@lpegoraro I tried with new agent:

Agent Version: 0.19.0-develop-ec8fc03 Backend Version: 4.2.0-develop-df47d63

I'm still seeing a problem with the reset behavior. It appeared after sending multiple reset requests.

On the first agent, the group loses its connection with the agent and, even so, the policy remains linked. I waited a while and tried to reset again, but the expected behavior was still not restored; everything went back to normal only when I restarted the container.

agent_reset_not_working

Full logs:

_orb-agent-int-testXqjde_logs.txt

On the second agent, the policy first failed and the group lost its connection with the agent. Then the policy came online, but the group was still lost. After a while I reset the agent; the group did not come back, and the policy status became unknown.

agent_reset_break

policy_unknown

Full logs:

_orb-agent-int-testtHlei_logs.txt

manrodrigues avatar Aug 17 '22 12:08 manrodrigues

Steps to reproduce:

  • Apply this policy to an agent (this policy will fail because exclude_noerror exists but is not a boolean).
{
  "handlers": {
    "modules": {
      "handler_dns_1": {
        "filter": {
          "exclude_noerror": "",
          "only_qname_suffix": [
            ""
          ]
        },
        "type": "dns"
      }
    }
  },
  "input": {
    "input_type": "pcap",
    "tap": "default_pcap"
  },
  "kind": "collection"
}
  • Remove (delete) the policy from Orb (this can be done through the API or the Orb UI)
  • Check the agent view page or agent backends (related to #1342)
  • Reset the agent remotely
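
To make the failure in the first step concrete, here is a minimal Go sketch that checks whether `exclude_noerror` is a boolean in a policy shaped like the one above. It is not pktvisor's actual validation logic; the `policy` struct and the `checkExcludeNoError` function are hypothetical names used only for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// policy mirrors only the parts of the policy JSON needed for this check.
// It is a hypothetical type, not one taken from Orb or pktvisor.
type policy struct {
	Handlers struct {
		Modules map[string]struct {
			Type   string                     `json:"type"`
			Filter map[string]json.RawMessage `json:"filter"`
		} `json:"modules"`
	} `json:"handlers"`
}

// checkExcludeNoError returns an error if any handler sets
// exclude_noerror to something that does not decode as a boolean.
func checkExcludeNoError(raw []byte) error {
	var p policy
	if err := json.Unmarshal(raw, &p); err != nil {
		return fmt.Errorf("parse policy: %w", err)
	}
	for name, mod := range p.Handlers.Modules {
		v, ok := mod.Filter["exclude_noerror"]
		if !ok {
			continue // the filter is optional
		}
		var b bool
		if err := json.Unmarshal(v, &b); err != nil {
			return fmt.Errorf("handler %q: exclude_noerror must be a boolean, got %s", name, v)
		}
	}
	return nil
}

func main() {
	// Same shape as the policy in the reproduction steps.
	raw := []byte(`{
	  "handlers": {"modules": {"handler_dns_1": {
	    "filter": {"exclude_noerror": "", "only_qname_suffix": [""]},
	    "type": "dns"}}},
	  "input": {"input_type": "pcap", "tap": "default_pcap"},
	  "kind": "collection"
	}`)
	if err := checkExcludeNoError(raw); err != nil {
		fmt.Println("policy rejected:", err)
		return
	}
	fmt.Println("policy accepted")
}
```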

Expected result:

  • The policy must be detached from the agent's last heartbeat because the policy no longer exists.

Current result:

  • The policy remains attached.

Full logs:

_affectionate_jones_logs(1).txt

manrodrigues avatar Aug 29 '22 13:08 manrodrigues

@lpegoraro For the problem where the failed policy remains attached to the agent even after a restart: I think the issue is that we reapply all the policies in the policy repo on the agent, and this happens before the agent restart process finishes, which will fetch all the policies we need to apply anyway. We can delete all policies from the policy repo here by changing this to true: https://github.com/ns1labs/orb/blob/d21fa1ae01829d8a5d68c7a35e8f7f7476eabd27/agent/agent.go#L212

and remove this block: https://github.com/ns1labs/orb/blob/d21fa1ae01829d8a5d68c7a35e8f7f7476eabd27/agent/agent.go#L220

and then wait for the policies from the control plane.
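
A rough Go sketch of the difference between the two behaviors, assuming a simple in-memory repo. The names `policyRepo`, `restart`, and the `clearRepo` flag are hypothetical and are not copied from `agent.go`; the sketch only shows why reapplying from the local repo can resurrect a deleted policy, while clearing the repo leaves nothing to reapply until the control plane sends the current set.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// policyRepo is a hypothetical in-memory policy repository.
type policyRepo struct {
	mu       sync.Mutex
	policies map[string]string // policy ID -> policy name
}

func newPolicyRepo() *policyRepo {
	return &policyRepo{policies: make(map[string]string)}
}

// Put stores a policy, as would happen when the control plane sends one.
func (r *policyRepo) Put(id, name string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.policies[id] = name
}

// Clear removes every policy from the repo.
func (r *policyRepo) Clear() {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.policies = make(map[string]string)
}

// IDs returns the stored policy IDs in sorted order.
func (r *policyRepo) IDs() []string {
	r.mu.Lock()
	defer r.mu.Unlock()
	ids := make([]string, 0, len(r.policies))
	for id := range r.policies {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

// restart returns the policy IDs that would be reapplied to the backends.
// With clearRepo=false it reapplies whatever the local repo holds, stale or not.
// With clearRepo=true it empties the repo and reapplies nothing; policies are
// expected to arrive from the control plane afterwards.
func restart(r *policyRepo, clearRepo bool) []string {
	if clearRepo {
		r.Clear()
		return nil
	}
	return r.IDs()
}

func main() {
	repo := newPolicyRepo()
	repo.Put("deleted-policy", "removed from Orb but still cached")

	fmt.Println("reapplied without clearing:", restart(repo, false))
	fmt.Println("reapplied with clearing:", restart(repo, true))

	// Later, the control plane sends the policies that still exist.
	repo.Put("current-policy", "still defined in Orb")
	fmt.Println("repo after control plane update:", repo.IDs())
}
```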

mclcavalcante avatar Aug 29 '22 20:08 mclcavalcante