Watchdog Capability
Hi, it would be nice for the Velociraptor installation to include a watchdog, so that if the service stops responding or cannot connect to the Velociraptor server, the watchdog restarts the service.
Thanks in advance, Gal Miller
There is a nanny in the client that kills the process if it uses too much memory or fails to connect to the server (for example, if due to a bug the comms loop exits or gets stuck).
Normally, restarting the service is left to the service manager: the Windows service manager or systemd on Linux. They implement suitable backoff and logging to make sure this is done safely, so we don't really need to duplicate them.
When the nanny kills the agent, it relies on the service manager to restart it.
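On Linux, this restart-on-exit behavior is governed by the systemd unit. A minimal sketch of the relevant directives (the unit name, binary path, and config path below are illustrative placeholders, not taken from a real install):

```ini
[Unit]
Description=Velociraptor client
After=network.target

[Service]
Type=simple
# Restart the client whenever it exits, including when the nanny kills it.
Restart=always
# Back off between restarts so a crash loop does not thrash the host.
RestartSec=10
ExecStart=/usr/local/bin/velociraptor --config /etc/velociraptor/client.config.yaml client

[Install]
WantedBy=multi-user.target
```

The key point is `Restart=always`: with `Restart=on-failure` a clean exit would not trigger a restart, so the nanny's kill must register as a restartable exit for this scheme to work.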
Did you observe cases where this did not work? We once saw a case where the comms loop got stuck but the service manager considered the service healthy and so did not restart it. This is why we introduced the nanny service.
Hi, yes. It usually happens when you run an intensive hunt for a KAPE collection, and in some other cases when running hunts on event logs. The last time I tested it was on version 0.6.4. The client just got stuck: it did not respond to any new hunts or to cancellation of the running hunt, and no data reached the server. The best way to describe it is that the client simply froze. Only a manual restart of the service fixed the problem.
My point here is that we want to minimize interaction with support staff such as IT. Given the differences in time zones, we want to avoid bothering them with this task, as they have other things to do.
In most of our cases only one person in the company handles IT, and sometimes it is a third-party company.
Sometimes when you run a heavy hunt that transfers a lot of data, it takes a while for the query to complete. The client limits its impact on the endpoint by running only 2 queries at a time, so while those queries are running you cannot run other queries on the client. This might feel like the client is unresponsive, but it is still sending data.
Some queries are marked as "urgent", which forces them to jump the queue and run on the client anyway. For example, VFS refresh queries are urgent, as are shell commands. You can manually collect things as urgent (https://docs.velociraptor.app/vql_reference/server/collect_client/) if you like.
You can kill the client remotely at any time using the killkillkill() VQL function (https://docs.velociraptor.app/vql_reference/basic/killkillkill/). This similarly relies on the service manager restarting the client.
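As a sketch, scheduling an urgent collection from a server notebook might look like the following (the client id and artifact name are placeholders; check the collect_client() reference linked above for the exact parameters your version supports):

```sql
-- Schedule a collection that jumps the client's queue.
SELECT collect_client(
    client_id="C.1234567890abcdef",   -- placeholder client id
    artifacts="Generic.Client.Info",
    urgent=TRUE) AS Flow
FROM scope()
```

And on the client side, killkillkill() is just a VQL function, so a collected artifact whose source runs it terminates the process, after which the service manager is expected to restart it:

```sql
-- Runs on the client: terminate the current client process.
SELECT killkillkill() FROM scope()
```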
Great, that is a solution. But in our experience even the shell commands did not work; the client hung for days and only a restart of the service fixed it.
Were you able to kill the client remotely with killkillkill() ?
If this happens again, it would be useful to try to get a profile, if you can easily reproduce it. Unfortunately, if the client is not reachable from the server, you will need to run it locally with the --debug flag and get a local profile, which is not easy.
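A rough sketch of grabbing a local profile from a stuck client (the port below is the Go pprof default and is an assumption here; --debug prints the actual listen address, so use whatever your build reports):

```sh
# Run the client interactively with the local debug server enabled.
velociraptor --config client.config.yaml client --debug

# In another shell, pull a full goroutine dump from the debug endpoint.
# Port 6060 is assumed; substitute the address printed at startup.
curl -s "http://127.0.0.1:6060/debug/pprof/goroutine?debug=2" > goroutines.txt
```

A goroutine dump taken while the client is frozen is usually the most useful artifact, since it shows exactly where the comms loop is blocked.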
If this is still an issue in 0.6.7-4, please reopen.
Nope, the changes made to the service configuration solved the issue.