
[Bug] - high CPU consumption

Open andreyglauzer opened this issue 3 years ago • 18 comments

Request Type

Bug

Work Environment

OS version (server): Oracle Linux
OS version (client): Windows 10
Virtualized Env.: False
Dedicated RAM: 64 GB
vCPU: 20
TheHive version / git hash: 4.1.16
Package Type: RPM
Database: BerkeleyDB
Index type: Lucene
Attachments storage: Local
Browser type & version: Chrome

Problem Description

Our team handles a high volume of alerts, which are opened in TheHive via the API. We have also built several automations to merge alerts into cases, so API searches are constant as well.

We have 15 analysts accessing the platform simultaneously, and at times TheHive consumes all of the server's CPUs and becomes inaccessible until I terminate the TheHive process with kill and start the service again.

Steps to Reproduce

I noticed that merging alerts into cases tends to consume a lot of server CPU, and this is something analysts do constantly.

But I have no proof that this is actually the root cause.
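For context, the merge automations boil down to one API call per alert. A minimal sketch of what such a call can look like, assuming TheHive 4's v1 alert-merge endpoint (POST /api/v1/alert/{alertId}/merge/{caseId}); the URL, API key, and IDs below are placeholders:

    # Hedged sketch: merge an alert into an existing case over TheHive's HTTP API.
    # Endpoint path assumed from the TheHive 4 v1 API; verify against your version.
    import requests

    THEHIVE_URL = "https://thehive.example.local"  # placeholder instance URL
    API_KEY = "REDACTED"                           # placeholder API key

    def merge_alert_into_case(alert_id: str, case_id: str) -> dict:
        """Merge one alert into an existing case and return the updated case."""
        resp = requests.post(
            f"{THEHIVE_URL}/api/v1/alert/{alert_id}/merge/{case_id}",
            headers={"Authorization": f"Bearer {API_KEY}"},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()

    if __name__ == "__main__":
        merged_case = merge_alert_into_case("~40964152", "~24629336")  # placeholder IDs
        print(merged_case)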

andreyglauzer · Jan 08 '22 14:01

I am having the same issue with my instance of TheHive.

edwardrixon · Jan 14 '22 11:01

My org also experiences this issue, on a similarly spec'd system (14 vCPU, 64 GB).

MDB4241 · Jan 19 '22 14:01

Same here when accessing or closing a case. We usually have 60-70 observables per case, with a total of 11k cases (~100 open). I'm wondering about the feature that checks for related cases by observable; could that be a factor?

backloop-biz · Jan 25 '22 16:01

I migrated the database from BerkeleyDB to Cassandra and I'm seeing great results.

I'll run a few weeks of testing and report back.

andreyglauzer · Jan 25 '22 16:01

> Same here when accessing or closing a case. We usually have 60-70 observables per case, with a total of 11k cases (~100 open). I'm wondering about the feature that checks for related cases by observable; could that be a factor?

You should see a performance increase if you use the 'ignoreSimilarity' option on non-critical case artifacts. This was a useful change for my organization, and it reduces the cost of rendering a case.
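A hedged sketch of how that flag can be set via the API (the /api/case/artifact route and the IDs below are assumptions; verify against your TheHive version):

    # Hedged sketch: set ignoreSimilarity on an existing observable so it is
    # excluded from the related-case/similarity computation when a case is rendered.
    import requests

    THEHIVE_URL = "https://thehive.example.local"  # placeholder instance URL
    API_KEY = "REDACTED"                           # placeholder API key

    def ignore_similarity(observable_id: str) -> None:
        resp = requests.patch(
            f"{THEHIVE_URL}/api/case/artifact/{observable_id}",
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={"ignoreSimilarity": True},
            timeout=30,
        )
        resp.raise_for_status()

    ignore_similarity("~81924104")  # placeholder observable ID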

> I migrated the database from BerkeleyDB to Cassandra and I'm seeing great results. I'll run a few weeks of testing and report back.

Glad you've found a potential resolution! Unfortunately, we are already on Cassandra :(

We've even increased resources to 40 vCPU to troubleshoot and the problem persists. We have a single host with TheHive, Cortex, Cassandra & ElasticSearch. Perhaps separating these services into dedicated hosts will yield better performance.

MDB4241 · Jan 25 '22 17:01

> Same here when accessing or closing a case. We usually have 60-70 observables per case, with a total of 11k cases (~100 open). I'm wondering about the feature that checks for related cases by observable; could that be a factor?

> You should see a performance increase if you use the 'ignoreSimilarity' option on non-critical case artifacts. This was a useful change for my organization, and it reduces the cost of rendering a case.

> I migrated the database from BerkeleyDB to Cassandra and I'm seeing great results. I'll run a few weeks of testing and report back.

> Glad you've found a potential resolution! Unfortunately, we are already on Cassandra :(

> We've even increased resources to 40 vCPU to troubleshoot and the problem persists. We have a single host with TheHive, Cortex, Cassandra & ElasticSearch. Perhaps separating these services into dedicated hosts will yield better performance.

I've noticed some analysts using "Stats", which generates an expensive search that is run frequently.

I removed this option from the frontend.

Another thing I've noticed is that very large descriptions put a heavy load on the server, I believe during the conversion. We are now avoiding very long descriptions.

andreyglauzer · Jan 25 '22 17:01

> I migrated the database from BerkeleyDB to Cassandra and I'm seeing great results. I'll run a few weeks of testing and report back.

Are there any docs on how to perform this migration?

backloop-biz · Jan 26 '22 09:01

> Are there any docs on how to perform this migration?

This migration is not possible in bulk; I had to create a new instance and re-open all cases and alerts via the API into the Cassandra-backed database.
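For reference, a hedged sketch of that approach for alerts, assuming the classic /api/alert routes; the URLs and keys are placeholders, and cases, tasks, and observables would need the same treatment with proper pagination:

    # Hedged sketch: read alerts from the old instance and re-create them on the
    # new Cassandra-backed instance. Only alert-creation fields are copied.
    import requests

    OLD_URL, OLD_KEY = "https://old-thehive.example.local", "OLD_API_KEY"  # placeholders
    NEW_URL, NEW_KEY = "https://new-thehive.example.local", "NEW_API_KEY"  # placeholders

    def headers(key: str) -> dict:
        return {"Authorization": f"Bearer {key}"}

    # Pull a page of alerts from the old instance (loop over ranges for large volumes).
    alerts = requests.post(
        f"{OLD_URL}/api/alert/_search?range=0-100",
        headers=headers(OLD_KEY), json={"query": {}}, timeout=60,
    ).json()

    # Re-create each alert on the new instance, keeping only creatable fields.
    FIELDS = ("title", "description", "type", "source", "sourceRef",
              "severity", "tlp", "tags", "artifacts")
    for alert in alerts:
        payload = {k: alert[k] for k in FIELDS if k in alert}
        requests.post(f"{NEW_URL}/api/alert", headers=headers(NEW_KEY),
                      json=payload, timeout=60).raise_for_status()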

andreyglauzer · Jan 26 '22 14:01

We are having the same issue as described above. Giving TheHive more and more resources did not resolve it. Disabling statistics mostly solved our issue, but it is really strange.

Cyp-her · Aug 17 '22 11:08

@andreyglauzer Hello, what happened with the changes you made to your TheHive and Cortex setup? What do you recommend doing?

romarito90 · Sep 28 '22 16:09

I have the same issue on a VM with 16 vCPU and 48 GB RAM.

baonq-me · Nov 01 '22 10:11

> We are having the same issue as described above. Giving TheHive more and more resources did not resolve it. Disabling statistics mostly solved our issue, but it is really strange.

Where can you disable the statistics for the frontend?

Taragos · Jan 27 '23 14:01

I'm having the same issue on a physical server with 2x Xeon 4210 CPUs. When I hit the "Stats" button, CPU consumption goes straight to 100%, and TheHive's memory consumption as reported by systemd is about 30 GB.

baonq-me · Feb 06 '23 10:02

I am having the same issue... it will eventually use all CPUs at 100% and then just stop responding. I have to kill the process to get it working again.

TheHive: 8 CPU, 32 GB RAM
3x Elastic: 8 CPU, 32 GB RAM each
3x Cassandra: 8 CPU, 32 GB RAM each

bhjella-awake · Jan 11 '24 22:01

From observations... Elastic gets to around 600% CPU utilization, then drops, then TheHive gets to 300% and stays there. The same thing happens again: Elastic hits 600%, then TheHive jumps to 600% about 20 minutes later. Then Elastic hits 600% again, TheHive maxes out at 800%, and everything freezes.

It feels like there is some thread that doesn't time out and just spins forever.

bhjella-awake · Jan 12 '24 18:01

> From observations... Elastic gets to around 600% CPU utilization, then drops, then TheHive gets to 300% and stays there. The same thing happens again: Elastic hits 600%, then TheHive jumps to 600% about 20 minutes later. Then Elastic hits 600% again, TheHive maxes out at 800%, and everything freezes.

> It feels like there is some thread that doesn't time out and just spins forever.

There is a workaround: limit the number of CPU cores consumed by Elasticsearch. By default, Elasticsearch uses all available CPUs.

To limit the CPU cores used, add this line to elasticsearch.yml:

    node.processors: 4    # allow 4 CPUs to be used
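If useful, a small sketch to confirm how many processors Elasticsearch is actually allowed to use, via the standard nodes-info API (host, port, and lack of auth are assumptions for a default local install):

    # Hedged sketch: query the Elasticsearch nodes-info API and print the
    # processor counts each node reports (available vs. allocated).
    import requests

    nodes = requests.get("http://localhost:9200/_nodes/os", timeout=10).json()
    for node_id, node in nodes["nodes"].items():
        os_info = node["os"]
        print(f'{node["name"]}: available={os_info.get("available_processors")} '
              f'allocated={os_info.get("allocated_processors")}')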

baonq-me · Jan 13 '24 04:01

Thanks, but the issue isn't that it is using CPU; it's that there is some thread that never ends and constantly consumes the process. I have narrowed it down to times when a specific user is using it. I'm going to try to determine what he is doing that causes TheHive to just churn CPU... it normally sits around 100-200% CPU during the work day, except for this one user.

bhjella-awake · Jan 15 '24 01:01

Found the problem on my side: it was a user using the "Stats" button on the cases page (screenshot omitted).

I have implemented rules on my Apache reverse proxy to return 401 for those API requests:

        RewriteEngine On

        # Block query string name=case-by-tags-stats
        RewriteCond %{QUERY_STRING} name=case-by-tags-stats [NC]
        RewriteRule ^ - [R=401,L]

        # Block query string name=case-by-status-stats
        RewriteCond %{QUERY_STRING} name=case-by-status-stats [NC]
        RewriteRule ^ - [R=401,L]

        # Block query string name=case-by-resolution-status-stats
        RewriteCond %{QUERY_STRING} name=case-by-resolution-status-stats [NC]
        RewriteRule ^ - [R=401,L]
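A quick hedged sketch for sanity-checking the rule (the instance URL is a placeholder; since the RewriteRule matches any path, any request carrying one of the blocked query strings should come back as 401):

    # Hedged sketch: send a request with one of the blocked query strings through
    # the reverse proxy and confirm it is rejected with HTTP 401.
    import requests

    resp = requests.get(
        "https://thehive.example.local/index.html",   # placeholder URL behind the proxy
        params={"name": "case-by-tags-stats"},        # blocked query string
        timeout=30,
    )
    assert resp.status_code == 401, f"expected 401, got {resp.status_code}"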

bhjella-awake · Jan 15 '24 18:01