TheHive
[Bug] - high CPU consumption
Request Type
Bug
Work Environment
| Question | Answer |
|---|---|
| OS version (server) | Oracle Linux |
| OS version (client) | Windows 10 |
| Virtualized Env. | False |
| Dedicated RAM | 64 GB |
| vCPU | 20 |
| TheHive version / git hash | 4.1.16 |
| Package Type | RPM |
| Database | BerkeleyDB |
| Index type | Lucene |
| Attachments storage | Local |
| Browser type & version | Chrome |
Problem Description
Our team handles a high volume of alerts, which are opened in TheHive via the API, and we have also built several automations to merge alerts into cases, so API searches are constant as well.
We have a total of 15 analysts using the platform simultaneously, and at times TheHive consumes all of the server's CPUs and the platform becomes inaccessible until I terminate the TheHive process with kill and start the service again.
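For reference, the automation traffic is roughly of the shape sketched below. This is only an illustration, not our actual scripts: the host is hypothetical and the endpoints (`POST /api/alert`, `POST /api/alert/{alertId}/merge/{caseId}`) are assumed from the TheHive 4 API, so double-check them against your version.

```python
# Illustration of the automation load described above (assumed endpoints, hypothetical host).
import requests

THEHIVE_URL = "https://thehive.example.org"        # hypothetical host
HEADERS = {"Authorization": "Bearer <API_KEY>"}    # service-account API key

def create_alert(title: str, source_ref: str) -> str:
    """Open an alert via the REST API (assumed endpoint: POST /api/alert)."""
    alert = {
        "title": title,
        "type": "external",
        "source": "SIEM",
        "sourceRef": source_ref,
        "description": "Alert pushed by automation",
    }
    r = requests.post(f"{THEHIVE_URL}/api/alert", json=alert, headers=HEADERS)
    r.raise_for_status()
    return r.json()["id"]

def merge_alert_into_case(alert_id: str, case_id: str) -> None:
    """Merge an alert into an existing case (assumed endpoint: POST /api/alert/{id}/merge/{caseId})."""
    r = requests.post(f"{THEHIVE_URL}/api/alert/{alert_id}/merge/{case_id}", headers=HEADERS)
    r.raise_for_status()
```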
Steps to Reproduce
I noticed that merging alerts into cases tends to consume a lot of server CPU, and this is something analysts do constantly.
However, I have no proof that this is really the root cause.
I am having the same issue with my instance of TheHive.
My org also experiences this issue, on a similarly spec'd system (14 vCPU, 64 GB RAM).
Same here when accessing or closing a case. We usually have 60-70 observables per case, with a total of 11k cases (~100 open). Could the feature that checks for related cases by observable be the cause?
I migrated the database from BerkeleyDB to Cassandra and I'm seeing great results.
I'll give it a few more weeks of testing and report back.
> Same here when accessing or closing a case. We usually have 60-70 observables per case, with a total of 11k cases (~100 open). Could the feature that checks for related cases by observable be the cause?
You should see a performance increase if you use the 'ignoreSimilarity' option on non-critical case artifacts. This was a useful edit for my organization and it reduces the impact of rendering a case.
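In case it helps, here is a rough sketch of how that flag could be applied in bulk through the API. The endpoint (`PATCH /api/case/artifact/{observableId}`) and the `ignoreSimilarity` field name are assumptions based on TheHive 4's observable API; verify them on your instance before running anything like this.

```python
# Hedged sketch: bulk-set ignoreSimilarity on low-value observables (assumed endpoint and field).
import requests

THEHIVE_URL = "https://thehive.example.org"      # hypothetical host
HEADERS = {"Authorization": "Bearer <API_KEY>"}

def ignore_similarity(observable_ids):
    """Ask TheHive to skip related-case (similarity) computation for these observables."""
    for oid in observable_ids:
        r = requests.patch(
            f"{THEHIVE_URL}/api/case/artifact/{oid}",   # assumed TheHive 4 observable endpoint
            json={"ignoreSimilarity": True},
            headers=HEADERS,
        )
        r.raise_for_status()

# Example: ignore_similarity(["~40964152", "~40968248"])  # hypothetical observable IDs
```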
> I migrated the database from BerkeleyDB to Cassandra and I'm seeing great results. I'll give it a few more weeks of testing and report back.
Glad you've found a potential resolution! Unfortunately, we are already on Cassandra :(
We've even increased resources to 40 vCPU to troubleshoot and the problem persists. We have a single host with TheHive, Cortex, Cassandra & ElasticSearch. Perhaps separating these services into dedicated hosts will yield better performance.
I've noticed some analysts using "Stats", which generates a time-consuming and frequently run search.
I removed this option from the frontend.
One other thing I've noticed is that large descriptions demand a lot from the server, I believe during conversion (rendering). We are now avoiding very long descriptions.
> I migrated the database from BerkeleyDB to Cassandra and I'm seeing great results. I'll give it a few more weeks of testing and report back.
Are there any docs on how to perform this migration?
> Are there any docs on how to perform this migration?
This migration cannot be done in bulk; I had to create a new instance and re-open all cases and alerts via the API on the Cassandra-backed database.
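For anyone attempting the same thing, the rough shape of that re-import is sketched below. Everything here is an assumption rather than a documented procedure: the search endpoint (`POST /api/case/_search?range=all`), the case fields copied, and the hosts are illustrative, and tasks, observables, attachments and alerts need the same treatment.

```python
# Hedged sketch: copy cases from the old (BerkeleyDB) instance to the new (Cassandra) one via the API.
import requests

OLD_URL, OLD_HEADERS = "https://old-thehive.example.org", {"Authorization": "Bearer <OLD_KEY>"}
NEW_URL, NEW_HEADERS = "https://new-thehive.example.org", {"Authorization": "Bearer <NEW_KEY>"}

def list_cases():
    """Fetch all cases from the source instance (assumed endpoint: POST /api/case/_search)."""
    r = requests.post(f"{OLD_URL}/api/case/_search",
                      params={"range": "all"},   # assumed pagination parameter
                      json={"query": {}},
                      headers=OLD_HEADERS)
    r.raise_for_status()
    return r.json()

def copy_cases():
    """Re-create each case on the new instance; tasks, observables and attachments omitted here."""
    for case in list_cases():
        payload = {k: case[k] for k in ("title", "description", "severity", "tlp", "tags") if k in case}
        r = requests.post(f"{NEW_URL}/api/case", json=payload, headers=NEW_HEADERS)
        r.raise_for_status()

if __name__ == "__main__":
    copy_cases()
```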
We are having the same issue as described above. Giving more and more resources to TheHive doesn't resolve it. Disabling statistics mostly solved our problem, but it is really strange.
@andreyglauzer Hello, what happened with the changes you made to your TheHive and Cortex setup? What do you recommend doing?
I have the same issue with a VM having 16 vCPU and 48 GB RAM.
> We are having the same issue as described above. Giving more and more resources to TheHive doesn't resolve it. Disabling statistics mostly solved our problem, but it is really strange.
Where can you disable the statistics for the frontend?
I'm having the same issue on a physical server with 2x Xeon 4210. When I hit the "Stats" button, CPU consumption goes straight to 100% and TheHive's memory consumption reported by systemd is about ~30 GB.
I am having the same issue... it will eventually use all CPUs at 100% and then just stop responding. I have to kill the process to get it to work again.
hive - 8 CPU, 32 GB RAM
3x elastic - 8 CPU, 32 GB RAM each
3x cassandra - 8 CPU, 32 GB RAM each
From observations... Elasticsearch gets to about 600% CPU utilization then drops, then TheHive gets to 300% and stays there. The same thing happens again: Elasticsearch hits 600%, then TheHive jumps to 600% about 20 minutes later. Then Elasticsearch hits 600% once more, TheHive maxes out at 800%, and everything freezes.
It feels like there is some thread that doesn't time out and just spins forever.
> From observations... Elasticsearch gets to about 600% CPU utilization then drops, then TheHive gets to 300% and stays there. The same thing happens again: Elasticsearch hits 600%, then TheHive jumps to 600% about 20 minutes later. Then Elasticsearch hits 600% once more, TheHive maxes out at 800%, and everything freezes.
> It feels like there is some thread that doesn't time out and just spins forever.
There is a workaround: limit the number of CPU cores consumed by Elasticsearch. By default, Elasticsearch uses all available CPUs.
To limit the CPU cores used, add this line to elasticsearch.yml:
node.processors: 4 # allow 4 CPUs to be used
Thanks, but the issue isn't that it's using CPU; it's that there is some thread that never ends and constantly consumes the process. I have narrowed it down to when one specific user is using it. I'm going to try to determine what he is doing that causes TheHive to just churn CPU. It normally sits around 100-200% CPU during the work day, except for this one user.
Found the problem on my side: it was a user using the "Stats" button on the cases page.
I have implemented rules on my Apache reverse proxy to return 401 for those API requests:
RewriteEngine On
# Block query string name=case-by-tags-stats
RewriteCond %{QUERY_STRING} name=case-by-tags-stats [NC]
RewriteRule ^ - [R=401,L]
# Block query string name=case-by-status-stats
RewriteCond %{QUERY_STRING} name=case-by-status-stats [NC]
RewriteRule ^ - [R=401,L]
# Block query string name=case-by-resolution-status-stats
RewriteCond %{QUERY_STRING} name=case-by-resolution-status-stats [NC]
RewriteRule ^ - [R=401,L]