Bug: Poor Postgres Performance
Description:
As discussed in the Slack thread, we are experiencing degraded performance when using the postgres database driver. The most noticeable issue is around multi-hop queries.
The example used in the thread was the query below, but any strongly typed multi-hop query seems to either hang or take a very long time:
```cypher
MATCH p = (n:Base)-[:HAS_ACCESS_TO]->(:Safe)-[:Contains]->(:Secret)-[:CredentialOf]->(:Computer)
RETURN p
```
As this query uses a mixture of AD and OpenGraph nodes, I was initially using the built-in "Contains" edge during debugging. Changing this to something custom did seem to provide a bit of a performance boost at the time, but I think it may have been placebo.
Are you intending to fix this bug?
Happy to assist with any debugging, testing, or Postgres tuning.
Component(s) Affected:
- UI
- API
- PostgreSQL
Steps to Reproduce:
- Ingest AD and OpenGraph Data
- Execute a multi-hop query such as the one above
Expected Behavior:
Graph data is returned in a reasonable amount of time.
Actual Behavior:
Data either does not return or takes a considerable amount of time.
Screenshots/Code Snippets/Sample Files:
N/A
Environment Information:
BloodHound: v8.2.0
Database: postgres:16
Additional Information:
https://bloodhoundhq.slack.com/archives/C20NG2L87/p1759060061111959?thread_ts=1759060061.111959&cid=C20NG2L87
Potential Solution (optional):
PostgreSQL tuning
Related Issues:
N/A
Contributor Checklist:
- [x] I have searched the issue tracker to ensure this bug hasn't been reported before or is not already being addressed.
- [x] I have provided clear steps to reproduce the issue.
- [x] I have included relevant environment information details.
- [x] I have attached necessary supporting documents.
- [x] I have checked that any JSON files I am attempting to upload to BloodHound are valid.
Adding onto this one: with similar long-path queries we can see that a single CPU core is being utilised, which appears to be our bottleneck.
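For anyone else debugging this, a quick first check is whether the server is allowed to parallelise at all. These are standard Postgres settings, queryable from any client; if `max_parallel_workers_per_gather` is 0, every query runs on a single core.

```sql
-- Show the parallelism settings currently in effect.
SELECT name, setting
FROM pg_settings
WHERE name IN ('max_worker_processes',
               'max_parallel_workers',
               'max_parallel_workers_per_gather');
```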
I've got this flagged internally to get our PostgreSQL config recommendations published. Will make sure this issue is tagged when they're out!
Hey @StephenHinck! Apologies for the chase but has there been any progress on this? 🤞
@spyr0-sec Hey, I'm chasing this down right now. There are two potential things going on here: one is DB tuning, which I'm finding our docs for so I can give you better suggestions. The second is a potential issue with Cypher translation, which we have slated on our backlog.
At the very least, I'll provide the tuning suggestions when I get them to see if that helps.
Going to post the tuning parameters we have marked as our recommendations, assuming you have the CPU cores to handle it (see the sketch after the list for one way to apply them):
- `max_parallel_workers` - Bump this up to 16
- `max_parallel_maintenance_workers` - Bump this up to 4
- `work_mem` - 64-128 MiB
- `default_statistics_target` (important one!) - 2000-3500. Start on the lower end and work up to see where performance gets better for you
- `random_page_cost` - If you're on an SSD, set this to 1
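For reference, a minimal sketch of applying these with `ALTER SYSTEM`; the specific values below are illustrative picks from the ranges above, not tested recommendations:

```sql
-- Illustrative values drawn from the ranges above; adjust for your hardware.
ALTER SYSTEM SET max_parallel_workers = 16;         -- capped by max_worker_processes
ALTER SYSTEM SET max_parallel_maintenance_workers = 4;
ALTER SYSTEM SET work_mem = '64MB';                 -- or up to '128MB'
ALTER SYSTEM SET default_statistics_target = 2000;  -- start low, work up toward 3500
ALTER SYSTEM SET random_page_cost = 1;              -- only on SSD-backed storage

-- All of the above are reloadable without a restart:
SELECT pg_reload_conf();

-- Re-analyze so the higher statistics target actually feeds new stats to the planner:
ANALYZE;
```

Note that `ANALYZE` over a large graph can take a while, but without it the raised `default_statistics_target` won't change any plans.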
Hopefully this will be a decent starting point for better performance.
Appreciate this @rvazarkar! I've handed this to our databases team and will report back