
Redundant and excessive data in JSON output file when using "group-members" flag

Open TasteOfSpaghetti opened this issue 2 years ago • 5 comments

Hi,

When running AzureHound with the "group-members" flag, the JSON output file contains a lot of irrelevant data. This is a problem in large environments: the file can grow to multiple gigabytes and then cannot be ingested into BHCE because of the upload size limit (around 400MB in my testing). Cutting the file into smaller pieces with Chophound might do the trick, but some single group nodes within the JSON are themselves above 400MB, so data ingestion is not possible.
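To check whether splitting can help at all, one can measure how large each record is on its own. The following is a minimal sketch, not part of AzureHound or BHCE: it assumes the records have already been loaded as a list of Python objects, the function name `oversized_records` is made up for illustration, and the 400 MB default is only the approximate limit observed above, not a documented value.

```python
import json

# Assumed limit: the roughly 400 MB observed in testing, not a documented value.
LIMIT_BYTES = 400 * 1024 * 1024


def oversized_records(records, limit_bytes=LIMIT_BYTES):
    """Return (index, size_in_bytes) for each record whose compact JSON
    serialization is larger than limit_bytes."""
    result = []
    for index, record in enumerate(records):
        size = len(json.dumps(record, separators=(",", ":")).encode("utf-8"))
        if size > limit_bytes:
            result.append((index, size))
    return result


# Tiny made-up records and a tiny limit, just to show the behaviour.
sample = [
    {"kind": "AZGroupMember", "data": {"groupId": "g1", "members": []}},
    {"kind": "AZGroupMember", "data": {"groupId": "g2", "members": ["x" * 200]}},
]
print(oversized_records(sample, limit_bytes=100))
```

Any record reported here is too large to upload even after the file has been split per record. For a multi-gigabyte file, loading everything into memory at once may not be practical, so a streaming parser would be needed in place of a plain load.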

Looking through the JSON file, it appears that a large portion of the file is taken up by attributes on each of the group members, such as:

assignedLicenses, assignedPlans, provisionedPlans

Additionally, attributes such as:

country, department, faxNumber

and many more are present in the data for each group member.
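A quick way to quantify which attributes dominate is to add up the serialized size of each attribute across all member objects. This is a rough sketch under stated assumptions: the member objects are made-up examples, and `bytes_per_attribute` is a hypothetical helper, not something AzureHound provides.

```python
import json
from collections import Counter


def bytes_per_attribute(objects):
    """Sum the compact-JSON size of each top-level attribute across objects."""
    totals = Counter()
    for obj in objects:
        for key, value in obj.items():
            encoded = json.dumps(value, separators=(",", ":")).encode("utf-8")
            totals[key] += len(encoded)
    return totals


# Made-up member objects; real ones carry many more attributes.
members = [
    {
        "id": "u1",
        "assignedPlans": [{"service": "exchange"}, {"service": "teams"}],
        "country": None,
    },
    {"id": "u2", "assignedPlans": [{"service": "exchange"}], "country": "DK"},
]

totals = bytes_per_attribute(members)
for key, size in totals.most_common():
    print(f"{key}: {size} bytes")
```

The count covers attribute values only, not key names or separators, so it is an approximation of each attribute's share of the file.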

I do not see why this data is part of the "group-members" ingestion.

I would think that 95% of the data collected can be removed.

In my mind, only the raw membership data should be included, as in:

groupId (Group ID)
memberId (ID of the groups or users that are members of said group)

All the other data for the groups and users themselves should not be included in this data collection type.
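The reduction proposed here could look roughly like the sketch below. The record layout is an assumption for illustration (a `data` object holding `groupId` and a `members` list whose entries wrap the full object under `member`); it should be checked against real output before use. `trim_group_members` and the `memberIds` key are hypothetical names, and the trimmed shape is not claimed to be something BHCE accepts for ingestion.

```python
import json


def trim_group_members(record):
    """Reduce one group-membership record to the group ID and member IDs.

    Assumed input layout (not verified against AzureHound):
    {"kind": ..., "data": {"groupId": ..., "members": [{"member": {...}}]}}
    """
    data = record["data"]
    member_ids = [
        entry["member"]["id"]
        for entry in data.get("members", [])
        if "id" in entry.get("member", {})
    ]
    return {"groupId": data["groupId"], "memberIds": member_ids}


# Made-up record with a few of the extra attributes mentioned above.
record = {
    "kind": "AZGroupMember",
    "data": {
        "groupId": "g1",
        "members": [
            {
                "groupId": "g1",
                "member": {
                    "id": "u1",
                    "country": "DK",
                    "department": "IT",
                    "assignedPlans": [{"service": "exchange"}],
                },
            },
            {"groupId": "g1", "member": {"id": "u2", "faxNumber": None}},
        ],
    },
}

print(json.dumps(trim_group_members(record)))
```

Everything except the two identifiers is dropped, so the user and group properties would have to come from the separate user and group collections.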

TasteOfSpaghetti, Dec 13 '23 10:12

Confirmed. I see the same thing for AZGroupOwner and AZAppOwner. So it might be a generic thing where there is a potential to remove redundant data.

JonasBK, Dec 13 '23 10:12

It would be great if the majority of the data could be removed. I bet it would benefit both performance when running AzureHound and ingestion into BHCE, due to smaller file sizes.

Also, I don't know this for sure, but maybe a large portion of the data isn't even ingested into BHCE and used for anything. It might just be collected and never used.

TasteOfSpaghetti, Dec 13 '23 10:12

I agree with @TasteOfSpaghetti, and I created a pull request (#64) to change this in the past. The BloodHound team then came up with #67, which should have fixed this. I didn't manage to test it before #64 was closed, but now I can see that #67 did not help and #64 still makes sense. BloodHound team, could you reconsider merging #64 or implementing something along those lines?

malacupa, Feb 19 '24 20:02

I've asked the team to re-review #64 after this recent change.

StephenHinck, Feb 22 '24 00:02

Until the fix has been implemented:

I've written a small script to trim the AzGroupMembers down to just IDs instead of all extended properties: https://github.com/egilas/AzureHoundTrimmer/

It reduced a 3.3 GB AzureHound output to 350 MB.

egilas, Jul 02 '24 14:07