Implement LDIF Output Mode
This pull request adds a new LDIF output mode which converts the AD snapshot into (mostly) equivalent ldapsearch output for further processing with the latest BOFHound version.
The intention is to use LDIF as a common input source for BOFHound where all BloodHound related parsing improvements can be shared in a central place, independent of the tool used to gather the information (AD Explorer, ldapsearch BOF, pyldapsearch, ADWS, ...).
This should implement #23 and indirectly "fix" issues like #21, #52 and others.
For testing, I ran an (objectGUID=*) query over all relevant naming contexts:
LDAP_BASE="DC=ludus,DC=domain"
for dn in "$LDAP_BASE" "CN=Configuration,$LDAP_BASE" "CN=Schema,CN=Configuration,$LDAP_BASE"; do
pyldapsearch ludus.domain/domainadmin:password "(objectGUID=*)" -base-dn "$dn" -output "pyldapsearch.$dn.log" -ldaps
done
Combined all results into one file:
$ cat pyldapsearch*.log > combined.log
Sorted the objects in the output by their respective distinguishedName using the following script:
import sys
import re

def read_objects(filename):
    # Split the log file into individual objects, using the dashed separator
    # lines placed between records.
    with open(filename, 'r', encoding='utf-8') as f:
        lines = f.readlines()
    separator = None
    objects = []
    current_object = []
    for line in lines:
        if re.match(r'^\s*[-]{2,}\s*$', line):
            if separator is None:
                separator = line.rstrip('\n')  # capture the first separator
            if current_object:
                objects.append("".join(current_object).strip())
                current_object = []
        else:
            current_object.append(line)
    if current_object:
        objects.append("".join(current_object).strip())
    return separator, objects

def get_distinguished_name(object):
    # Sort key: the value of the distinguishedName attribute, if present.
    match = re.search(r'^distinguishedName:\s*(.+)', object, re.MULTILINE)
    return match.group(1) if match else ""

def main():
    if len(sys.argv) != 2:
        print("Usage: python normalize.py <filename>")
        sys.exit(1)
    filename = sys.argv[1]
    separator, objects = read_objects(filename)
    sorted_objects = sorted(objects, key=get_distinguished_name)
    for object in sorted_objects:
        print(separator)
        print(object)

if __name__ == "__main__":
    main()
Like this:
$ python3 normalize.py combined.log > pyldapsearch.log
Converted the AD snapshot to LDIF:
$ ADExplorerSnapshot.py -m LDIF ludus.dat
[*] Server: DC01.ludus.domain
[*] Time of snapshot: 2025-04-24T09:11:48
[*] Mapping offset: 0x2a127a
[*] Object count: 3742
[+] Parsing properties: 1499
[+] Parsing classes: 269
[+] Parsing object offsets: 3742
[+] Collecting data: dumped 3742 objects
[+] Output written to DC01.ludus.domain_1745478708_objects.ldif
Sorted this output too:
$ python3 normalize.py DC01.ludus.domain_1745478708_objects.ldif > adexplorer.log
Compared the results:
$ diff -u adexplorer.log pyldapsearch.log
Filtered for changes present in the LDAP query output, but not in the snapshot:
$ diff -u adexplorer.log pyldapsearch.log | grep '^\+' -B2 -A2 | less
Observed that the differences are mostly due to changes happening between taking the snapshot and performing the LDAP queries.
Extracted all field names which differ in the LDAP results:
$ diff -u adexplorer.log pyldapsearch.log | grep -a '^\+' | cut -d: -f1 | sort -u
+accountExpires
+creationTime
+dnsRecord
+dSCorePropagationData
+lastLogon
+lastLogonTimestamp
+lastSetTime
+logonCount
+msDS-HasInstantiatedNCs
+otherWellKnownObjects
+priorSetTime
+pwdLastSet
+rIDAllocationPool
+rIDAvailablePool
+rIDPreviousAllocationPool
+uSNChanged
+wellKnownObjects
+whenChanged
Concluded from manual analysis that this is mostly expected and should be good enough for further processing with BOFHound.
Parsed the AD Explorer-originated LDIF output file with BOFHound:
$ bofhound -i adexplorer.log -o adexplorer
_____________________________ __ __ ______ __ __ __ __ _______
| _ / / __ / | ____/| | | | / __ \ | | | | | \ | | | \
| |_) | | | | | | |__ | |__| | | | | | | | | | | \| | | .--. |
| _ < | | | | | __| | __ | | | | | | | | | | . ` | | | | |
| |_) | | `--' | | | | | | | | `--' | | `--' | | |\ | | '--' |
|______/ \______/ |__| |__| |___\_\________\_\________\|__| \___\|_________\
<< @coffeegist | @Tw1sm >>
[10:48:53] INFO Parsed 3741 LDAP objects from 1 log files
[10:48:53] INFO Parsed 0 local group/session objects from 1 log files
[10:48:53] INFO Sorting parsed objects by type...
[10:48:53] INFO Parsed 17 Users
[10:48:53] INFO Parsed 56 Groups
[10:48:53] INFO Parsed 3 Computers
[10:48:53] INFO Parsed 1 Domains
[10:48:53] INFO Parsed 0 Trust Accounts
[10:48:53] INFO Parsed 2 OUs
[10:48:53] INFO Parsed 211 Containers
[10:48:53] INFO Parsed 6 GPOs
[10:48:53] INFO Parsed 1 Enterprise CAs
[10:48:53] INFO Parsed 1 AIA CAs
[10:48:53] INFO Parsed 1 Root CAs
[10:48:53] INFO Parsed 1 NTAuth Stores
[10:48:53] INFO Parsed 5 Issuance Policies
[10:48:53] INFO Parsed 41 Cert Templates
[10:48:53] INFO Parsed 1768 Schemas
[10:48:53] INFO Parsed 1 Referrals
[10:48:53] INFO Parsed 1503 Unknown Objects
[10:48:53] INFO Parsed 0 Sessions
[10:48:53] INFO Parsed 0 Privileged Sessions
[10:48:53] INFO Parsed 0 Registry Sessions
[10:48:53] INFO Parsed 0 Local Group Memberships
[10:48:53] INFO Parsed 2340 ACL relationships
[10:48:53] INFO Created default users
[10:48:53] INFO Created default groups
[10:48:53] INFO Resolved group memberships
[10:48:53] INFO Resolved delegation relationships
[10:48:53] INFO Resolved OU memberships
[10:48:53] INFO Linked GPOs to OUs
[10:48:53] INFO Built CA certificate chains
[10:48:53] INFO Resolved enabled templates per CA
[10:48:53] INFO JSON files written to adexplorer
Compared to the built-in BloodHound output mode, you get valid ADCS, GPO, OU and container objects for free.
Thanks for implementing this! Is this a fully-compatible LDIF format or does bofhound have its own format? (E.g. I'm not sure the normal LDIF format has dashes in between lines: https://github.com/c3c/ADExplorerSnapshot.py/pull/69/files#diff-921044e0048a35f62ca97595f22e7cbdd6c1ce05b481b618f024c4bdac1cf32fR197)
Thanks for taking a look.
You are correct, it is not standard compliant according to RFC 2849. Therefore, calling it LDIF is not entirely correct. My goal was basically to get improved BloodHound output with minimal effort. As outlined above, I mostly compared it to pyldapsearch output.
The main differences compared to the standard seem to be (a full example record follows the list):
- Records must be separated by a blank line (not dashes)
- Records must start with a dn: attribute
- Base64 encoded attribute values must be separated from the attribute name by two colons, i.e. name:: dmFsdWU=
- Each value of a multi-value attribute must be printed on its own line; currently they are comma separated, i.e.
  objectClass: top
  objectClass: domain
  instead of:
  objectClass: top, domain
- String attributes containing new lines are not handled correctly
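Putting these points together, a compliant record would look roughly like this (illustrative values only; the base64 value reuses the dmFsdWU= example from above):

dn: CN=Users,DC=ludus,DC=domain
objectClass: top
objectClass: container
name:: dmFsdWU=

with a blank line separating it from the next record.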
The simplest way to get standard-compliant output would probably be to base64 encode all attribute values. This is allowed by the standard, but it is obviously not that human readable and would require some BOFHound changes.
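That said, RFC 2849 only requires base64 for values that contain NUL, CR or LF bytes or non-ASCII data, start with a space, a colon or '<', or end with a space, so encoding selectively would keep most attributes readable. A rough sketch of that decision (the ldif_attr_line helper is hypothetical, not code from this PR or from BOFHound, and the optional 76-column line folding is left out):

import base64

def ldif_attr_line(name: str, value: bytes) -> str:
    # Hypothetical helper: emit one "name: value" line, switching to
    # "name:: <base64>" when RFC 2849 does not allow the raw value.
    needs_b64 = (
        any(b in (0x00, 0x0A, 0x0D) or b > 0x7F for b in value)  # NUL, LF, CR or non-ASCII anywhere
        or value[:1] in (b" ", b":", b"<")                       # unsafe first character
        or value.endswith(b" ")                                  # trailing space SHOULD be encoded
    )
    if needs_b64:
        return f"{name}:: {base64.b64encode(value).decode('ascii')}"
    return f"{name}: {value.decode('ascii')}"

print(ldif_attr_line("objectClass", b"top"))            # objectClass: top
print(ldif_attr_line("name", "Grüße".encode("utf-8")))  # name:: R3LDvMOfZQ==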
Thanks for explaining! I think it would be better, then, to modify the PR so that it says output mode "BOFHound"?
That is an option, yes. Another (preferred?) option is to make it actually RFC compliant; I gave it a try in this branch. I haven't yet looked at the changes required on the BOFHound side though.
Sure, could you PR it? Alternatively, I'm also happy to merge in the two different modes.
I've been testing the LDIF (non-RFC) branch against some setups and the results are looking good. I think it makes sense to make this the new default and keep the "direct BloodHound output" as a legacy format, while updating the docs to point to BOFHound instead for the second conversion.
Side note: I hit some minor conversion issues while running bofhound to load cert template data and domain data; I'm not sure whether this is an issue with how we're outputting the data or with how bofhound is loading it.