Speed up EVTX Parsing

Open yampelo opened this issue 6 years ago • 3 comments

Move over to https://github.com/omerbenamram/pyevtx-rs

yampelo avatar May 29 '19 17:05 yampelo

@yampelo let me know if you need a hand with this :)

omerbenamram avatar Jun 02 '19 05:06 omerbenamram

@omerbenamram It's mainly a question of whether I change the output of your tool to match what I was working with before, or change all the functions to match the output of your tool. For example:

proc = SysMonProc(
    host=event["Computer"],
    user=event["EventData_User"],
    process_guid=event["EventData_ProcessGuid"],
    process_id=int(event["EventData_ProcessId"]),
    process_image=process_image,
    process_image_path=process_path,
)
proc_file = proc.get_file_node()
proc_file.file_of[proc]

dest_addr = IPAddress(ip_address=event["EventData_DestinationIp"])

proc.connected_to[dest_addr].append(
    timestamp=event["EventData_UtcTime"],
    port=event["EventData_DestinationPort"],
    protocol=event["EventData_Protocol"],
)

if event.get("EventData_DestinationHostname"):
    hostname = Domain(event["EventData_DestinationHostname"])
    hostname.resolves_to[dest_addr].append(timestamp=event["EventData_UtcTime"])
    return (proc, proc_file, dest_addr, hostname)

return (proc, proc_file, dest_addr)

Works off of this: https://github.com/yampelo/beagle/blob/master/beagle/datasources/win_evtx.py#L58

yampelo avatar Jun 02 '19 15:06 yampelo

@yampelo The nice thing is that my package already produces valid JSON in Rust, so most of the code currently at https://github.com/yampelo/beagle/blob/master/beagle/datasources/win_evtx.py#L78 will go away (replaced with json.loads).

As for these snippets: to be compatible with my output, it's merely a matter of changing event["EventData_UtcTime"] to event["EventData"]["UtcTime"] (which is how the fields are actually represented in the event). You could also flatten the JSON output to match the current code. I think the former option is slightly nicer, but both should do the trick.
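A minimal sketch of the nested-access option. The record below is a hypothetical, trimmed-down sample written for illustration; the real output has more fields and may wrap these keys in additional levels, so treat the exact shape as an assumption.

```python
import json

# Hypothetical sample record (an assumption, not actual tool output).
raw_record = """
{
  "Computer": "WORKSTATION-01",
  "EventData": {
    "UtcTime": "2019-06-02 16:00:00.000",
    "DestinationIp": "10.0.0.5",
    "DestinationPort": "443",
    "Protocol": "tcp"
  }
}
"""

# The record is already valid JSON, so parsing is a single call.
event = json.loads(raw_record)

# Nested access mirrors how the fields are laid out in the event.
timestamp = event["EventData"]["UtcTime"]
dest_ip = event["EventData"]["DestinationIp"]

# Optional fields can still be probed safely with chained .get() calls.
hostname = event.get("EventData", {}).get("DestinationHostname")
```

With this layout, each event["EventData_X"] lookup in the existing functions would become event["EventData"]["X"].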

You could use a snippet that flattens the data (e.g. https://stackoverflow.com/questions/6027558/flatten-nested-dictionaries-compressing-keys) to make this basically a drop-in replacement.
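For the flat option, here is a rough sketch of such a helper. The name flatten and the sample event are illustrative assumptions, not part of either project; the idea is to join nested keys with "_" so that the existing EventData_* lookups keep working.

```python
from typing import Any, Dict


def flatten(nested: Dict[str, Any], parent_key: str = "", sep: str = "_") -> Dict[str, Any]:
    """Recursively collapse nested dicts into one level, joining keys with `sep`."""
    flat: Dict[str, Any] = {}
    for key, value in nested.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Descend into nested dicts, carrying the key prefix along.
            flat.update(flatten(value, full_key, sep))
        else:
            # Non-dict values (including lists) are kept as-is.
            flat[full_key] = value
    return flat


# Hypothetical sample event (an assumption, not actual tool output).
sample = {
    "Computer": "WORKSTATION-01",
    "EventData": {"UtcTime": "2019-06-02 16:00:00.000", "ProcessId": "4242"},
}

flat_event = flatten(sample)
process_id = int(flat_event["EventData_ProcessId"])
```

One caveat with this approach: if a top-level key already contains the separator, two different paths could collapse to the same flat key, and the later one would silently win.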

So it's really up to you :) If I can help in any way, I'd be glad to see this go through. You'd be very surprised by the performance difference if you haven't tried it already.

omerbenamram avatar Jun 02 '19 16:06 omerbenamram