
Extracting payload hash from network traffic

Open arunppsg opened this issue 3 years ago • 6 comments

I was wondering whether it would be possible to extract the payload or the payload hash from network traffic along with the fingerprints using mercury. Are there any options for it? We can do it with tcpdump, but it does not give fingerprints. Any pointers will be helpful. Thanks.

arunppsg avatar May 12 '21 13:05 arunppsg

Hi Arun, we did experiment with hashing the TCP or UDP Data field of a packet as a way to detect retransmissions and duplicated packets. In some branch or another, I think there is code to print out the data field as a hex number. Is this the sort of thing you had in mind? Thx!

davidmcgrew avatar May 12 '21 19:05 davidmcgrew

Exactly, that was what I was looking for. If I could get the data field, then I could compute a hash of it - in my case, a sha256 hash of the payload will suffice.
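A minimal sketch of that step, assuming the data field is available as a hex string (as in the hex output mentioned above). The function name `payload_sha256` is hypothetical and is not part of mercury:

```python
import hashlib


def payload_sha256(data_hex: str) -> str:
    """Return the SHA-256 hex digest of a payload given as a hex string.

    Assumption: the TCP/UDP data field has already been extracted and
    printed as hex; this helper is illustrative, not a mercury API.
    """
    payload = bytes.fromhex(data_hex)  # decode the hex text back to raw bytes
    return hashlib.sha256(payload).hexdigest()


if __name__ == "__main__":
    # "474554202f" is the hex encoding of b"GET /"
    print(payload_sha256("474554202f"))
```

The hash is computed over the decoded bytes rather than over the hex text, so it matches what a tool hashing the raw payload would produce.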

arunppsg avatar May 13 '21 05:05 arunppsg

Since there is no need for cryptographic collision resistance, and there is a need for speed, I had used the xxhash library https://github.com/Cyan4973/xxHash. It performed quite well in tests. I can't find the code that I had experimented with; I think it was never committed into the git repo. It added a new JSON element that holds the xxhash of the entire TCP data field of the packet, something like this:

{"tcp":{"data_hash":"474554202f20485454502"}, "src_ip":"192.168.113.237", "dst_ip":"35.224.99.156", "protocol":6, "src_port":53560, "dst_port":80, "event_start":1565200503.658237}

The hash provides a practical way to detect duplicated packets, which seem to happen all the time in network capture environments, by detecting duplicate data_hash values in whatever JSON processing is being done. I think the data_hash output could be a useful aid in debugging network capture systems, especially ones with multiple capture interfaces. However, what I'd personally find more useful would be a mercury option that detected duplicate packets and ignored them (by only processing and reporting on the first packet, and ignoring any following ones). Does that line up with your thinking, or do you have some other use cases in mind?
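For the JSON-processing side, a rough sketch of what that duplicate detection might look like, assuming one JSON record per line with the experimental `tcp.data_hash` element shown above (it is not in the released output). The function name `drop_duplicates` is hypothetical, and keying on the address/port fields together with the hash is an assumption made here so that identical payloads on unrelated flows are not merged:

```python
import json
from typing import Iterable, Iterator


def drop_duplicates(lines: Iterable[str]) -> Iterator[dict]:
    """Yield each JSON record once, skipping later records whose
    flow fields and data_hash match one that was already seen."""
    seen = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        data_hash = record.get("tcp", {}).get("data_hash")
        if data_hash is None:
            yield record  # no hash present, so nothing to compare
            continue
        key = (
            record.get("src_ip"),
            record.get("dst_ip"),
            record.get("src_port"),
            record.get("dst_port"),
            data_hash,
        )
        if key in seen:
            continue  # duplicate: only the first occurrence is reported
        seen.add(key)
        yield record
```

Note that `seen` grows without bound in this sketch; a long-running capture would need some form of expiry, for example a time window based on `event_start`.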

Thanks!

davidmcgrew avatar May 13 '21 13:05 davidmcgrew

Yes, that is my requirement - to detect duplicate packets based on the payload hash value. One reason for using mercury is that it is able to handle a high volume of traffic. Is there any way I could help or contribute to integrating that feature into mercury?

Thanks!

arunppsg avatar May 14 '21 10:05 arunppsg

Thanks for the offer to help. I have a bunch of other changes in progress. After those are done, how about I add a hash-based deduplicator as a compile-time option, and you can build it with that option and test it out in your environment.

davidmcgrew avatar May 25 '21 20:05 davidmcgrew

Sure, that will be great. Thanks for your help. In the meantime, I will also work on it.

arunppsg avatar May 26 '21 06:05 arunppsg