tcpdump

Packet sanitization and IP masking

Open lilchurro opened this issue 8 years ago • 11 comments

Created a separate branch so I can keep upstream and workgroup README and version strings separate.

lilchurro avatar Jul 18 '17 21:07 lilchurro

As a general note, this problem is not new and similar tools already exist:

  • http://scrub-tcpdump.sourceforge.net/docs.php
  • https://www.wireshark.org/lists/wireshark-users/201201/msg00106.html
  • https://blog.packet-foo.com/2013/07/trace-file-sanitization-for-network-analysts/
  • http://www.tm.uka.de/software/pktanon/links/index.html
  • https://github.com/thepacketgeek/sanicap

If you still find it better to suggest a new solution, the proposed changes need to be one clean commit, which explains why this specific solution is better.

infrastation avatar Jul 19 '17 10:07 infrastation

Hi there!

I'm currently working on the #include <arpa/inet.h> issue and on a better description for the PR, but I'd like to take a moment to respond to the question of "Why does this PR exist?"

As a former network analyst, I'm certainly aware that tools exist that offer some of these anonymization/privatization and sanitization features, but there are (in my experience) some not-insignificant issues with their adoption. Our motivations for ultimately adding this to tcpdump are the following:

  1. Simplicity and automation. We are trying to automate the collection and analysis of network traffic coming off of SDN-enabled switches, and we wanted something that did bulk sanitization without requiring multistep or manual processes (i.e., first collecting, then moving, then scrubbing, etc.) because...

  2. Ability to scale. Most of our work entails large packet captures (e.g., hundreds of GBs), which can pose scalability issues when running subsequent tools, especially if they require writing full packet captures to memory before sanitizing and anonymizing. We had looked at scapy, for example (which is a backbone to some of these tools), which does not seem to handle scaling challenges particularly well.

  3. Thorough sanitization. In the discussions referenced, one will see comments about sanitizing higher layer protocols (e.g., DNS) but potentially omitting or mishandling another (e.g., HTTP). As well, a problem in one layer can undo the anonymity offered by scrubbing a lower layer protocol. This approach may not solve every privacy concern, but completely zeroing out (or truncating) all payload data above TCP/UDP seems to pretty much resolve many of the more conventional privacy issues.

  4. Runs cleanly on Linux. We require something that can be run on Linux as that's where we do the bulk of our work. We are also finding that Linux is typically the base OS on "bare metal" switches / "disaggregated" network gear. Windows-only based approaches (like TraceWrangler) are challenging for those of us running tcpdump on primarily Linux-based environments or on physical network switches.

  5. Is an active project. Ideally, we'd like something that is maintained/not orphaned, and is stable. For example, SCRUBtcpdump looks promising, but doesn't seem to have been updated in a decade, and segfaults from the command line with unexpected input. At the risk of stating the obvious, tcpdump's long history, wide-scale adoption, significant userbase, and active community and development base make it an ideal and attractive choice.
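
To make point 3 concrete, here is a minimal Python sketch of per-packet sanitization: mask the IPv4 addresses and zero everything above the TCP/UDP header. It is an illustration only, not the code in this PR. The masking rule (`mask_ipv4`, which keeps a /24 prefix) and the function names are hypothetical, and the sketch assumes untagged Ethernet, ignores IPv6 and IP fragments, and does not recompute checksums.

```python
import struct

ETH_HDR_LEN = 14
ETHERTYPE_IPV4 = 0x0800
PROTO_TCP, PROTO_UDP = 6, 17


def mask_ipv4(addr: bytes) -> bytes:
    """Hypothetical masking rule: keep the /24 prefix, zero the host octet."""
    return addr[:3] + b"\x00"


def sanitize_frame(frame: bytes) -> bytes:
    """Return a copy of an Ethernet/IPv4 frame with masked addresses
    and a zeroed payload above TCP/UDP (checksums are left stale)."""
    if len(frame) < ETH_HDR_LEN + 20:
        return frame  # too short to hold an IPv4 header; left untouched
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETHERTYPE_IPV4:
        return frame  # non-IPv4 traffic is out of scope for this sketch

    buf = bytearray(frame)
    ip = ETH_HDR_LEN
    ihl = (buf[ip] & 0x0F) * 4  # IPv4 header length in bytes
    proto = buf[ip + 9]
    buf[ip + 12:ip + 16] = mask_ipv4(bytes(buf[ip + 12:ip + 16]))  # source
    buf[ip + 16:ip + 20] = mask_ipv4(bytes(buf[ip + 16:ip + 20]))  # destination

    l4 = ip + ihl
    if proto == PROTO_TCP and len(buf) >= l4 + 20:
        payload = l4 + (buf[l4 + 12] >> 4) * 4  # TCP data offset, in bytes
    elif proto == PROTO_UDP and len(buf) >= l4 + 8:
        payload = l4 + 8  # UDP header has a fixed size
    else:
        return bytes(buf)  # other protocols: addresses masked only

    payload = min(payload, len(buf))
    buf[payload:] = bytes(len(buf) - payload)  # zero everything above L4
    return bytes(buf)
```

Zeroing keeps packet lengths intact, so flow statistics survive; truncating at the same offset instead would also shrink the capture, at the cost of changing the captured length of each packet.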

Finally, there's the utility of the proposed -00 option which shrinks the size of data collection. This reduction has a significant impact for folks doing downstream ingest and analytics of really large network collections. (Sending 1GB to a GPU farm is way better than 10GB!) Having this functionality built into tcpdump opens it up as a really great option not just for traditional networking folks, but for people doing research in machine learning (e.g., our team) in computer networks and security.

All that being said, thanks so much for your input and guidance! I'll work on better defining the description and the code issues pointed out.

lilchurro avatar Jul 19 '17 18:07 lilchurro

lilchurro wrote:
> 2 Ability to scale. Most of our work entails large packet captures

> 3 Thorough sanitization. In the discussions referenced, one will see

So, if you are saying that you would typically want to run the sanitization as you capture, then I get why you want it in tcpdump. That way there is never a file that is non-sanitized.
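
The "sanitize as you go" idea can be sketched as a record-by-record filter over a classic pcap stream, so that no more than one packet is held in memory and the unsanitized bytes are never written out. This is a rough Python illustration under stated assumptions, not how tcpdump or the PR implements it: the function name `sanitize_pcap_stream` and the `transform` callback are hypothetical, and only the classic pcap savefile layout (24-byte file header, 16-byte per-record header) is handled, not pcapng.

```python
import io
import struct

PCAP_FILE_HDR_LEN = 24    # magic, version, thiszone, sigfigs, snaplen, linktype
PCAP_RECORD_HDR_LEN = 16  # ts_sec, ts_frac, incl_len, orig_len


def sanitize_pcap_stream(src, dst, transform):
    """Copy a classic pcap stream from src to dst, passing each packet
    through transform(bytes) -> bytes. Returns the number of packets written."""
    header = src.read(PCAP_FILE_HDR_LEN)
    if len(header) < PCAP_FILE_HDR_LEN:
        raise ValueError("truncated pcap file header")
    magic = header[:4]
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"  # little-endian writer (microsecond or nanosecond magic)
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"  # big-endian writer
    else:
        raise ValueError("not a classic pcap stream")
    dst.write(header)

    count = 0
    while True:
        rec = src.read(PCAP_RECORD_HDR_LEN)
        if len(rec) < PCAP_RECORD_HDR_LEN:
            break  # end of stream (or a truncated trailing record)
        ts_sec, ts_frac, incl_len, orig_len = struct.unpack(endian + "IIII", rec)
        data = src.read(incl_len)
        if len(data) < incl_len:
            break  # truncated packet; stop rather than emit partial data
        clean = transform(data)
        # incl_len follows the sanitized size; orig_len keeps the on-wire size
        dst.write(struct.pack(endian + "IIII", ts_sec, ts_frac, len(clean), orig_len))
        dst.write(clean)
        count += 1
    return count
```

For example, `sanitize_pcap_stream(sys.stdin.buffer, sys.stdout.buffer, lambda pkt: pkt[:54])` would truncate every packet to 54 bytes while reading from a pipe. The 54-byte cutoff is only an illustrative value.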

mcr avatar Jul 20 '17 13:07 mcr

Guy Harris wrote:
> which features; as I remember, Visual Studio 2013 (the compiler
> currently used on the Wireshark buildbots) support all of those.

... If this compiler is readily available, then I think that covers Windows.

My only concern is whether or not someone is still trying to build for some old HP-UX or something like that. I'm happy to abandon them :-)

--
]               Never tell me the odds!                 | ipv6 mesh networks [
]   Michael Richardson, Sandelman Software Works        | network architect  [
]               http://www.sandelman.ca/                |   ruby on rails    [

mcr avatar Jul 24 '17 16:07 mcr

Hello again... I'm having some issues compiling on Windows. (Using Visual Studio 2017, and attempting to compile with headers and libraries as provided by WpcapSrc_4_1_3.zip.)

I'm trying to compile just the master branch of tcpdump (i.e., without my changes), but am stuck with some dependency issues (among other things). It appears to me that windump version 4.10 doesn't compile on Windows as it refers to the file util.c, which doesn't exist. Also, I'm having issues using wpcapsrc 4.1.3, as the pcap.h included therein contains a reference to sys/time.h -- so is there a specific version of WinPcap that I should be using? The win32 readme seems vague about this.

Is there some way I can get assistance in compiling the master branch on Windows? It seems like I should get that working before trying to verify that my changes can compile.

lilchurro avatar Jul 26 '17 23:07 lilchurro

FYI, I've added new tests to verify the anonymization and packet sanitization for TCP/UDP packet payloads, and have integrated them into TESTrun.sh. I noticed that there wasn't really a way to test for cases where -w [savefile] differs from the usual tcpdump output, so the new test file TESToutfile checks for such a case. For ease of use, it's basically coded in the same structure as TESTonce and uses TESTOUTLIST to keep track of tests related to savefiles.

If there's something else I can do to move this PR along, please let me know; otherwise, I will consider this about as done as it could be. 🙂

lilchurro avatar Sep 28 '17 14:09 lilchurro

Ping... is there anything else I can do to move this PR along?

lilchurro avatar Oct 11 '17 15:10 lilchurro

😱 Well, that rebase was a harrowing experience. 1 clean commit now tho.

lilchurro avatar Dec 12 '17 21:12 lilchurro

Is this something @lilchurro should consider spending the time to bring up to date? Would be great to get some guidance here!

gregs5 avatar Aug 01 '18 16:08 gregs5

There were two discussions in this pull request. One is about the code itself, which indeed belongs here. I cannot help complete it until I finish other bits of work, though other developers may be available. The other discussion, which belongs more on the tcpdump-workers mailing list, is whether it is right to incorporate the masking function into tcpdump as opposed to doing it in a separate binary. I guess the latter could go into more detail to reach a clearer consensus. I, for instance, am not quite convinced yet, but I could be wrong.

infrastation avatar Aug 01 '18 20:08 infrastation

Thanks - really appreciate the reply. Understood on your concerns. I'll share that we looked at other methods (and all the tools you referenced last year) and still came to the conclusion that for anyone who wants this functionality, this is the right path. That said though, I'll start a thread in the tcpdump-workers and we'll see where it takes us. Thanks!

gregs5 avatar Aug 01 '18 21:08 gregs5