tcpdump
Packet sanitization and IP masking
Created a separate branch so I can keep upstream and workgroup README and version strings separate.
As a general note, this problem is not new and similar tools already exist:
- http://scrub-tcpdump.sourceforge.net/docs.php
- https://www.wireshark.org/lists/wireshark-users/201201/msg00106.html
- https://blog.packet-foo.com/2013/07/trace-file-sanitization-for-network-analysts/
- http://www.tm.uka.de/software/pktanon/links/index.html
- https://github.com/thepacketgeek/sanicap
If you still find it better to suggest a new solution, the proposed changes need to be one clean commit, which explains why this specific solution is better.
Hi there!
I'm currently working on the +#include <arpa/inet.h> issue and on a better description for the PR, but I'd like to take a moment to respond to the question of "Why does this PR exist?"
As a former network analyst, I'm certainly aware that tools exist that provide some of these anonymization and sanitization features, but there are (in my experience) some not-insignificant issues with their adoption. Our motivations for ultimately adding this to tcpdump are the following:
- Simplicity and automation. We are trying to automate the collection and analysis of network traffic coming off of SDN-enabled switches, and we wanted something that did bulk sanitization without requiring multistep or manual processes (i.e., first collecting, then moving, then scrubbing, etc.) because...
- Ability to scale. Most of our work entails large packet captures (e.g., hundreds of GBs), which can pose scalability issues when running subsequent tools, especially if they require writing full packet captures to memory before sanitizing and anonymizing. We had looked at scapy, for example (which is a backbone to some of these tools), which does not seem to handle scaling challenges particularly well.
- Thorough sanitization. In the discussions referenced, one will see comments about sanitizing one higher-layer protocol (e.g., DNS) but potentially omitting or mishandling another (e.g., HTTP). As well, a problem in one layer can undo the anonymity offered by scrubbing a lower-layer protocol. This approach may not solve every privacy concern, but completely zeroing out (or truncating) all payload data above TCP/UDP resolves many of the more conventional privacy issues.
- Runs cleanly on Linux. We require something that can be run on Linux, as that's where we do the bulk of our work. We are also finding that Linux is typically the base OS on "bare metal" switches / "disaggregated" network gear. Windows-only approaches (like TraceWrangler) are challenging for those of us running tcpdump in primarily Linux-based environments or on physical network switches.
- Is an active project. Ideally, we'd like something that is maintained, not orphaned, and stable. For example, SCRUBtcpdump looks promising, but doesn't seem to have been updated in a decade, and segfaults from the command line with unexpected input. At the risk of stating the obvious, tcpdump's long history, wide-scale adoption, significant userbase, and active community and development base make it an ideal and attractive option.
Finally, there's the utility of the proposed -00 option, which shrinks the size of data collection. This reduction has a significant impact for folks doing downstream ingest and analytics of really large network collections. (Sending 1GB to a GPU farm is way better than 10GB!) Having this functionality built into tcpdump opens it up as a really great option not just for traditional networking folks, but also for people doing machine learning research in computer networks and security (e.g., our team).
All that being said, thanks so much for your input and guidance! I'll work on better defining the description and the code issues pointed out.
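For readers skimming the thread, here is a minimal sketch of the "zero out all payload data above TCP/UDP" idea from the list above. It is not the code from this PR. The function names `l4_payload_offset` and `zero_l4_payload` are hypothetical, and the sketch assumes the buffer starts at an IPv4 header (no link-layer header, no IPv6).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of tcpdump: for a buffer that starts at an
 * IPv4 header, return the offset of the first byte after the TCP/UDP header.
 * Returns len (nothing to scrub) if the packet is not IPv4 TCP/UDP or is too
 * short to parse safely. */
static size_t l4_payload_offset(const uint8_t *ip, size_t len)
{
    if (len < 20 || (ip[0] >> 4) != 4)
        return len;                          /* not a parsable IPv4 header */

    size_t ihl = (size_t)(ip[0] & 0x0f) * 4; /* IP header length in bytes */
    if (ihl < 20 || ihl > len)
        return len;

    /* Assumption: non-first fragments carry no L4 header, so everything
     * after the IP header is treated as payload. */
    unsigned frag_off = ((unsigned)(ip[6] & 0x1f) << 8) | ip[7];
    if (frag_off != 0)
        return ihl;

    uint8_t proto = ip[9];
    if (proto == 17)                         /* UDP: fixed 8-byte header */
        return (ihl + 8 <= len) ? ihl + 8 : len;

    if (proto == 6) {                        /* TCP: header length in 32-bit words */
        if (ihl + 20 > len)
            return len;
        size_t thl = (size_t)(ip[ihl + 12] >> 4) * 4;
        if (thl < 20 || ihl + thl > len)
            return len;
        return ihl + thl;
    }
    return len;                              /* other protocols left untouched */
}

/* Zero every byte above the TCP/UDP header; returns the number of bytes zeroed. */
static size_t zero_l4_payload(uint8_t *ip, size_t len)
{
    size_t off = l4_payload_offset(ip, len);
    assert(off <= len);
    memset(ip + off, 0, len - off);
    return len - off;
}
```

A real implementation would also need to handle link-layer headers, IPv6, and other transports, and decide what to do about checksums, which no longer match once the payload is zeroed.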
lilchurro [email protected] wrote:
> 2 Ability to scale. Most of our work entail large packet captures
> 3 Thorough sanitization. In the discussions referenced, one will see
So, if you are saying that you would typically want to run the sanitization as you capture, then I get why you want it in tcpdump. That way there is never a file that is non-sanitized.
Guy Harris [email protected] wrote:
> which features; as I remember, Visual Studio 2013 (the compiler
> currently used on the Wireshark buildbots) support all of those.
... If this compiler is readily available, then I think that covers Windows.
My only concern is whether or not someone is still trying to build for some old HP-UX or something like that. I'm happy to abandon them :-)
--
] Never tell me the odds! | ipv6 mesh networks [
] Michael Richardson, Sandelman Software Works | network architect [
] [email protected] http://www.sandelman.ca/ | ruby on rails [
Hello again... I'm having some issues compiling on Windows. (Using Visual Studio 2017, and attempting to compile with headers and libraries as provided by WpcapSrc_4_1_3.zip.)
I'm trying to compile just the master branch of tcpdump (i.e., without my changes), but am stuck on some dependency issues (among other things). It appears to me that WinDump version 4.10 doesn't compile on Windows, as it refers to the file util.c, which doesn't exist. Also, I'm having issues using wpcapsrc 4.1.3, as the pcap.h included therein contains a reference to sys/time.h -- so is there a specific version of WinPcap that I should be using? The win32 readme seems vague about this.
Is there some way I can get assistance in compiling the master branch on Windows? It seems like I should get that working before trying to verify that my changes can compile.
FYI, I've added new tests to verify the anonymization and packet sanitization for TCP/UDP packet payloads, and have integrated them into TESTrun.sh. I noticed that there wasn't really a way to test for cases where -w [savefile] differs from the usual tcpdump output, so the new test file TESToutfile checks for such a case. For ease of use, it's basically coded in the same structure as TESTonce and uses TESTOUTLIST to keep track of tests related to savefiles.
If there's something else I can do to move this PR along, please let me know; otherwise, I will consider this about as done as it could be. 🙂
Ping... is there anything else I can do to move this PR along?
😱 Well, that rebase was a harrowing experience. 1 clean commit now tho.
Is this something @lilchurro should consider spending the time to bring up to date? Would be great to get some guidance here!
There were two discussions in this pull request. One is about the code itself, which indeed belongs here. I cannot help complete it until I finish other bits of work; other developers may be available. The other discussion, which belongs more on the tcpdump-workers mailing list, is whether it is right to incorporate the masking function into tcpdump as opposed to doing it in a separate binary. I guess the latter one could go into more detail to reach a more pronounced consensus. I, for instance, am not quite convinced yet, but I could be wrong.
Thanks - really appreciate the reply. Understood on your concerns. I'll share that we looked at other methods (and all the tools you referenced last year) and still came to the conclusion that for anyone who wants this functionality, this is the right path. That said, I'll start a thread on the tcpdump-workers list and we'll see where it takes us. Thanks!