pcap_generator
IPv6 packets being generated by default and negative timestamps
Hello, there are two bugs, I believe:
- IPv6 packets are generated even if I specify IPv4 source and destination addresses
a) The head of my input file specifies IPv4 sources and destinations:
timestamp=0.13052282,protocol=tcp_syn,src_ip=162.159.135.234,src_port=443,dst_ip=10.0.2.15,dst_port=53707
timestamp=0.15329465,protocol=tcp_syn,src_ip=10.0.2.15,src_port=53707,dst_ip=162.159.135.234,dst_port=443
timestamp=0.21709388,protocol=tcp_syn,src_ip=162.159.135.234,src_port=443,dst_ip=10.0.2.15,dst_port=53707
timestamp=0.25721452,protocol=tcp_syn,src_ip=10.0.2.15,src_port=53707,dst_ip=162.159.135.234,dst_port=443
timestamp=0.37957603,protocol=tcp_syn,src_ip=162.159.135.234,src_port=443,dst_ip=10.0.2.15,dst_port=53707
b) Then, I produce the output:
python3 pcap_generator_from_csv.py -i discord_1_no_tls_csv-to-pcap_generated.csv -o asd -R 0
c) But, for some reason, you provide a default src_ipv6 and dst_ipv6 if I don't provide any:
d) The default IPv6 pair replaces the IPv4 addresses I provided:
- Some timestamps are negative. What am I missing here?
Thank you in advance.
So, it looks like I can work around it by forcing the ether_type=ipv4 argument.
Considering the same input file I mentioned before, i.e.:
timestamp=0.13052282,protocol=tcp_syn,src_ip=162.159.135.234,src_port=443,dst_ip=10.0.2.15,dst_port=53707,ether_type=ipv4
timestamp=0.15329465,protocol=tcp_syn,src_ip=10.0.2.15,src_port=53707,dst_ip=162.159.135.234,dst_port=443
timestamp=0.21709388,protocol=tcp_syn,src_ip=162.159.135.234,src_port=443,dst_ip=10.0.2.15,dst_port=53707
timestamp=0.25721452,protocol=tcp_syn,src_ip=10.0.2.15,src_port=53707,dst_ip=162.159.135.234,dst_port=443
timestamp=0.37957603,protocol=tcp_syn,src_ip=162.159.135.234,src_port=443,dst_ip=10.0.2.15,dst_port=53707
I added the ether_type=ipv4 argument only to the first line, and the output PCAP now uses the IPv4 addresses I provided.
I believe that you decided that IPv6 has priority when generating packets by default. However, the documentation on the protocol argument indicates otherwise.
Hi, thanks again for playing around with the code.
Let me respond to the ether_type issue first:
Yes, for some reason, if ether_type is not enforced, the script uses the default IPv6 addresses. The cause was a missing conditional check combined with the order in which default values were set: the IPv6 defaults were applied after the IPv4 ones. Because ether_type was not checked properly, the default IPv6 addresses were used whenever ether_type was not explicitly stated in the [input.csv] file.
I have now fixed the bug, and the default is ipv4. Note, however, that as indicated in the README.md, some things are not stupid/bullet-proof, so it is advisable to set ether_type explicitly and not to mix an ipv4 ether_type with IPv6 addresses in the .csv file haha.
Regarding the timestamp issue:
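The kind of check described above could be sketched as follows. This is only an illustration, not the tool's actual code: the helper names `parse_line` and `resolve_ether_type` are hypothetical, and it assumes the address to inspect is in the `src_ip` field of a `key=value` line like the ones quoted in this thread.

```python
import ipaddress


def parse_line(line):
    """Split one 'key=value,key=value' input line into a dict."""
    return dict(field.split("=", 1) for field in line.strip().split(","))


def resolve_ether_type(fields, default="ipv4"):
    """Pick 'ipv4' or 'ipv6' for a row (hypothetical helper).

    The address family of src_ip decides; an explicit ether_type that
    contradicts it is rejected instead of being silently overridden.
    """
    explicit = fields.get("ether_type")
    src_ip = fields.get("src_ip")
    if src_ip is None:
        # Nothing to infer from: fall back to the explicit value or the default.
        return explicit or default
    inferred = "ipv6" if ipaddress.ip_address(src_ip).version == 6 else "ipv4"
    if explicit is not None and explicit != inferred:
        raise ValueError(f"ether_type={explicit} does not match src_ip={src_ip}")
    return inferred


row = parse_line(
    "timestamp=0.13052282,protocol=tcp_syn,src_ip=162.159.135.234,"
    "src_port=443,dst_ip=10.0.2.15,dst_port=53707"
)
ether_type = resolve_ether_type(row)
```

With a check like this, a row that gives only IPv4 addresses resolves to ipv4 without an explicit ether_type, and a mismatched row fails loudly.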
I managed to solve it. Now, it works, please try again. Bear in mind that tcpdump will show you the timestamp you set in the csv file as-is, however, Wireshark shows relative timestamps by default, so it takes your first packet's timestamp as 0 and calculates all the rest relative to it. To avoid that, set your first packet's timestamp to 0.0000 (I guess).
The problem was that the magic number in the pcap header was not set properly to indicate whether the timestamps were sec+microsec or sec+nanosec. After fixing that, I rewrote the timestamp generation part so that the HEX representations are converted properly to big-endian, and now it works.
Thank you again for taking the trouble and playing around with this stuff. Otherwise, this would never have been solved :)
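For readers unfamiliar with the pcap file format, here is a minimal sketch of how the magic number and the timestamp fields fit together. It is not the tool's code: the function names are hypothetical, and only the two magic values and the 24-byte/16-byte header layouts come from the pcap format itself.

```python
import struct

MAGIC_MICROS = 0xA1B2C3D4  # timestamps are seconds + microseconds
MAGIC_NANOS = 0xA1B23C4D   # timestamps are seconds + nanoseconds


def pcap_global_header(nanosecond=False, snaplen=65535, linktype=1):
    """Build the 24-byte pcap file header in big-endian byte order."""
    magic = MAGIC_NANOS if nanosecond else MAGIC_MICROS
    # magic, version 2.4, thiszone, sigfigs, snaplen, linktype (1 = Ethernet)
    return struct.pack(">IHHiIII", magic, 2, 4, 0, 0, snaplen, linktype)


def pcap_record_header(timestamp, length, nanosecond=False):
    """Build the 16-byte per-packet header for a non-negative timestamp."""
    scale = 1_000_000_000 if nanosecond else 1_000_000
    seconds = int(timestamp)
    fraction = int(round((timestamp - seconds) * scale))
    if fraction == scale:  # rounding spilled over into the next second
        seconds, fraction = seconds + 1, 0
    # ts_sec, ts_frac, captured length, original length
    return struct.pack(">IIII", seconds, fraction, length, length)


file_header = pcap_global_header()
record_header = pcap_record_header(0.13052282, 54)
```

A reader uses the magic number to detect both the byte order and the timestamp resolution, so a fractional part written in one unit but declared as the other is displayed incorrectly.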
I did not know about this tiny difference between tcpdump and Wireshark. Thanks a lot!
It is now working like a charm. I don't think the relative timestamp issue is a big deal (at least for me).
Otherwise, I would have had to implement your idea from scratch for my research. I'm using input generated by a time series neural network and need to convert CSV to PCAP.
I hope you don't mind if I keep discovering small issues.
Also, from what I understand, your script only supports TCP_SYN. This way I can't create a TCP flow, correct?
Nopes, I don't mind fixing issues, especially if it makes the tool more useful.
Yes, the script only supports UDP and TCP SYN. You cannot create a TCP flow as of today.
The reason is simple: as you can observe from the code, packets are basically generated by creating the HEX representation of the bytes. This makes the generation of millions of packets significantly faster compared to using more sophisticated higher-level libs, like Scapy. You could do the same with Scapy, perhaps even more easily, but it would be way too slow.
And creating flows with this approach is quite difficult; it not only requires substantial work but even the input file (input.csv) would become very complex, defeating the whole purpose of the app.
But let me know if you intend to implement that feature. I am open to pull requests :)
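To give an idea of what generating packets directly from their byte representation looks like, here is a minimal sketch that assembles an IPv4 header and a UDP header by hand. It is not taken from the tool: `build_udp_ipv4` and `ipv4_checksum` are hypothetical names, the TTL of 64 is an arbitrary choice, and the Ethernet header is left out for brevity.

```python
import socket
import struct


def ipv4_checksum(header):
    """Ones' complement sum of the 16-bit words of an IPv4 header."""
    total = sum(struct.unpack(f">{len(header) // 2}H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_udp_ipv4(src_ip, dst_ip, src_port, dst_port, payload=b""):
    """Return the raw bytes of an IPv4 packet carrying one UDP datagram."""
    udp_len = 8 + len(payload)
    # A UDP checksum of 0 means "not computed", which IPv4 permits.
    udp = struct.pack(">HHHH", src_port, dst_port, udp_len, 0) + payload
    # version/IHL, DSCP, total length, id, flags/fragment, TTL, protocol 17, checksum 0
    header = struct.pack(
        ">BBHHHBBH4s4s", 0x45, 0, 20 + udp_len, 0, 0, 64, 17, 0,
        socket.inet_aton(src_ip), socket.inet_aton(dst_ip),
    )
    checksum = struct.pack(">H", ipv4_checksum(header))
    return header[:10] + checksum + header[12:] + udp


packet = build_udp_ipv4("10.0.2.15", "162.159.135.234", 53707, 443, b"hi")
packet_hex = packet.hex()  # the HEX representation that would go into the PCAP
```

No packet objects are created and no layers are dissected, which is where the speed advantage over a library like Scapy comes from.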
Thank you! Now, I understand the reason you did not implement it. I've only dealt with Scapy so far, and yes, it is slow. It is slow even for packet capture. I had to force it to open low-level sockets so I could capture more packets.
Back to the topic: I am interested in implementing this part for my research and using your tool. My input contains TCP flows that go beyond TCP SYN (e.g., TCP ACK), but I need to understand your code better first. Another alternative would be to use nPrint, which also promises to convert CSV->PCAP. However, it is not working as promised.
nPrint...lol :) I also got into that research for quite some time...still cannot understand why some people think it is a gamechanger. [cut] I have written here my experience with nPrint, but then I removed it [/cut] But yeah, their tools were not working for my projects either. I needed to write my own conversion scripts to even make nPrint work.
With TCP flows you need to keep track of a lot of metadata, like sequence numbers, certain flags, etc.; that's why I only implemented SYN (for now). Let me know if I can help you somehow...I tried making my code well-commented, but it might be well-commented for me only :P
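As a small illustration of the metadata involved, the sketch below generates only the three-way handshake with consistent sequence and acknowledgment numbers. It is not part of the tool, `Segment` and `three_way_handshake` are hypothetical names, and a real flow would also need data segments, window sizes, retransmissions and teardown.

```python
from dataclasses import dataclass

SYN, ACK = 0x02, 0x10  # TCP flag bits
SEQ_MOD = 2 ** 32      # sequence numbers wrap around at 32 bits


@dataclass
class Segment:
    sender: str
    seq: int
    ack: int
    flags: int


def three_way_handshake(client_isn, server_isn):
    """Return the SYN, SYN-ACK and ACK segments for the given initial sequence numbers."""
    return [
        Segment("client", client_isn, 0, SYN),
        # A SYN consumes one sequence number, so the server acknowledges ISN + 1.
        Segment("server", server_isn, (client_isn + 1) % SEQ_MOD, SYN | ACK),
        Segment("client", (client_isn + 1) % SEQ_MOD, (server_isn + 1) % SEQ_MOD, ACK),
    ]


segments = three_way_handshake(1000, 5000)
```

Every later packet in the flow depends on the state left by the previous ones, which is why a one-line-per-packet input file becomes harder to keep simple.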
I don't think it is a gamechanger, but it provides more fine-grained control over each packet feature.
There is a trade-off:
- If the neural network (or classifier) model works with fewer features, as with your input format, where information like protocol=udp is passed, I have no control over the individual UDP protocol fields;
- However, if I set each field to a single bit and pass that to the AI model, it can (in theory) learn the distribution of that data and generate output that looks more like a real traffic distribution (assuming it is a time series, for example).
The problem with nPrint is the number of columns needed, while your PCAP generator is lightweight and more straightforward.
In summary, if the model has more features, it consumes more resources (e.g., CPU, GPU, memory), while the post-processing (e.g., a PCAP generator) is more lightweight and vice-versa.
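The bit-level representation mentioned above can be sketched as follows. This only illustrates the general idea of one feature per header bit and does not reproduce nPrint's actual column layout; `field_to_bits` and `bits_to_field` are hypothetical helpers.

```python
def field_to_bits(value, width):
    """Expand an integer header field into a list of bits, most significant first."""
    return [(value >> shift) & 1 for shift in range(width - 1, -1, -1)]


def bits_to_field(bits):
    """Collapse a list of bits (most significant first) back into an integer."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


# A 16-bit port becomes 16 model features instead of one.
port_bits = field_to_bits(443, 16)
restored_port = bits_to_field(port_bits)
```

Two port numbers alone already take 32 columns per packet, which shows how quickly the number of columns grows compared with a single protocol=udp field.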
But I believe it would be much easier to extend your implementation, since it's Python and looks well-commented. I'm going to spend some time on it in the coming weeks. I hope we can make it work with stateful TCP connections. :)
To do this, I need to read more about the details of TCP, its flags, and the different cases (e.g., retransmission, sequence numbers).