pcap dump parsing issue - tokenizing data error
This is the same dump from #51 - unfortunately, it has more issues:
$ file BT-20220314.pcap
BT-20220314.pcap: pcap capture file, microsecond ts (little-endian) - version 2.4 (Ethernet, capture length 65536)
[INFO]
____ _ __
/ __ \(_)____________ _____/ /_____ _____
/ / / / / ___/ ___/ _ \/ ___/ __/ __ \/ ___/
/ /_/ / (__ |__ ) __/ /__/ /_/ /_/ / /
/_____/_/____/____/\___/\___/\__/\____/_/
[INFO] Loading "BT-20220314.pcap"...
[INFO] Error reading PCAP file: Error tokenizing data. C error: Expected 24 fields in line 145732, saw 25
[INFO] Skipping the offending lines...
Traceback (most recent call last):
File "/home/sb/VCS/ddos_dissector/src/reader.py", line 125, in read_pcap
data: pd.DataFrame = pd.read_csv(output_buffer, parse_dates=['frame.time'], low_memory=False, delimiter=',')
File "/home/sb/VCS/ddos_dissector/python-venv/lib/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "/home/sb/VCS/ddos_dissector/python-venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 680, in read_csv
return _read(filepath_or_buffer, kwds)
File "/home/sb/VCS/ddos_dissector/python-venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 581, in _read
return parser.read(nrows)
File "/home/sb/VCS/ddos_dissector/python-venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1250, in read
index, columns, col_dict = self._engine.read(nrows)
File "/home/sb/VCS/ddos_dissector/python-venv/lib/python3.9/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 230, in read
data = self._reader.read(nrows)
File "pandas/_libs/parsers.pyx", line 787, in pandas._libs.parsers.TextReader.read
File "pandas/_libs/parsers.pyx", line 876, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 1960, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 24 fields in line 145732, saw 25
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/sb/VCS/ddos_dissector/src/main.py", line 38, in <module>
data: pd.DataFrame = pd.concat([read_file(f, filetype) for f in args.files]) # Read the FLOW file(s) into a dataframe
File "/home/sb/VCS/ddos_dissector/src/main.py", line 38, in <listcomp>
data: pd.DataFrame = pd.concat([read_file(f, filetype) for f in args.files]) # Read the FLOW file(s) into a dataframe
File "/home/sb/VCS/ddos_dissector/src/reader.py", line 184, in read_file
return read_pcap(filename)
File "/home/sb/VCS/ddos_dissector/src/reader.py", line 129, in read_pcap
data: pd.DataFrame = pd.read_csv(output_buffer, parse_dates=['frame.time'], low_memory=False, delimiter=',',
File "/home/sb/VCS/ddos_dissector/python-venv/lib/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "/home/sb/VCS/ddos_dissector/python-venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 680, in read_csv
return _read(filepath_or_buffer, kwds)
File "/home/sb/VCS/ddos_dissector/python-venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 575, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/home/sb/VCS/ddos_dissector/python-venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 933, in __init__
self._engine = self._make_engine(f, self.engine)
File "/home/sb/VCS/ddos_dissector/python-venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1231, in _make_engine
return mapping[engine](f, **self.options)
File "/home/sb/VCS/ddos_dissector/python-venv/lib/python3.9/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 152, in __init__
self._validate_parse_dates_presence(self.names) # type: ignore[has-type]
File "/home/sb/VCS/ddos_dissector/python-venv/lib/python3.9/site-packages/pandas/io/parsers/base_parser.py", line 228, in _validate_parse_dates_presence
raise ValueError(
ValueError: Missing column provided to 'parse_dates': 'frame.time'
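The first error says that one row of the intermediate CSV has 25 comma-separated fields where the header declares 24. The following is a minimal sketch for locating such rows with the standard library, assuming the CSV text is available as a string. The helper name `find_ragged_rows` and the sample columns other than `frame.time` are hypothetical and not part of the dissector.

```python
import csv
import io


def find_ragged_rows(csv_text, delimiter=","):
    """Return (line_number, field_count) for rows whose width differs from the header."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    header = next(reader)
    expected = len(header)
    ragged = []
    for row in reader:
        if len(row) != expected:
            # reader.line_num is the 1-based number of the last line read
            ragged.append((reader.line_num, len(row)))
    return ragged


# Hypothetical sample: only frame.time is a column name taken from the traceback
sample = (
    "frame.time,ip.src,ip.dst\n"
    "2022-03-07 11:50:00,10.0.0.1,10.0.0.2\n"
    "2022-03-07 11:50:01,10.0.0.1,10.0.0.2,extra\n"
)
print(find_ragged_rows(sample))  # [(3, 4)]
```

Inspecting the reported rows may show which field contains the unexpected extra comma.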
Interesting... What tool do you use to capture the traffic and generate the PCAP? It seems that it does not capture the timestamps.
The dumps are created with stenoread from Stenographer (https://github.com/google/stenographer#querying), like this:
docker exec -it so-steno stenoread "after 2022-03-07T11:50:00Z and before 2022-03-07T12:00:00Z" -w /tmp/07032022-11_50-12_00.pcap
Thanks, I'll check it out and see if I can find how to fix the dissector for this format.
In the meantime, you can use tcpdump with a file limit of 1 (-W 1) and a file rotation of x seconds (-G x). For example, to capture 10 minutes of traffic: sudo tcpdump -W 1 -G 600 -w /tmp/capture10mins.pcap
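For scripted captures, the same command can be assembled for an arbitrary duration. This is only a sketch: the helper name `tcpdump_command` is hypothetical, and the command is printed here rather than executed, since running it needs root privileges.

```python
import shlex


def tcpdump_command(seconds, out_path):
    """Build the argv for a single capture file that stops after `seconds`."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    # -W 1: at most one file; -G N: rotate every N seconds; -w: output path
    return ["sudo", "tcpdump", "-W", "1", "-G", str(seconds), "-w", out_path]


cmd = tcpdump_command(10 * 60, "/tmp/capture10mins.pcap")
print(shlex.join(cmd))  # sudo tcpdump -W 1 -G 600 -w /tmp/capture10mins.pcap
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a host where tcpdump is installed.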
Not planned for now