
pcap dump parsing issue - tokenizing data error

Open craig opened this issue 3 years ago • 3 comments

This is the same dump from #51 - unfortunately, it has more issues:

$ file BT-20220314.pcap 
BT-20220314.pcap: pcap capture file, microsecond ts (little-endian) - version 2.4 (Ethernet, capture length 65536)
[INFO] 
    ____  _                     __            
   / __ \(_)____________  _____/ /_____  _____
  / / / / / ___/ ___/ _ \/ ___/ __/ __ \/ ___/
 / /_/ / (__  |__  )  __/ /__/ /_/ /_/ / /    
/_____/_/____/____/\___/\___/\__/\____/_/     

[INFO] Loading "BT-20220314.pcap"...
[INFO] Error reading PCAP file: Error tokenizing data. C error: Expected 24 fields in line 145732, saw 25

[INFO] Skipping the offending lines...
Traceback (most recent call last):
  File "/home/sb/VCS/ddos_dissector/src/reader.py", line 125, in read_pcap
    data: pd.DataFrame = pd.read_csv(output_buffer, parse_dates=['frame.time'], low_memory=False, delimiter=',')
  File "/home/sb/VCS/ddos_dissector/python-venv/lib/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/home/sb/VCS/ddos_dissector/python-venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 680, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/sb/VCS/ddos_dissector/python-venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 581, in _read
    return parser.read(nrows)
  File "/home/sb/VCS/ddos_dissector/python-venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1250, in read
    index, columns, col_dict = self._engine.read(nrows)
  File "/home/sb/VCS/ddos_dissector/python-venv/lib/python3.9/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 230, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 787, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 876, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 1960, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 24 fields in line 145732, saw 25


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/sb/VCS/ddos_dissector/src/main.py", line 38, in <module>
    data: pd.DataFrame = pd.concat([read_file(f, filetype) for f in args.files])  # Read the FLOW file(s) into a dataframe
  File "/home/sb/VCS/ddos_dissector/src/main.py", line 38, in <listcomp>
    data: pd.DataFrame = pd.concat([read_file(f, filetype) for f in args.files])  # Read the FLOW file(s) into a dataframe
  File "/home/sb/VCS/ddos_dissector/src/reader.py", line 184, in read_file
    return read_pcap(filename)
  File "/home/sb/VCS/ddos_dissector/src/reader.py", line 129, in read_pcap
    data: pd.DataFrame = pd.read_csv(output_buffer, parse_dates=['frame.time'], low_memory=False, delimiter=',',
  File "/home/sb/VCS/ddos_dissector/python-venv/lib/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/home/sb/VCS/ddos_dissector/python-venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 680, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/sb/VCS/ddos_dissector/python-venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 575, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/sb/VCS/ddos_dissector/python-venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 933, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/home/sb/VCS/ddos_dissector/python-venv/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1231, in _make_engine
    return mapping[engine](f, **self.options)
  File "/home/sb/VCS/ddos_dissector/python-venv/lib/python3.9/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 152, in __init__
    self._validate_parse_dates_presence(self.names)  # type: ignore[has-type]
  File "/home/sb/VCS/ddos_dissector/python-venv/lib/python3.9/site-packages/pandas/io/parsers/base_parser.py", line 228, in _validate_parse_dates_presence
    raise ValueError(
ValueError: Missing column provided to 'parse_dates': 'frame.time'

craig avatar Mar 22 '22 15:03 craig

Interesting... What tool do you use to capture the traffic / generate the PCAP? It seems it does not capture the timestamps.

tvdhout avatar Mar 23 '22 13:03 tvdhout

The captures are created with stenographer's stenoread (https://github.com/google/stenographer#querying), like this:

docker exec -it so-steno stenoread "after 2022-03-07T11:50:00Z and before 2022-03-07T12:00:00Z" -w /tmp/07032022-11_50-12_00.pcap

craig avatar Mar 23 '22 14:03 craig

Thanks, I'll check it out and see if I can find how to fix the dissector for this format.
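
One possible direction, as a minimal sketch and not the dissector's actual code: the sample CSV below is made up, and `read_skipping_bad_lines` is a hypothetical helper. It reproduces the "Expected N fields, saw N+1" error on a row with one field too many, then re-reads the same text with pandas' `on_bad_lines="skip"` (available from pandas 1.3). That option drops the malformed row but still takes the column names, including `frame.time`, from the header row.

```python
import io

import pandas as pd

# Made-up stand-in for the CSV text handed to pandas: a three-column header
# and one data row (file line 3) that carries an extra, fourth field.
csv_text = (
    "frame.time,ip.src,ip.dst\n"
    "2022-03-07 11:50:00,10.0.0.1,10.0.0.2\n"
    "2022-03-07 11:50:01,10.0.0.3,10.0.0.4,unexpected\n"
    "2022-03-07 11:50:02,10.0.0.5,10.0.0.6\n"
)


def read_skipping_bad_lines(text: str) -> pd.DataFrame:
    """Parse CSV text, dropping rows that have more fields than the header."""
    return pd.read_csv(
        io.StringIO(text),
        parse_dates=["frame.time"],
        on_bad_lines="skip",  # assumption: pandas >= 1.3 is installed
    )


# A strict read fails on the row with the extra field.
first_error = None
try:
    pd.read_csv(io.StringIO(csv_text), parse_dates=["frame.time"])
except pd.errors.ParserError as exc:
    first_error = str(exc)

# The lenient read keeps the header and the two well-formed rows.
data = read_skipping_bad_lines(csv_text)
```

Whether silently dropping such rows is acceptable for the analysis is a separate question; counting the skipped lines and reporting that number would make the data loss visible.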

In the meantime, you can use tcpdump with a file limit of 1 (-W 1) and a file rotation of x seconds (-G x). To capture 10 minutes of traffic: sudo tcpdump -W 1 -G 600 -w /tmp/capture10mins.pcap
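
If the capture needs to be scripted, the same invocation can be wrapped in Python. This is only a sketch: `build_tcpdump_command` and `capture` are hypothetical names, and it assumes tcpdump is installed and the script runs with enough privileges to capture traffic.

```python
import subprocess
from typing import List


def build_tcpdump_command(seconds: int, output_path: str) -> List[str]:
    """Build the tcpdump argument list for a single capture of `seconds` seconds."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    # -G rotates the dump file every `seconds` seconds; combined with -W 1,
    # tcpdump exits once the single allowed file has been written.
    return ["tcpdump", "-W", "1", "-G", str(seconds), "-w", output_path]


def capture(seconds: int, output_path: str) -> None:
    """Run the capture and raise CalledProcessError if tcpdump exits non-zero."""
    subprocess.run(build_tcpdump_command(seconds, output_path), check=True)
```

For example, `capture(600, "/tmp/capture10mins.pcap")` corresponds to the 10-minute command above, minus the `sudo`.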

tvdhout avatar Mar 23 '22 15:03 tvdhout

Not planned for now

tvdhout avatar Apr 21 '23 13:04 tvdhout