Scraper bugs in regionmap

Open JSCU-CNI opened this issue 8 months ago • 4 comments

Following the discussion at https://github.com/fox-it/dissect.target/pull/1103/files#r2043869029:

It seems fh.read() can raise EOFError. I'm unsure whether this should be filed against dissect.util or dissect.hypervisor.

target-qfind /tmp/example-ubuntu-server-24.04.1.vmx --needles "Ubuntu\s24\.04\.\d\sLTS" --regex --unique
<Target /tmp/example-ubuntu-server-24.04.1.vmx>
[Current disk: <VmdkContainer size=21474836480 vs=<DissectVolumeSystem serial=None>>]
8.87%Traceback (most recent call last):
  File "/tmp/dissect.target/venv/bin/target-qfind", line 10, in <module>
    sys.exit(main())
             ~~~~^^
  File "/tmp/dissect.target/dissect/target/tools/utils.py", line 237, in wrapper
    return func(*args, **kwargs)
  File "/tmp/dissect.target/dissect/target/tools/qfind.py", line 44, in main
    for _ in target.qfind(
             ~~~~~~~~~~~~^
        args.needles,
        ^^^^^^^^^^^^^
    ...<8 lines>...
        args.window,
        ^^^^^^^^^^^^
    ):
    ^
  File "/tmp/dissect.target/dissect/target/plugins/scrape/qfind.py", line 149, in qfind
    for _, stream, needle, offset in self.target.scrape.find(
                                     ~~~~~~~~~~~~~~~~~~~~~~~^
        list(needle_lookup.keys()), progress=progress(self.target) if not record else None
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ):
    ^
  File "/tmp/dissect.target/dissect/target/plugins/scrape/scrape.py", line 128, in find
    for needle, offset in find_needles(
                          ~~~~~~~~~~~~^
        stream,
        ^^^^^^^
    ...<3 lines>...
        progress=(lambda current: progress(disk, current, stream.size)) if progress else None,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ):
    ^
  File "/tmp/dissect.target/dissect/target/helpers/scrape.py", line 66, in find_needles
    next_block = fh.read(read_size)
  File "/tmp/dissect.target/venv/lib/python3.13/site-packages/dissect/util/stream.py", line 135, in read
    r.append(self._read(self._pos, read_len))
             ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/dissect.target/venv/lib/python3.13/site-packages/dissect/util/stream.py", line 313, in _read
    run_idx = self._get_run_idx(offset)
  File "/tmp/dissect.target/venv/lib/python3.13/site-packages/dissect/util/stream.py", line 308, in _get_run_idx
    raise EOFError(f"No mapping for offset {offset}")
EOFError: No mapping for offset 1904214016

This reference VM was created on VMware Workstation 17.5.2 using ubuntu-24.04.1-live-server-amd64.iso, following the default installation path, with the virtual disk contents stored in a single file and disk space not preallocated for the virtual disk.

JSCU-CNI, Apr 22 '25 13:04

After some ~soul searching~ thinking, this is actually a bug in the scrape plugin: https://github.com/fox-it/dissect.target/blob/95cade3d1f4934d3cd58f499cdf6219240e1db36/dissect/target/plugins/scrape/scrape.py#L103-L106

It can happen when we remove a volume from scraping (i.e. when it's part of a logical volume and we want to skip scraping the "raw" volume). I have not installed Ubuntu but I assume it defaults to an LVM partition layout these days?

Ideally the scraping code is made aware of the gap and just skips unmapped sections. We don't want to waste time scraping a section full of \x00.
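
To make "aware of the gap" concrete, here is a minimal sketch, not the dissect implementation. It assumes the mapped regions are available as a list of (offset, size) pairs; the find_gaps name and that input shape are hypothetical and only serve the illustration.

```python
def find_gaps(runs, total_size):
    """Return (offset, size) pairs for the unmapped regions of a stream.

    `runs` is a hypothetical list of (offset, size) pairs describing the
    mapped regions; it is not an actual dissect data structure.
    """
    gaps = []
    pos = 0
    for offset, size in sorted(runs):
        if offset > pos:
            # Nothing is mapped between the previous run and this one.
            gaps.append((pos, offset - pos))
        pos = max(pos, offset + size)
    if pos < total_size:
        # Trailing unmapped region at the end of the stream.
        gaps.append((pos, total_size - pos))
    return gaps
```

A scraper that knows these gaps can seek past them instead of reading into an offset that has no mapping.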

Schamper, May 20 '25 09:05

> I have not installed Ubuntu but I assume it defaults to an LVM partition layout these days?

Correct.

> Ideally the scraping code is made aware of the gap and just skips unmapped sections. We don't want to waste time scraping a section full of \x00.

How would you propose we implement that in qfind?

JSCU-CNI, Aug 28 '25 11:08

> How would you propose we implement that in qfind?

I guess find_needles could have a special case for when isinstance(fh, MappingStream) and skip gaps? https://github.com/fox-it/dissect.target/blob/9058c704fa8b1b0cedb0633109bf7de5b15c9573/dissect/target/helpers/scrape.py#L18
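
A rough, self-contained sketch of what such a special case could look like. Everything here is an assumption for illustration: ToyMappingStream is a hypothetical stand-in (not dissect.util.stream.MappingStream), its runs attribute is invented, and this find_needles has a simplified signature that does not match the real helper. Needles that would straddle a gap are deliberately not matched.

```python
import io


class ToyMappingStream:
    """Hypothetical stand-in: reads at an unmapped offset raise EOFError."""

    def __init__(self, size):
        self.size = size
        self.runs = []  # (offset, size, fh) tuples, kept sorted by offset
        self._pos = 0

    def add(self, offset, size, fh):
        self.runs.append((offset, size, fh))
        self.runs.sort(key=lambda run: run[0])

    def seek(self, pos):
        self._pos = pos

    def read(self, n):
        for offset, size, fh in self.runs:
            if offset <= self._pos < offset + size:
                rel = self._pos - offset
                fh.seek(rel)
                data = fh.read(min(n, size - rel))  # short read at run end
                self._pos += len(data)
                return data
        raise EOFError(f"No mapping for offset {self._pos}")


def merge_ranges(ranges):
    """Merge contiguous or overlapping (offset, size) pairs into [start, end] pairs."""
    merged = []
    for offset, size in sorted(ranges):
        if merged and offset <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], offset + size)
        else:
            merged.append([offset, offset + size])
    return merged


def find_needles(fh, needles, block_size=8192):
    """Yield (needle, offset) per match, reading only the mapped ranges."""
    if isinstance(fh, ToyMappingStream):
        ranges = merge_ranges((offset, size) for offset, size, _ in fh.runs)
    else:
        ranges = [[0, fh.seek(0, io.SEEK_END)]]

    overlap = max(len(needle) for needle in needles) - 1
    for start, end in ranges:
        pos, tail = start, b""  # the tail is reset, so nothing matches across a gap
        while pos < end:
            fh.seek(pos)
            block = fh.read(min(block_size, end - pos))
            if not block:
                break
            buf = tail + block
            base = pos - len(tail)
            for needle in needles:
                idx = buf.find(needle)
                while idx != -1:
                    # Matches lying entirely in the tail were already reported.
                    if idx + len(needle) > len(tail):
                        yield needle, base + idx
                    idx = buf.find(needle, idx + 1)
            pos += len(block)
            tail = buf[-overlap:] if overlap else b""


stream = ToyMappingStream(size=64)
stream.add(0, 16, io.BytesIO(b"....NEEDLE......"))
stream.add(40, 16, io.BytesIO(b"..........NEEDLE"))  # offsets 16..39 are unmapped
print(sorted(find_needles(stream, [b"NEEDLE"], block_size=5)))
```

The design choice in this sketch is to iterate over mapped ranges up front rather than catch EOFError mid-read, so the unmapped region is never touched at all.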

Schamper, Sep 02 '25 11:09

Bad bot @DissectBot, could the title be reverted?

JSCU-CNI, Oct 15 '25 09:10