Scraper bugs in regionmap
Following the discussion at https://github.com/fox-it/dissect.target/pull/1103/files#r2043869029:
It seems fh.read() can raise EOFError. Unsure if this should be filed in dissect.util or dissect.hypervisor.
target-qfind /tmp/example-ubuntu-server-24.04.1.vmx --needles "Ubuntu\s24\.04\.\d\sLTS" --regex --unique
<Target /tmp/example-ubuntu-server-24.04.1.vmx>
[Current disk: <VmdkContainer size=21474836480 vs=<DissectVolumeSystem serial=None>>]
8.87%Traceback (most recent call last):
File "/tmp/dissect.target/venv/bin/target-qfind", line 10, in <module>
sys.exit(main())
~~~~^^
File "/tmp/dissect.target/dissect/target/tools/utils.py", line 237, in wrapper
return func(*args, **kwargs)
File "/tmp/dissect.target/dissect/target/tools/qfind.py", line 44, in main
for _ in target.qfind(
~~~~~~~~~~~~^
args.needles,
^^^^^^^^^^^^^
...<8 lines>...
args.window,
^^^^^^^^^^^^
):
^
File "/tmp/dissect.target/dissect/target/plugins/scrape/qfind.py", line 149, in qfind
for _, stream, needle, offset in self.target.scrape.find(
~~~~~~~~~~~~~~~~~~~~~~~^
list(needle_lookup.keys()), progress=progress(self.target) if not record else None
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
):
^
File "/tmp/dissect.target/dissect/target/plugins/scrape/scrape.py", line 128, in find
for needle, offset in find_needles(
~~~~~~~~~~~~^
stream,
^^^^^^^
...<3 lines>...
progress=(lambda current: progress(disk, current, stream.size)) if progress else None,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
):
^
File "/tmp/dissect.target/dissect/target/helpers/scrape.py", line 66, in find_needles
next_block = fh.read(read_size)
File "/tmp/dissect.target/venv/lib/python3.13/site-packages/dissect/util/stream.py", line 135, in read
r.append(self._read(self._pos, read_len))
~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^
File "/tmp/dissect.target/venv/lib/python3.13/site-packages/dissect/util/stream.py", line 313, in _read
run_idx = self._get_run_idx(offset)
File "/tmp/dissect.target/venv/lib/python3.13/site-packages/dissect/util/stream.py", line 308, in _get_run_idx
raise EOFError(f"No mapping for offset {offset}")
EOFError: No mapping for offset 1904214016
This reference VM was created on VMware Workstation 17.5.2 using ubuntu-24.04.1-live-server-amd64.iso, following the default installation path with the options "Virtual disk contents are stored in a single file" and "Disk space is not preallocated for this virtual disk".
After some ~soul searching~ thinking, this is actually a bug in the scrape plugin: https://github.com/fox-it/dissect.target/blob/95cade3d1f4934d3cd58f499cdf6219240e1db36/dissect/target/plugins/scrape/scrape.py#L103-L106
It can happen when we remove a volume for scraping (i.e. when it's part of a logical volume and we want to skip scraping the "raw" volume). I have not installed Ubuntu but I assume it defaults to an LVM partition layout these days?
Ideally the scraping code is made aware of the gap and just skips unmapped sections. We don't want to waste time scraping a section full of \x00.
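To make the failure mode concrete, here is a rough, stdlib-only sketch of what a sequential read over a stream with a removed region looks like. `ToyMappingStream` is a hypothetical stand-in written for this comment, not the real `dissect.util.stream.MappingStream`; it only mimics the "no run covers this offset" behaviour seen in the traceback.

```python
import io


class ToyMappingStream:
    """Hypothetical stand-in for a run-mapped stream (not the dissect.util class)."""

    def __init__(self, size):
        self.size = size
        self.runs = []  # (offset, size, backing file-like object)
        self._pos = 0

    def add(self, offset, size, fh):
        self.runs.append((offset, size, fh))
        self.runs.sort(key=lambda run: run[0])

    def read(self, n):
        out = b""
        while n > 0 and self._pos < self.size:
            # Find the run that covers the current position, if any.
            run = next((r for r in self.runs if r[0] <= self._pos < r[0] + r[1]), None)
            if run is None:
                raise EOFError(f"No mapping for offset {self._pos}")
            offset, size, fh = run
            fh.seek(self._pos - offset)
            chunk = fh.read(min(n, offset + size - self._pos))
            if not chunk:
                break
            out += chunk
            self._pos += len(chunk)
            n -= len(chunk)
        return out


# A 12-byte "disk" where the middle region (offsets 4..8) has no mapping,
# as if a raw volume had been dropped from the stream.
stream = ToyMappingStream(size=12)
stream.add(0, 4, io.BytesIO(b"AAAA"))
stream.add(8, 4, io.BytesIO(b"CCCC"))

first = stream.read(4)  # reads the first mapped run without trouble
try:
    stream.read(4)  # a blind sequential read walks straight into the gap
    error = None
except EOFError as exc:
    error = str(exc)
```

A scanner that keeps calling `read()` with a fixed block size has no way of knowing about the hole, so it fails at the first unmapped offset instead of continuing with the next mapped region.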
> I have not installed Ubuntu but I assume it defaults to an LVM partition layout these days?
Correct.
> Ideally the scraping code is made aware of the gap and just skips unmapped sections. We don't want to waste time scraping a section full of \x00.
How would you propose we implement that in qfind?
> How would you propose we implement that in qfind?
I guess find_needles could have a special case for when isinstance(fh, MappingStream) and skip gaps?
https://github.com/fox-it/dissect.target/blob/9058c704fa8b1b0cedb0633109bf7de5b15c9573/dissect/target/helpers/scrape.py#L18
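A minimal sketch of what "skip gaps" could look like, assuming the scanner can obtain the list of mapped regions as `(offset, size)` pairs. The function name `find_needle_in_runs` and the `runs` parameter are hypothetical and not part of the current `find_needles` signature; how the runs would be read off a `MappingStream` is left open here.

```python
import io


def find_needle_in_runs(fh, runs, needle, block_size=8192):
    """Yield absolute offsets of `needle`, reading only inside mapped runs.

    `runs` is an iterable of (offset, size) pairs (an assumption of this sketch).
    Offsets outside those runs are never read.
    """
    if not needle:
        raise ValueError("needle must not be empty")
    overlap = len(needle) - 1
    for run_offset, run_size in sorted(runs):
        run_end = run_offset + run_size
        pos = run_offset
        tail = b""  # last bytes of the previous block, to catch matches across blocks
        while pos < run_end:
            fh.seek(pos)
            block = fh.read(min(block_size, run_end - pos))
            if not block:
                break
            buf = tail + block
            buf_start = pos - len(tail)
            idx = buf.find(needle)
            while idx != -1:
                yield buf_start + idx
                idx = buf.find(needle, idx + 1)
            pos += len(block)
            tail = buf[-overlap:] if overlap else b""


# Toy 64-byte disk: needles at 10 (mapped), 30 (inside the gap) and 44 (mapped,
# straddling two 8-byte blocks).
disk = bytearray(64)
for offset in (10, 30, 44):
    disk[offset:offset + 6] = b"needle"

runs = [(0, 16), (40, 24)]  # offsets 16..40 are treated as unmapped
hits = list(find_needle_in_runs(io.BytesIO(bytes(disk)), runs, b"needle", block_size=8))
# hits == [10, 44]
```

The overlap buffer is reset at every run boundary, so a needle that straddles two directly adjacent runs would be missed in this sketch; merging contiguous runs before scanning would be one way around that.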
Bad bot @DissectBot, could the title be reverted?