dissect.target icon indicating copy to clipboard operation
dissect.target copied to clipboard

Take disk gaps in account when scraping

Open twiggler opened this issue 3 months ago • 6 comments

closes #1115

~~Fixed as proposed in the issue. This also requires changes to MappingStream.~~

~~Not ideal to have a special case for MappingStream in the scraper.~~

  • Fixed by creating a separate mapping stream per contiguous region.
  • Rewrote region mapper to handle nesting (LVM-LUKS-LVM, LUKS-LVM-LUKS)
  • disk now always returns the backing physical disk

twiggler avatar Sep 17 '25 07:09 twiggler

CodSpeed Performance Report

Merging #1329 will not alter performance

Comparing skip-runs (c5e65ce) with main (580341b)

Summary

✅ 9 untouched

codspeed-hq[bot] avatar Sep 17 '25 07:09 codspeed-hq[bot]

If we implement SEEK_HOLE in AlignedStream we can have a common implementation here and a specific implementation in MappingStream.

Schamper avatar Sep 17 '25 08:09 Schamper

If we implement SEEK_HOLE in AlignedStream we can have a common implementation here and a specific implementation in MappingStream.

Yeah perhaps. I don't know if there are other usecases yet?

I am wondering why we do not map "find the needle" over (adjacent) disk regions of interest, instead of creating a single stream for each disk?

twiggler avatar Sep 17 '25 11:09 twiggler

I am wondering why we do not map "find the needle" over (adjacent) disk regions of interest, instead of creating a single stream for each disk?

Sounds like more difficult logic to handle needles split over (contiguous) regions. But if you have a good idea?

Schamper avatar Sep 17 '25 11:09 Schamper

I am wondering why we do not map "find the needle" over (adjacent) disk regions of interest, instead of creating a single stream for each disk?

Sounds like more difficult logic to handle needles split over (contiguous) regions. But if you have a good idea?

Change https://github.com/fox-it/dissect.target/blob/78c549bec94e73851c2535ebdaac3eca6e629ca5/dissect/target/plugins/scrape/scrape.py#L97 to

sorted_regions = sorted(regions.items())
contiguous_streams = []
current_stream = None

for (offset, size), (source, source_offset) in sorted_regions:
    # Check for a break in contiguity or the very first item
    if not current_stream or offset != current_stream_end:
        # If a stream is already being built, add it to the list
        if current_stream:
            contiguous_streams.append((current_stream_start, current_stream))
        
        # Start a new stream
        current_stream = MappingStream()
        current_stream_start = offset

    # Add the current region to the stream and update the end pointer
    current_stream.add(offset, size, source, source_offset)
    current_stream_end = offset + size

# Add the last remaining stream after the loop
if current_stream:
    contiguous_streams.append((current_stream_start, current_stream))

yield disk, contiguous_streams

Not sure if it is necessary to set the MappingStream size?

And then adjust https://github.com/fox-it/dissect.target/blob/78c549bec94e73851c2535ebdaac3eca6e629ca5/dissect/target/plugins/scrape/scrape.py#L129 accordingly.

Although provisioning for it, why do we actually need to check needles split over contiguous regions, since we already deal with LogicalVolumeSystems explicitly?

twiggler avatar Sep 17 '25 13:09 twiggler

Codecov Report

:x: Patch coverage is 84.61538% with 18 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 80.74%. Comparing base (580341b) to head (c5e65ce).

Files with missing lines Patch % Lines
dissect/target/plugins/scrape/scrape.py 86.45% 13 Missing :warning:
dissect/target/volume.py 68.75% 5 Missing :warning:
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1329   +/-   ##
=======================================
  Coverage   80.73%   80.74%           
=======================================
  Files         390      390           
  Lines       34217    34303   +86     
=======================================
+ Hits        27626    27698   +72     
- Misses       6591     6605   +14     
Flag Coverage Δ
unittests 80.74% <84.61%> (+<0.01%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar Sep 25 '25 11:09 codecov[bot]

Does this also fix qfind for raid volumes (ddf, md)?

JSCU-CNI avatar Dec 04 '25 14:12 JSCU-CNI

Does this also fix qfind for raid volumes (ddf, md)?

In theory, yes.

I will handcraft a qcow2 to verify and add a unit test.

twiggler avatar Dec 04 '25 15:12 twiggler