Take disk gaps into account when scraping
closes #1115
~~Fixed as proposed in the issue. This also requires changes to MappingStream.~~
~~Not ideal to have a special case for MappingStream in the scraper.~~
- Fixed by creating a separate mapping stream per contiguous region.
- Rewrote region mapper to handle nesting (LVM-LUKS-LVM, LUKS-LVM-LUKS)
- `disk` now always returns the backing physical disk
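The per-contiguous-region approach above can be illustrated with a small helper that splits a sorted region list wherever a gap appears, so each run of adjacent regions would back its own mapping stream. This is a hypothetical, simplified sketch; `group_contiguous` is not part of dissect.target:

```python
def group_contiguous(regions: list[tuple[int, int]]) -> list[list[tuple[int, int]]]:
    """Group (offset, size) regions into runs of contiguous regions.

    Each run would then back its own mapping stream, so the gaps
    between runs are never scraped.
    """
    groups: list[list[tuple[int, int]]] = []
    current: list[tuple[int, int]] = []
    end = None

    for offset, size in sorted(regions):
        if end is not None and offset != end:
            # Gap detected: close the current run and start a new one
            groups.append(current)
            current = []
        current.append((offset, size))
        end = offset + size

    if current:
        groups.append(current)
    return groups
```

For example, `group_contiguous([(12, 2), (0, 4), (4, 4)])` splits at the gap between offset 8 and 12, yielding two runs.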
CodSpeed Performance Report
Merging #1329 will not alter performance
Comparing skip-runs (c5e65ce) with main (580341b)
Summary
✅ 9 untouched
If we implement SEEK_HOLE in AlignedStream we can have a common implementation here and a specific implementation in MappingStream.
Yeah, perhaps. I don't know if there are other use cases yet?
I am wondering why we do not map "find the needle" over (adjacent) disk regions of interest, instead of creating a single stream for each disk?
Sounds like more difficult logic to handle needles split over (contiguous) regions. But if you have a good idea?
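For reference, the usual way to handle needles straddling chunk boundaries is to carry `len(needle) - 1` bytes of overlap between reads. A generic sketch, where `read_at` is a hypothetical reader callback rather than a dissect.target API:

```python
from typing import Callable, Iterator


def find_needle(
    read_at: Callable[[int, int], bytes], needle: bytes, chunk_size: int = 8192
) -> Iterator[int]:
    """Yield absolute offsets of ``needle``, including matches that straddle chunks."""
    overlap = len(needle) - 1
    offset = 0
    tail = b""

    while True:
        chunk = read_at(offset, chunk_size)
        if not chunk:
            break

        # Prepend the tail of the previous chunk so straddling matches are found
        buf = tail + chunk
        base = offset - len(tail)

        pos = buf.find(needle)
        while pos != -1:
            yield base + pos
            pos = buf.find(needle, pos + 1)

        # A full needle can never fit in `overlap` bytes, so no duplicate hits
        tail = buf[-overlap:] if overlap else b""
        offset += len(chunk)
```

For example, `list(find_needle(lambda o, s: data[o:o + s], b"NEEDLE", chunk_size=4))` finds matches even when every read is smaller than the needle. The harder part here would be applying this across *non*-contiguous regions, where the gap bytes must not be treated as adjacent.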
> I am wondering why we do not map "find the needle" over (adjacent) disk regions of interest, instead of creating a single stream for each disk?

> Sounds like more difficult logic to handle needles split over (contiguous) regions. But if you have a good idea?
Change https://github.com/fox-it/dissect.target/blob/78c549bec94e73851c2535ebdaac3eca6e629ca5/dissect/target/plugins/scrape/scrape.py#L97 to:

```python
sorted_regions = sorted(regions.items())

contiguous_streams = []
current_stream = None

for (offset, size), (source, source_offset) in sorted_regions:
    # Check for a break in contiguity or the very first item
    if not current_stream or offset != current_stream_end:
        # If a stream is already being built, add it to the list
        if current_stream:
            contiguous_streams.append((current_stream_start, current_stream))

        # Start a new stream
        current_stream = MappingStream()
        current_stream_start = offset

    # Add the current region to the stream and update the end pointer
    current_stream.add(offset, size, source, source_offset)
    current_stream_end = offset + size

# Add the last remaining stream after the loop
if current_stream:
    contiguous_streams.append((current_stream_start, current_stream))

yield disk, contiguous_streams
```
Not sure if it is necessary to set the `MappingStream` size?
And then adjust https://github.com/fox-it/dissect.target/blob/78c549bec94e73851c2535ebdaac3eca6e629ca5/dissect/target/plugins/scrape/scrape.py#L129 accordingly.
Although we provision for it, why do we actually need to handle needles split across contiguous regions, since we already deal with LogicalVolumeSystems explicitly?
Codecov Report
:x: Patch coverage is 84.61538% with 18 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 80.74%. Comparing base (580341b) to head (c5e65ce).
| Files with missing lines | Patch % | Lines |
|---|---|---|
| dissect/target/plugins/scrape/scrape.py | 86.45% | 13 Missing :warning: |
| dissect/target/volume.py | 68.75% | 5 Missing :warning: |
Additional details and impacted files
```diff
@@           Coverage Diff           @@
##             main    #1329   +/-   ##
=======================================
  Coverage   80.73%   80.74%
=======================================
  Files         390      390
  Lines       34217    34303    +86
=======================================
+ Hits        27626    27698    +72
- Misses       6591     6605    +14
```
| Flag | Coverage Δ | |
|---|---|---|
| unittests | 80.74% <84.61%> (+<0.01%) :arrow_up: | |
Does this also fix qfind for raid volumes (ddf, md)?
In theory, yes.
I will handcraft a qcow2 to verify and add a unit test.