gomft icon indicating copy to clipboard operation
gomft copied to clipboard

Sparse data runs handling bug

Open joonas-fi opened this issue 3 years ago • 0 comments

When offsetLength is zero, it has special semantic meaning (= sparse file): https://github.com/t9t/gomft/blob/f64df3912ca98b2abeaa2f9e79f7c8a59b159ba9/mft/mft.go#L365

(source, see "Example 4 - Sparse, Unfragmented File")

This code decodes a sparse offset as 0, which would incorrectly read a fragment beginning from first cluster of volume (= VBR).

My initial naive idea was to mark sparse runs as having offset -1, but that's incorrect as each run's offset is relative to previous run, i.e. negative values are legal. I solved this issue in my code by making the offset a pointer to an integer, i.e. nil means it's sparse.

Directly related to this, DataRunsToFragments needs to take sparse runs into account. Data run offsets are relative previous non-sparse run, not previous run: https://github.com/t9t/gomft/blob/f64df3912ca98b2abeaa2f9e79f7c8a59b159ba9/mft/mft.go#L390 (link to "Example 4 - Sparse, Unfragmented File" explains this)

This assignment can only be done if current run is non-sparse. I suggest renaming the variable to e.g. previousNonSparseOffsetCluster.

Natural consequence of sparse run support is supporting sparse fragments, which I again marked as *int to discriminate between sparse and non-sparse runs. I had to re-write fragment.Reader in my project because alternating between actual disk-backed fragments vs sparse fragments would've been hackish to add without a re-write.

joonas-fi avatar May 06 '21 09:05 joonas-fi