dicom-rs icon indicating copy to clipboard operation
dicom-rs copied to clipboard

Potential mismatch between self.position and reader position in StatefulDecoder

Open abustany opened this issue 2 months ago • 2 comments

I have a weird (broken?) DICOM file, with a multi valued UL tag of length 10 (and not 8). Both dcmtk and dcmjs manage to parse the file, but dicom-rs trips on it because it assumes that the header length is equal to (value multiplicity) x (value size). I simplified the file and attached it to this ticket.

weird-group-length.dcm.zip

I'm not sure what would be the best way to track a mismatch between the size actually being read (in this case, 2xUL = 8 bytes) and the size that should be read (in this case, 10 bytes). Because dicom-rs fails to skip the extra two garbage bytes after the actual ULs, parsing fails right after this tag.

dicom-rs revision: 42ab9e0fd129cccf8a49eee674dbbbc85cb6b3fa

abustany avatar Oct 03 '25 14:10 abustany

Yup, it's broken. Whichever DICOM tool changed a generic group length attribute to contain "Anonymized" should have removed the attribute instead. :)

With that said, it would be nice for DICOM-rs to be resilient to mismatches between length and VR. Assuming first that the length defines an accurate boundary despite the incoherence might be a better approach than reading too much or too little into it.

Enet4 avatar Oct 03 '25 14:10 Enet4

One fix that wouldn't be too intrusive, assuming we want to go in the direction of trusting the length field more, would be to insert a "counting reader", ie. a Read implementation that forwards to StatefulDecoder.from, but counts the bytes being read. We could then check after each read_xxx call if the number of bytes actually read lags behind self.position.

abustany avatar Oct 03 '25 15:10 abustany