dicom-rs
Value in 2001,0010 private tag causes "Could not read data set token"
I'm iterating over a large set of DICOM files. For a significant subset of the files, dicom::object::open_file() fails with error "Could not read data set token".
I'm not sure I can provide an anonymized version of the DICOM file, but I ran a debug build of dicom-rs' dcmdump with RUST_BACKTRACE=1 set, and the attached file could_not_read_data_set_token-backtrace.txt shows the detailed error and backtrace I got.
The DICOM file contains a CT image. When I run dcmtk's dcmdump on the file, it outputs four warnings:
W: Found element (2001,105f) with VR UN and undefined length, reading a sequence with transfer syntax LittleEndianImplicit (CP-246)
W: Found element (2005,1084) with VR UN and undefined length, reading a sequence with transfer syntax LittleEndianImplicit (CP-246)
W: Found element (2005,1402) with VR UN and undefined length, reading a sequence with transfer syntax LittleEndianImplicit (CP-246)
W: Found element (2005,140f) with VR UN and undefined length, reading a sequence with transfer syntax LittleEndianImplicit (CP-246)
Attached file dcmtk_dcmdump-out.txt contains output from dcmtk's dcmdump where I have redacted a couple of fields.
Attachments: dicom_tags_from_dcmdump.txt, could_not_read_data_set_token-backtrace.txt
Thank you for reporting. I managed to track down the root of the problem: according to section 6.2.2 of Part 5 of the DICOM standard (PS3.5), data sets inside sequences with the value representation UN are expected to be encoded in implicit VR little endian, regardless of the current transfer syntax. This is an exception to the rule defined in section 7.5, and dicom-rs does not take it into account at the moment.
The most intuitive fix I can see right now is to add logic for this case to the dicom-parser crate. From the moment the parser enters a sequence of VR UN, it would remember to always use implicit VR little endian until it reaches the end of that sequence. This could be done either in the lower layer (stateful decoder and encoder) or in the layer above that one (data set reader/writer).
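To make the idea concrete, here is a minimal, standalone sketch of the "remember until the sequence ends" logic. It is not code from dicom-parser: `Encoding`, `EncodingTracker`, and its methods are hypothetical names invented for illustration, and a real fix would have to hook into the crate's existing decoder or data set reader.

```rust
/// Encodings distinguished by this sketch (hypothetical type, not from dicom-rs).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Encoding {
    ImplicitVrLittleEndian,
    ExplicitVrLittleEndian,
}

/// Tracks which encoding applies at the current nesting depth.
struct EncodingTracker {
    /// Encoding dictated by the file's transfer syntax.
    base: Encoding,
    /// One entry per open sequence: `true` if it was declared with VR UN.
    open_sequences: Vec<bool>,
}

impl EncodingTracker {
    fn new(base: Encoding) -> Self {
        EncodingTracker { base, open_sequences: Vec::new() }
    }

    /// Call when the reader steps into a sequence.
    fn enter_sequence(&mut self, vr_is_un: bool) {
        self.open_sequences.push(vr_is_un);
    }

    /// Call when the reader reaches the end of the innermost sequence.
    fn exit_sequence(&mut self) {
        self.open_sequences.pop();
    }

    /// Encoding to use for the next element header.
    fn current(&self) -> Encoding {
        // Any enclosing UN sequence forces implicit VR little endian,
        // including for sequences nested further inside it.
        if self.open_sequences.iter().any(|&is_un| is_un) {
            Encoding::ImplicitVrLittleEndian
        } else {
            self.base
        }
    }
}
```

Keeping one flag per open sequence, rather than a single boolean, means the override ends exactly when the outermost UN sequence closes, even if ordinary sequences are nested inside it.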
If any additional guidance is needed towards resolving this, just let me know.
Meanwhile, I managed to hex-edit the study into an anonymized form which still makes dicom-rs choke. The anonymized study will exist at https://troels.arvin.dk/rust/dicom/anon-could_not_read_data_set_token.dcm for a while.
Is there any way to force dicom-rs to parse in a "best-effort" way, so one can at least extract SIUID and SOP instance UID from the DICOM file?
There currently isn't, but the idea of adding options to the object loading process has occurred to me before, and I agree that it would be useful.
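Until such options exist, one possible stopgap is to scan the raw bytes directly. The sketch below uses only the Rust standard library and is not part of dicom-rs. It assumes the bytes handed to it are encoded in explicit VR little endian (for a regular DICOM file, everything after the 128-byte preamble and the "DICM" marker, provided the main data set also uses that encoding). It gives up at the first undefined-length element instead of parsing into it, so whether it reaches the wanted UIDs depends on the file. `scan_best_effort` and the tag constants are hypothetical names.

```rust
/// Tags of interest as (group, element) pairs.
const SOP_INSTANCE_UID: (u16, u16) = (0x0008, 0x0018);
const STUDY_INSTANCE_UID: (u16, u16) = (0x0020, 0x000D);

/// Walk top-level elements and return the wanted values found before the
/// data ends or an element cannot be skipped. Never fails; may return less.
fn scan_best_effort(data: &[u8], wanted: &[(u16, u16)]) -> Vec<((u16, u16), String)> {
    let mut found = Vec::new();
    let mut pos = 0usize;
    while found.len() < wanted.len() && pos + 8 <= data.len() {
        let group = u16::from_le_bytes([data[pos], data[pos + 1]]);
        let element = u16::from_le_bytes([data[pos + 2], data[pos + 3]]);
        let vr = &data[pos + 4..pos + 6];
        // These VRs use 2 reserved bytes followed by a 4-byte length.
        let long_form = matches!(
            vr,
            b"OB" | b"OD" | b"OF" | b"OL" | b"OV" | b"OW" | b"SQ"
                | b"SV" | b"UC" | b"UN" | b"UR" | b"UT" | b"UV"
        );
        let (len, header) = if long_form {
            if pos + 12 > data.len() {
                break;
            }
            let raw = [data[pos + 8], data[pos + 9], data[pos + 10], data[pos + 11]];
            (u32::from_le_bytes(raw), 12usize)
        } else {
            (u16::from_le_bytes([data[pos + 6], data[pos + 7]]) as u32, 8usize)
        };
        // Undefined length requires nested parsing, which this sketch skips.
        if len == 0xFFFF_FFFF {
            break;
        }
        let start = pos + header;
        let end = match start.checked_add(len as usize) {
            Some(e) if e <= data.len() => e,
            _ => break, // truncated value: keep what was found so far
        };
        if wanted.contains(&(group, element)) {
            let text = String::from_utf8_lossy(&data[start..end]);
            let text = text.trim_end_matches(|c| c == '\0' || c == ' ');
            found.push(((group, element), text.to_string()));
        }
        pos = end;
    }
    found
}
```

A caller might read the file with `std::fs::read`, check that bytes 128..132 are `DICM`, and pass `&bytes[132..]` together with `&[SOP_INSTANCE_UID, STUDY_INSTANCE_UID]`. Any tag missing from the result was not reached before the scan stopped.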