Value in 2001,0010 private tag causes "Could not read data set token"

Open troelsarvin opened this issue 2 years ago • 3 comments

I'm iterating over a large set of DICOM files. For a significant subset of the files, dicom::object::open_file() fails with error "Could not read data set token".

I'm not sure I can provide an anonymized version of the DICOM file, but I've run a debug build of dicom-rs' dcmdump with RUST_BACKTRACE=1 set, and the attached file could_not_read_data_set_token-backtrace.txt shows the detailed error and backtrace I got.

The DICOM file contains a CT image. When I run dcmtk's dcmdump on the file, it outputs four warnings:

W: Found element (2001,105f) with VR UN and undefined length, reading a sequence with transfer syntax LittleEndianImplicit (CP-246)
W: Found element (2005,1084) with VR UN and undefined length, reading a sequence with transfer syntax LittleEndianImplicit (CP-246)
W: Found element (2005,1402) with VR UN and undefined length, reading a sequence with transfer syntax LittleEndianImplicit (CP-246)
W: Found element (2005,140f) with VR UN and undefined length, reading a sequence with transfer syntax LittleEndianImplicit (CP-246)

Attached file dcmtk_dcmdump-out.txt contains output from dcmtk's dcmdump where I have redacted a couple of fields.

Attachments: dicom_tags_from_dcmdump.txt, could_not_read_data_set_token-backtrace.txt

troelsarvin, Sep 06 '21 07:09

Thank you for reporting. I managed to track down the root of the problem: according to section 6.2.2 of part 5 of the DICOM standard (PS3.5), data sets inside sequences with the value representation UN are expected to be encoded in implicit VR little endian, regardless of the current transfer syntax. This is an exception to the rule defined in section 7.5, and one that dicom-rs does not take into account at the moment.

The fix that seems most intuitive to me right now is to add logic for this case to the dicom-parser crate. From the moment the parser enters a sequence of VR UN, it would remember to always use implicit VR little endian until it reaches the end of that sequence. This could be done either in the lower layer (stateful decoder and encoder) or in the layer above that one (data set reader/writer).
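To make the idea concrete, here is a minimal sketch of that bookkeeping. It is not code from dicom-parser: `Encoding`, `EncodingTracker` and its methods are hypothetical names invented for illustration, and the sketch only models the decision of which encoding applies, not the actual decoding.

```rust
/// How element headers are encoded at the current nesting level.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Encoding {
    /// Whatever the file's transfer syntax dictates (e.g. explicit VR little endian).
    TransferSyntax,
    /// Forced implicit VR little endian, because we are inside a sequence of VR UN.
    ImplicitVrLittleEndian,
}

/// Hypothetical helper (not part of dicom-parser): tracks which encoding
/// applies while the reader walks through nested sequences.
#[derive(Debug, Default)]
struct EncodingTracker {
    /// One entry per currently open sequence, holding the encoding of its items.
    stack: Vec<Encoding>,
}

impl EncodingTracker {
    /// Encoding to use when decoding the next element header.
    fn current(&self) -> Encoding {
        self.stack.last().copied().unwrap_or(Encoding::TransferSyntax)
    }

    /// Call when a sequence start is read.
    /// `vr_is_un` is true for an element of VR UN with undefined length.
    fn enter_sequence(&mut self, vr_is_un: bool) {
        let next = if vr_is_un {
            Encoding::ImplicitVrLittleEndian
        } else {
            // An ordinary SQ inherits the enclosing encoding, so a UN sequence
            // keeps forcing implicit VR on everything nested inside it.
            self.current()
        };
        self.stack.push(next);
    }

    /// Call when the sequence ends (delimiter or end of a defined length).
    fn exit_sequence(&mut self) {
        self.stack.pop();
    }
}
```

Either layer could own such a stack: the stateful decoder would consult `current()` before decoding each header, while a data set reader would push and pop as it emits sequence start and end tokens.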

If any additional guidance is needed towards resolving this, just let me know.

Enet4, Sep 06 '21 08:09

Meanwhile, I managed to hex-edit the study into an anonymized form which still makes dicom-rs choke. The anonymized study will exist at https://troels.arvin.dk/rust/dicom/anon-could_not_read_data_set_token.dcm for a while.

Is there any way to force dicom-rs to parse in a "best-effort" way, so one can at least extract SIUID and SOP instance UID from the DICOM file?

troelsarvin, Sep 06 '21 08:09

> Is there any way to force dicom-rs to parse in a "best-effort" way, so one can at least extract SIUID and SOP instance UID from the DICOM file?

There currently isn't, but the idea of introducing options to the object loading process has come up in my head before, and I agree that it would be useful.
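Until such options exist, one stopgap outside of dicom-rs is to scan the raw bytes for the few elements needed and stop before the problematic private sequences. The following is only a sketch under strong assumptions: the data set is encoded in explicit VR little endian, and the wanted elements sit at the top level before any element of undefined length. `skip_preamble` and `find_value` are hypothetical helpers, not dicom-rs functions.

```rust
/// Skips the 128-byte preamble and the "DICM" magic code, if present.
fn skip_preamble(file: &[u8]) -> &[u8] {
    if file.get(128..132) == Some(&b"DICM"[..]) {
        &file[132..]
    } else {
        file
    }
}

/// Scans top-level elements (explicit VR little endian assumed) and returns
/// the raw value bytes of the element tagged `wanted`, as (group, element).
/// Returns None if the tag is absent or an element cannot be skipped safely.
fn find_value(data: &[u8], wanted: (u16, u16)) -> Option<&[u8]> {
    let mut pos = 0usize;
    while pos + 8 <= data.len() {
        let group = u16::from_le_bytes([data[pos], data[pos + 1]]);
        let element = u16::from_le_bytes([data[pos + 2], data[pos + 3]]);
        // Elements are sorted by tag, so going past `wanted` means it is absent.
        if (group, element) > wanted {
            return None;
        }
        let vr = [data[pos + 4], data[pos + 5]];
        // These VRs use two reserved bytes followed by a 32-bit length.
        let long_form = matches!(
            &vr,
            b"OB" | b"OD" | b"OF" | b"OL" | b"OV" | b"OW" | b"SQ"
                | b"SV" | b"UC" | b"UN" | b"UR" | b"UT" | b"UV"
        );
        let (header_len, value_len) = if long_form {
            if pos + 12 > data.len() {
                return None;
            }
            let len = u32::from_le_bytes([
                data[pos + 8], data[pos + 9], data[pos + 10], data[pos + 11],
            ]);
            (12usize, len)
        } else {
            let len = u16::from_le_bytes([data[pos + 6], data[pos + 7]]);
            (8usize, u32::from(len))
        };
        // Undefined length cannot be skipped without parsing items: give up.
        if value_len == u32::MAX {
            return None;
        }
        let start = pos + header_len;
        let end = start.checked_add(value_len as usize)?;
        if end > data.len() {
            return None;
        }
        if (group, element) == wanted {
            return Some(&data[start..end]);
        }
        pos = end;
    }
    None
}
```

For example, `find_value(skip_preamble(&bytes), (0x0008, 0x0018))` would look for the SOP Instance UID, and `(0x0020, 0x000D)` for the Study Instance UID. UID values may carry a trailing NUL padding byte, which the caller would need to trim.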

Enet4, Sep 06 '21 09:09