dicom-pipe

Support Multiple Character Repertoires

Open neandrake opened this issue 5 years ago • 0 comments

Requirement

My understanding of the DICOM standard: the SpecificCharacterSet tag can list multiple character sets in use for element values. I believe the way to interpret this is that the first character set is used to decode string values by default, and that a specific byte/code sequence (an escape sequence) within a value indicates that the bytes following it are encoded with another of the listed character sets.

This is a bit crazy, and to my knowledge is not terribly common. Let's support it!
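To make the interpretation above concrete, here is a minimal sketch of the scanning step, using only the standard library. It assumes the ISO 2022 escape sequence layout (ESC `0x1B`, then zero or more intermediate bytes in `0x20..=0x2F`, then one final byte). The names `Segment` and `split_on_escapes` are hypothetical and not part of dicom-pipe. The sketch only finds segment boundaries; it does not decode anything.

```rust
/// ESC, the byte that introduces an ISO 2022 escape sequence.
const ESC: u8 = 0x1B;

/// A run of bytes that share one character set (hypothetical type).
/// `escape` holds the bytes after ESC that selected the set, or `None`
/// for the leading run that uses the default (first) character set.
#[derive(Debug, PartialEq)]
pub struct Segment<'a> {
    pub escape: Option<&'a [u8]>,
    pub payload: &'a [u8],
}

/// Splits `bytes` into segments at each escape sequence.
pub fn split_on_escapes(bytes: &[u8]) -> Vec<Segment<'_>> {
    let mut segments = Vec::new();
    let mut current_escape: Option<&[u8]> = None;
    let mut start = 0; // index where the current payload begins
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] != ESC {
            i += 1;
            continue;
        }
        // Close the payload collected so far (skip an empty leading run).
        if i > start || current_escape.is_some() {
            segments.push(Segment {
                escape: current_escape,
                payload: &bytes[start..i],
            });
        }
        // Escape body: intermediate bytes 0x20..=0x2F, then one final byte.
        let body_start = i + 1;
        let mut j = body_start;
        while j < bytes.len() && (0x20..=0x2F).contains(&bytes[j]) {
            j += 1;
        }
        if j < bytes.len() {
            j += 1; // consume the final byte
        }
        current_escape = Some(&bytes[body_start..j]);
        start = j;
        i = j;
    }
    // Emit whatever follows the last escape sequence.
    if start < bytes.len() || current_escape.is_some() {
        segments.push(Segment {
            escape: current_escape,
            payload: &bytes[start..],
        });
    }
    segments
}
```

For example, calling `split_on_escapes` on a value that starts in the default set, switches with `ESC $ B`, and switches back with `ESC ( B` yields three segments, each tagged with the escape bytes that selected it. Mapping those escape bytes to an entry in SpecificCharacterSet would be a separate step.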

I believe relevant sections from the DICOM Standard are: Part 5, Chapter 6.1.2.30

"If Attribute Specific Character Set (0008,0005) has multiple values, the DICOM SOP Instance supports Code Extension techniques as described in ISO/IEC 2022:1994"

Unfortunately the ISO 2022:1994 standard appears to cost upwards of $100. Referring to other implementations might be best.

See also: Part 5, Chapter 6.1.2.5.4.c

Design

The current implementation of character sets relies on the encoding crate. The EncodingRef type is aliased as CSRef, which is used throughout the parsing/API. I think the general structure of changes here would be:

  1. Create a concrete CharSets type (or similar, better naming preferred). This would look roughly like:
     pub struct CharSets {
         charsets: Vec<EncodingRef>,
     }
  2. Implement the Encoding trait for CharSets. If there is a single item in charsets, it should delegate all encoding to that item. If there are multiple, we will likely need to scan for the marker byte to find the boundaries where decoding switches between character sets.
  3. Update all of parsing to use CharSets instead of CSRef.
  4. Re-enable and update these unit tests for thorough verification that parsing works properly:
  3. Re-enable and update these unit tests for thorough verification that parsing works properly:
  • charsets::test_scs_h32
  • parsing::test_dicomdir_with_std
  • parsing::test_dicomdir_withoutstd
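
The delegation logic described in the steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the proposed implementation: the `Decoder` trait is a hypothetical stand-in for the encoding crate's `Encoding` trait so the example compiles on its own, `Ascii` and `Latin1` are toy decoders, and the model of one active character set at a time ignores the G0/G1 distinction that ISO 2022 makes.

```rust
/// ESC, the marker byte that introduces an escape sequence.
const ESC: u8 = 0x1B;

/// Hypothetical stand-in for the `encoding` crate's `Encoding` trait.
pub trait Decoder {
    /// Bytes that follow ESC when this character set is selected.
    /// Must be non-empty, otherwise it would match every escape.
    fn escape(&self) -> &'static [u8];
    /// Decodes a run of bytes that all belong to this character set.
    fn decode(&self, bytes: &[u8]) -> String;
}

/// Toy ASCII decoder: non-ASCII bytes become U+FFFD.
pub struct Ascii;
impl Decoder for Ascii {
    fn escape(&self) -> &'static [u8] {
        b"(B"
    }
    fn decode(&self, bytes: &[u8]) -> String {
        bytes
            .iter()
            .map(|&b| if b.is_ascii() { b as char } else { '\u{FFFD}' })
            .collect()
    }
}

/// Toy Latin-1 decoder: each byte maps to the same Unicode code point.
pub struct Latin1;
impl Decoder for Latin1 {
    fn escape(&self) -> &'static [u8] {
        b"-A"
    }
    fn decode(&self, bytes: &[u8]) -> String {
        bytes.iter().map(|&b| b as char).collect()
    }
}

pub struct CharSets {
    charsets: Vec<Box<dyn Decoder>>,
}

impl CharSets {
    pub fn new(charsets: Vec<Box<dyn Decoder>>) -> Self {
        CharSets { charsets }
    }

    pub fn decode(&self, bytes: &[u8]) -> String {
        let default = match self.charsets.first() {
            Some(cs) => cs,
            None => return String::new(),
        };
        // Single character set: delegate everything, no scanning.
        if self.charsets.len() == 1 {
            return default.decode(bytes);
        }
        let mut out = String::new();
        let mut active: &dyn Decoder = &**default;
        let mut rest = bytes;
        // Decode up to each ESC with the active set, then switch sets.
        while let Some(pos) = rest.iter().position(|&b| b == ESC) {
            out.push_str(&active.decode(&rest[..pos]));
            let after = &rest[pos + 1..];
            match self.charsets.iter().find(|cs| after.starts_with(cs.escape())) {
                Some(cs) => {
                    active = &**cs;
                    rest = &after[cs.escape().len()..];
                }
                // Unknown escape: drop the ESC byte, keep the active set.
                None => rest = after,
            }
        }
        out.push_str(&active.decode(rest));
        out
    }
}
```

With `CharSets::new(vec![Box::new(Ascii), Box::new(Latin1)])`, the bytes `Caf`, `ESC - A`, `0xE9` decode by handing `Caf` to the first set and the remaining byte to the second. A real implementation would implement the Encoding trait on CharSets itself and would need a policy for unknown escape sequences; silently dropping the ESC byte here is only a placeholder.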

neandrake, Jun 05 '20 21:06