presidio
Recompress compressed DICOM images after redaction
Describe the bug
When running redaction on compressed pixel data, the returned pixel data is uncompressed. This is because DicomImageRedactorEngine._add_redact_box adds the boxes using the loaded DICOM instance's .pixel_array values, which are uncompressed, unlike its .PixelData.
We are still able to redact correctly, but we are then unable to save the redacted instance as a .dcm file.
Side note: If an error occurs while trying to write out the pixel data post-redaction, then gdcm may need to be installed.
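Whether the pixel data has been lossily compressed can be checked via the DICOM tag (0028, 2110), Lossy Image Compression. If the value is '01', then the image has undergone lossy compression. Losslessly compressed data does not set this flag, so the Transfer Syntax UID is the more general indicator.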
if redacted_instance[0x0028, 0x2110].value == '01':
    compression_method = instance.file_meta.TransferSyntaxUID
    print(f'Pixel data is compressed with Transfer Syntax UID: {compression_method}')
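Since tag (0028, 2110) only records lossy compression, a check based on the Transfer Syntax UID may be more robust. The following is a minimal sketch, not part of presidio or pydicom: the helper name `is_pixel_data_compressed` and the constant `NATIVE_TRANSFER_SYNTAXES` are hypothetical. It treats the four native transfer syntaxes as uncompressed and everything else as encapsulated (compressed). pydicom UID objects appear to offer a comparable `is_compressed` property, which is probably preferable when pydicom is already loaded.

```python
# Transfer syntaxes whose Pixel Data is stored natively (not encapsulated).
# Hypothetical constant for illustration; UIDs are from the DICOM standard.
NATIVE_TRANSFER_SYNTAXES = {
    "1.2.840.10008.1.2",       # Implicit VR Little Endian
    "1.2.840.10008.1.2.1",     # Explicit VR Little Endian
    "1.2.840.10008.1.2.1.99",  # Deflated Explicit VR Little Endian
    "1.2.840.10008.1.2.2",     # Explicit VR Big Endian
}


def is_pixel_data_compressed(transfer_syntax_uid) -> bool:
    """Return True if the transfer syntax implies encapsulated pixel data.

    Hypothetical helper: accepts a plain string or a pydicom UID
    (which is a str subclass).
    """
    return str(transfer_syntax_uid).strip() not in NATIVE_TRANSFER_SYNTAXES
```

With a loaded dataset, this could be called as `is_pixel_data_compressed(instance.file_meta.TransferSyntaxUID)`.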
To Reproduce
Steps to reproduce the behavior:
import pydicom
from presidio_image_redactor import DicomImageRedactorEngine
# Redact text PHI
engine = DicomImageRedactorEngine()
instance = pydicom.dcmread(PATH_TO_DICOM_FILE)
redacted_instance = engine.redact(instance)
# Calculate bytes
rows = instance[0x0028, 0x0010].value
columns = instance[0x0028, 0x0011].value
samples_per_pixel = instance[0x0028, 0x0002].value
bits_allocated = instance[0x0028, 0x0100].value
try:
    number_of_frames = int(instance[0x0028, 0x0008].value)
except KeyError:
    # Number of Frames is absent for single-frame images
    number_of_frames = 1
expected_num_bytes = rows * columns * number_of_frames * samples_per_pixel * (bits_allocated/8)
print(f"Expected (no compression): {int(expected_num_bytes)}")
print(f"Actual, pre-redaction: {len(instance[0x7fe0, 0x0010].value)}")
print(f"Actual, post-redaction: {len(redacted_instance[0x7fe0, 0x0010].value)}")
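The byte calculation above can also be written as a standalone function, which makes the arithmetic easy to check without a DICOM file. This is a sketch under stated assumptions: the function name `expected_uncompressed_bytes` is hypothetical, it only handles Bits Allocated values that are multiples of 8, and it pads odd totals by one byte because DICOM element values must have even length.

```python
def expected_uncompressed_bytes(rows, columns, samples_per_pixel,
                                bits_allocated, number_of_frames=1):
    """Size in bytes of native (uncompressed) Pixel Data.

    Hypothetical helper mirroring the calculation in the repro script.
    """
    if bits_allocated % 8 != 0:
        # Simplification: bit-packed data (e.g. Bits Allocated = 1) is not handled.
        raise ValueError("only whole-byte Bits Allocated values are supported")
    num_bytes = (rows * columns * samples_per_pixel
                 * number_of_frames * (bits_allocated // 8))
    # DICOM values must have even length, so odd totals get one pad byte.
    return num_bytes + (num_bytes % 2)


# A single-frame 512 x 512 grayscale image stored in 16 bits per pixel:
print(expected_uncompressed_bytes(512, 512, 1, 16))  # 524288
```

If the post-redaction length of Pixel Data matches this value while the pre-redaction length is much smaller, the compression was most likely dropped during redaction.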
Note that native support for compressing is not implemented in pydicom yet. The following line would be ideal, but it throws an error because that functionality is not available.
redacted_instance.compress(transfer_syntax_uid=compression_method, encoding_plugin='gdcm')
Expected behavior
With the above, we would ideally see the same number of bytes pre- and post-redaction. But when compression is not re-applied to previously compressed pixel data, the post-redaction byte count instead equals the size expected with no compression.
If we run redacted_instance.save_as('FILE_NAME_HERE.dcm'), then we get the following error (which we want to avoid):
ValueError: With tag (7fe0, 0010) got exception: (7FE0,0010) Pixel Data has an undefined length indicating that it's compressed, but the data isn't encapsulated as required. See pydicom.encaps.encapsulate() for more information
Traceback (most recent call last):
  File "/anaconda/envs/feasibility-study/lib/python3.8/site-packages/pydicom/tag.py", line 28, in tag_in_exception
    yield
  File "/anaconda/envs/feasibility-study/lib/python3.8/site-packages/pydicom/filewriter.py", line 662, in write_dataset
    write_data_element(fp, dataset.get_item(tag), dataset_encoding)
  File "/anaconda/envs/feasibility-study/lib/python3.8/site-packages/pydicom/filewriter.py", line 579, in write_data_element
    raise ValueError(
ValueError: (7FE0,0010) Pixel Data has an undefined length indicating that it's compressed, but the data isn't encapsulated as required. See pydicom.encaps.encapsulate() for more information
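To illustrate what "encapsulated" means in this error, the sketch below builds the byte layout that compressed Pixel Data is expected to have: an (empty) Basic Offset Table item, one item per frame, and a sequence delimiter. It is an illustration only, not pydicom's implementation of `pydicom.encaps.encapsulate()`; the function name `encapsulate_frames` is hypothetical, and it assumes one fragment per frame and little-endian encoding. Encapsulating does not compress anything: each frame must already be encoded with the codec that matches the Transfer Syntax UID.

```python
import struct

ITEM_TAG = b"\xfe\xff\x00\xe0"       # (FFFE,E000) Item, little endian
SEQ_DELIMITER = b"\xfe\xff\xdd\xe0"  # (FFFE,E0DD) Sequence Delimitation Item


def encapsulate_frames(compressed_frames):
    """Wrap already-compressed frames in the encapsulated Pixel Data layout.

    Hypothetical helper for illustration; one fragment per frame.
    """
    out = bytearray()
    # Basic Offset Table item; an empty table (length 0) is permitted.
    out += ITEM_TAG + struct.pack("<L", 0)
    for frame in compressed_frames:
        if len(frame) % 2:
            frame += b"\x00"  # fragments must have even length
        out += ITEM_TAG + struct.pack("<L", len(frame)) + frame
    # The sequence delimiter closes the undefined-length element.
    out += SEQ_DELIMITER + struct.pack("<L", 0)
    return bytes(out)


encoded = encapsulate_frames([b"\x01\x02\x03"])
print(len(encoded))  # 28
```

The leading item tag is what distinguishes encapsulated data from raw pixel bytes, which is why uncompressed output from .pixel_array is rejected on save when the transfer syntax is still a compressed one.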
Additional context
Potentially helpful resources:
- https://pydicom.github.io/pydicom/dev/old/working_with_pixel_data.html
- https://pydicom.github.io/pydicom/dev/old/image_data_handlers.html
- https://pydicom.github.io/pydicom/dev/tutorials/pixel_data/compressing.html
- https://pydicom.github.io/pydicom/dev/reference/generated/pydicom.dataset.Dataset.html#pydicom.dataset.Dataset.compress
- https://pydicom.github.io/pydicom/dev/old/image_data_compression.html
- https://pydicom.github.io/pydicom/dev/reference/generated/pydicom.pixel_data_handlers.gdcm_handler.html
- https://stackoverflow.com/questions/58518357/how-to-create-jpeg-compressed-dicom-dataset-using-pydicom
- http://dicomiseasy.blogspot.com/2012/08/chapter-12-pixel-data.html