Support Pixel Anonymizers

Open cancan101 opened this issue 9 years ago • 12 comments

Allow plugging in a pixel anonymizer that blacks out the burned-in annotations. Ideally it would plug in here and look something like: https://github.com/johnperry/CTP/blob/master/source/files/scripts/DicomPixelAnonymizer.script

cancan101 avatar Mar 02 '15 22:03 cancan101

Hi, thanks for the suggestion. I don't have much experience with CTP, but I agree an option to plug in a preferred pixel anonymizer would be a nice feature. I think the option could go here, right before it cleans out the headers, so that we don't destroy data the pixel cleaner needs. Do you have any experience with scripts to do this?

Are you currently using the dicom-anon script?

jeffmax avatar Mar 03 '15 18:03 jeffmax
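
The ordering suggested above (pixel cleaner first, header cleanup second) can be sketched as follows. This is only an illustration: `anonymize`, `clean_headers`, and the `pixel_anonymizer` parameter are hypothetical names, not part of dicom-anon, and a plain dict stands in for a DICOM dataset so the sketch runs without any DICOM library.

```python
from typing import Callable, Optional

# Hypothetical hook type: receives the dataset and modifies it in place.
PixelAnonymizer = Callable[[dict], None]


def clean_headers(ds: dict) -> None:
    """Stand-in for the header-cleaning step: drops identifying attributes."""
    for keyword in ("PatientName", "Manufacturer"):
        ds.pop(keyword, None)


def anonymize(ds: dict, pixel_anonymizer: Optional[PixelAnonymizer] = None) -> dict:
    """Run the optional pixel hook, then clean the headers."""
    # The hook runs first, while the headers it may rely on
    # (for example Manufacturer) are still present.
    if pixel_anonymizer is not None:
        pixel_anonymizer(ds)
    clean_headers(ds)
    return ds
```

Passing the hook as an argument keeps the choice of pixel cleaner outside the script, which is the "plug in" behaviour requested in the issue.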

I am looking to use it. Currently I have a MATLAB script that does the anonymization, but I would prefer to move to Python. In my MATLAB script I blank out the burned-in annotations.

Another Python implementation I found is: https://github.com/darcymason/pydicom/blob/dev/pydicom/examples/anonymize.py

cancan101 avatar Mar 03 '15 18:03 cancan101

You also want to remove the burned-in annotation and then set the burned-in flag to false so that the file does not get quarantined.

cancan101 avatar Mar 03 '15 18:03 cancan101
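
A minimal sketch of the two steps described above: blank the annotated region, then update the flag. It assumes the pixels are available as a list of rows and uses a plain dict in place of a DICOM dataset; `blank_region`, `scrub_burned_in`, and the `"pixels"` key are hypothetical names chosen for illustration. The flag corresponds to the DICOM attribute Burned In Annotation (0028,0301), whose values are "YES" and "NO".

```python
def blank_region(pixels, top, left, bottom, right, fill=0):
    """Overwrite rows top..bottom-1 and columns left..right-1 with `fill`."""
    for row in pixels[top:bottom]:
        # Slicing clips to the row length, so oversized regions are safe.
        row[left:right] = [fill] * len(row[left:right])
    return pixels


def scrub_burned_in(ds, regions):
    """Blank every (top, left, bottom, right) region, then clear the flag."""
    # `ds` is a plain dict standing in for a DICOM dataset (an assumption
    # made so the sketch runs without a DICOM library).
    for top, left, bottom, right in regions:
        blank_region(ds["pixels"], top, left, bottom, right)
    # Only mark the image as clean after the pixels have been blanked.
    ds["BurnedInAnnotation"] = "NO"
    return ds
```

Finding the regions is the hard part and is not addressed here; the sketch takes them as given.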

I have used that. I basically wrote this script to be a more extensive version of that one.

jeffmax avatar Mar 03 '15 18:03 jeffmax

Good point! I probably won't have time to properly dig into writing a pixel anonymizer in the near term, but if you have something in MATLAB you would like to convert to Python and contribute to the project, we welcome any pull requests. I think the hard part is all the heuristics for identifying likely burned-in data (and making that extensible), which you might already have (and it looks like the CTP script has a good start as well).

It has always been on my wish list to try to use some simple machine learning or OCR to look for text, or at least alert above a certain confidence.

jeffmax avatar Mar 03 '15 18:03 jeffmax

I'd certainly be interested in helping integrate something if you contributed.

jeffmax avatar Mar 03 '15 18:03 jeffmax

It looks like OB and OW VRs are being removed here: https://github.com/chop-dbhi/dicom-anon/blob/4a6f06887459e72fb07ba17c28ad2fa4747c74e0/dicom_anon.py#L551 which is the VR set on pixel data. This means the entire pixel data seems to be removed when "anonymizing".

cancan101 avatar Mar 10 '15 16:03 cancan101

So you gave me a heart attack on this one, but have you tried it and seen it delete the pixel data? I think because of this line in pydicom

https://github.com/darcymason/pydicom/blob/master/source/dicom/_dicom_dict.py#L3706

it actually sets that VR string to "OB or OW" and it fails to match. Assuming this is preventing the problem for you, this is definitely not something it should rely on.

jeffmax avatar Mar 10 '15 17:03 jeffmax

I'm not sure I follow what you are saying.

It looks like the VR string as presented by pydicom may be: 'OB or OW', 'OB' or 'OW'.

I have dealt with the issue for now:

def vr_handler(ds, e):
    if (e.VR in ['PN', 'CS', 'UI', 'DA', 'DT', 'LT', 'UN', 'UT', 'ST', 'AE', 'LO', 'TM', 'SH', 'AS', 'OB', 'OW'] and
        e.tag != PIXEL_DATA):
        del ds[e.tag]
        return True
    return False

cancan101 avatar Mar 10 '15 17:03 cancan101
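
For readers who want to try the workaround above in isolation, here is a self-contained variant. The `Element` namedtuple and the dict used as a dataset are stand-ins (assumptions) for pydicom objects, and `PIXEL_DATA` is assumed to be the Pixel Data tag (7FE0,0010); the actual definition in the script may differ.

```python
from collections import namedtuple

# Stand-in for a pydicom data element; only the attributes used below.
Element = namedtuple("Element", ["tag", "VR"])

PIXEL_DATA = (0x7FE0, 0x0010)  # assumed: the Pixel Data tag

# Same VR list as in the comment above, stored as a set for fast lookup.
VRS_TO_DELETE = {
    "PN", "CS", "UI", "DA", "DT", "LT", "UN", "UT",
    "ST", "AE", "LO", "TM", "SH", "AS", "OB", "OW",
}


def vr_handler(ds, e):
    """Delete `e` from `ds` if its VR is listed, but never Pixel Data."""
    # Checking the tag first protects the pixels whatever VR string
    # the element happens to carry.
    if e.tag != PIXEL_DATA and e.VR in VRS_TO_DELETE:
        del ds[e.tag]
        return True
    return False
```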

Have you seen a situation where pydicom actually sets e.VR for the pixel data element to the string "OW" or the string "OB"?

I ask because, from the file I linked to, it looks like pydicom sets that string to "OB or OW", so it won't match.

Here is an example from ipython examining a dicom file:

a[0x7fe0, 0x0010].VR
'OW or OB'

jeffmax avatar Mar 10 '15 17:03 jeffmax

Definitely:

In [388]: ds = dicom.read_file("/Users/alex/Downloads/series (1).dcm")
ds[0x7fe0, 0x0010].VR

Out[388]: 'OB'

and after running the file through dcmdjpeg I see OW.

cancan101 avatar Mar 10 '15 17:03 cancan101

Look at that. Thanks for catching that.

jeffmax avatar Mar 10 '15 17:03 jeffmax
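
As the exchange above shows, the VR reported for Pixel Data may be 'OB', 'OW', or a combined string such as 'OW or OB', and a plain list membership test only matches the first two. One possible way to treat all three forms uniformly is sketched below; `vr_candidates` and `vr_matches` are hypothetical helpers, not pydicom or dicom-anon functions.

```python
def vr_candidates(vr):
    """Split a VR string such as 'OW or OB' into its individual VRs."""
    return {part.strip() for part in vr.split(" or ")}


def vr_matches(vr, targets):
    """Return True if any candidate VR in `vr` appears in `targets`."""
    return not vr_candidates(vr).isdisjoint(targets)
```

With these helpers, `vr_matches('OW or OB', {'OB', 'OW'})` is true, whereas `'OW or OB' in ['OB', 'OW']` is false. Any handler that matches on VR this way would still need an explicit Pixel Data tag check to avoid deleting the image.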