deid icon indicating copy to clipboard operation
deid copied to clipboard

Support for "Retain Safe Private Option"

Open lassoan opened this issue 6 months ago • 23 comments

We are interested in preserving private fields that are known to be safe to keep - as described in https://dicom.nema.org/medical/dicom/current/output/html/part15.html#sect_E.3.10

Is there a way to specify private fields with a private creator (e.g., (7053,xx00) | Philips PET Private Group) in recipes?

lassoan avatar Jul 09 '25 21:07 lassoan

I can give a suggestion for a strategy to take - if you have very specific, scoped fields you want to keep, you can use the KEEP directive. If it's more a global thing a strategy akin to the reverse remove_private_tags might make sense.

https://github.com/pydicom/pydicom/blob/893609e1763044032ebee83d9fe563d8c9525012/src/pydicom/dataset.py#L2544

Can you describe the operation you are doing with deid? Is it the case that you want to retain a subset of private fields, or private fields globally?

vsoch avatar Jul 12 '25 15:07 vsoch

Thanks for the reply. As a first step, we would like to implement the Retain Safe Private Option specified in the DICOM standard. That's a relatively short list of attributes (less than 500) that can all be kept unchanged - no need to remove them.

It would be nicer to perform deidentification actions that TCIA specified. Their list is longer, containing 8000+ private attributes, and there 6 possible actions (see TCIA deidentification overview page and the Private Tag Dictionary).

lassoan avatar Jul 12 '25 21:07 lassoan

That sounds good to me. One question - is this something that should be implemented in upstream pydicom/pydicom? Akin to remove private tags there might be a second function to achieve this (that we could then use here).

vsoch avatar Jul 13 '25 13:07 vsoch

Selective removal of private tags, especially from sequences, is quite a complex topic in itself and it is quite specific to deanonymization. So, pydicom does not sound like an ideal place for it. Also, full-featured deanonymizers that need to go beyond DICOM "Retain Safe Private Option" would not use this function anyway, because they need to implement more sophisticated logic for deciding what tags to remove/alter and exactly how (delete, keep, empty, generate UID, offset time).

Do you think deid could implement TCIA-like private tag deidentification without major design change? For example, would it be possible to convert the actions described in Private Tag Dictionary to something that deid can execute?

lassoan avatar Jul 13 '25 13:07 lassoan

My suggestion would be to turn the tags.py module into a directory proper, e.g., tags.py to tags/__init__.py and then have a private.py with that explicit listing and function.

vsoch avatar Jul 13 '25 13:07 vsoch

What do you mean by "explicit listing and function"? Would we specify some tags in Python code; or would private.py get the information from some configuration files?

Do we actually need a private dictionary? Could we simply add private tags to the recipe, specified by group, element, private creator? For example:

KEEP (GEMS_PETD_01,0009xxBF)

It would be nice to also include a field description (would help debugging and logging). It would be also useful to be able to include the VR, because the TCIA listing often contains variant of tags with different VRs (but not the VM, as it is not specified in the TCIA listing). For example:

KEEP (GEMS_PETD_01,0009xxBE,SL,AC BP Filter)
KEEP (GEMS_PETD_01,0009xxBE,SS,AC BP Filter)
REMOVE (GEMS_PETD_01,0009xxBE,UN,AC BP Filter)
KEEP (GEMS_PETD_01,0009xxC1,SL,AC Img Smooth)
KEEP (GEMS_PETD_01,0009xxC1,SS,AC Img Smooth)
REMOVE (GEMS_PETD_01,0009xxC1,UN,AC Img Smooth)

lassoan avatar Jul 14 '25 17:07 lassoan

If you want to just create a deid recipe to keep the specific tags, no development is needed here. The first example you provided would be a way to achieve that, and it would be logical to contribute the list to the repository if others might find it useful. What I was thinking is (given that the list is stable and users would want a flag to add to the cleaning functions to do this task) we could add the logic to the library. I'm thinking again on that, and perhaps your method to create the recipe is more logical.

vsoch avatar Jul 14 '25 23:07 vsoch

What would be needed to be able to specify the tags similarly to my examples above?

lassoan avatar Jul 15 '25 00:07 lassoan

I've never used deid for private tags like that - going to ask @wetzelj for help on that one.

vsoch avatar Jul 15 '25 00:07 vsoch

Currently we don't evaluate the Private Creator - I haven't had to do anything like this - or tested what I'm about to describe, but looking at fields.py - name_contains() and the fact that this does regular expression matching on the tag (group/item) in addition to the name, I suspect that rules could be written like:

KEEP 0009..BF

Admittedly, this isn't going to limit only to the GEMS_PETD_01 private creator, and doesn't handle the situation you described where you want to keep only certain VRs, but if these limitations aren't deal-breakers, might be something that could be easily tried.

To accommodate something like you described:

KEEP (GEMS_PETD_01,0009xxBF)

or the more advanced variant that also looks at the VR

KEEP (GEMS_PETD_01,0009xxBE,SL,AC BP Filter)
KEEP (GEMS_PETD_01,0009xxBE,SS,AC BP Filter)
REMOVE (GEMS_PETD_01,0009xxBE,UN,AC BP Filter)

we'd have to expand the methods used to get the list of targeted fields (again, fields.py).

I then wonder if that's over-complicating the tag selection processing? Would it perhaps be better to figure out a way to do this with a deid-defined function? In theory, with this type of a solution we could develop a function which interrogated the VR and private creator for the identified tag - and act on the tag only if those conditions were satisfied. In this theoretical solution, the recipe commands would look something like:

KEEP 0009..BE func:check_conditions vr=SL creator=GEMS_PETD_01
KEEP 0009..BE func:check_conditions vr=SS creator=GEMS_PETD_01
REMOVE 0009..BE func:check_conditions creator=GEMS_PETD_01

Full disclosure, I just threw this out as an idea. I don't know that I like it. :)

wetzelj avatar Jul 16 '25 02:07 wetzelj

Thank you for the suggestions. I like the idea of keeping the private creator in the tag name and use additional functions for more advanced checks.

Could something like these work?

KEEP (GEMS_PETD_01,0009xxBF)
KEEP (GEMS_PETD_01,0009xxBF) func:check_conditions vr=SL
KEEP (GEMS_IDEN_01,0009xx04) func:check_conditions vr=LO|OB|SH|UN

How could this be used for REPLACE? There the custom function is used for specifying the value. How could the custom function for including/excluding a field could be combined with custom function for specifying the value? For example, right now I don't think this would work:

REPLACE (GEMS_IDEN_01,0009xx04) func:check_conditions vr=LO|OB|SH|UN func:change_something a=2 b=5

Maybe it should be done in two steps? (it would not be the most convenient, but probably workable)

%fields field_set2_private
FIELD (GEMS_IDEN_01,0009xx04) func:check_conditions vr=LO|OB|SH|UN

%header
REPLACE fields:field_set2_private func:change_something a=2 b=5

lassoan avatar Jul 16 '25 17:07 lassoan

You'd want to put logic in the custom function, which will receive the item being parsed, the value, field and the entire dicom. A development tip is to write it first like this so you get an interactive console:

def generate_uid(item, value, field, dicom):
    '''This function will generate a dicom uid! You can expect it to be passed
       the dictionary of items extracted from the dicom (and your function)
       and variables, the original value (func:generate_uid) and the field
       object you are applying it to.
    '''
    import IPython
    IPython.embed()

You'd want to write all of your logic in that function based on those inputs received.

vsoch avatar Jul 16 '25 21:07 vsoch

Thanks for the debugging tip.

About the action syntax, would it be possible to specify custom rule for both the field selection (e.g., extra check on the VR) and custom rule for getting the replacement value?

Do you have a preference for the syntax of specifying private creator string? The private creator string can contain spaces, comma, parenthesis, etc. (see DICOM standard). The (0009,"ACUSON",03) format is used by DCMTK and TCIA private dictionary, so if there is no other suggestion then we could use it.

lassoan avatar Jul 16 '25 23:07 lassoan

Could you just not look at the VR in your custom function? The entire DICOM is passed in.

vsoch avatar Jul 17 '25 00:07 vsoch

I'd like to take a step back just to clarify that I understand all of the potential enhancements that we're discussing - I think there are some really great enhancements here - but it may make sense to implement them differently.

  1. Enable tag selection using private creator. Current pydicom/deid functionality enables tag selection within the recipe using pattern matching on both the tag name and the group/tag number. The private creator is not considered. Expand field selection to optionally incorporate Private creator - enabling recipe rules such as:
REPLACE (0009, "ACUSON", 03) 123456
KEEP (0009, "ACUSON", 04)      #Keeps ACUSON tags 000904xx
REMOVE (0009, "ACUSON", 05)    #Removes all ACUSON Tags 000905xx
REMOVE (0009, "ACUSON", 0610)  #Removes ACUSON Tag 00090610
  • Additional thoughts:
    • I think the format you proposed that is used by DCMTK and TCIA would be fine. I don't see a need to create a new format
    • It will be important to ensure that new functionality for tag selection using this pattern doesn't break the current pattern matching methodologies.
    • The more I think about this, the more I like it. :) I haven't had the need to do this, but I could definitely see it becoming necessary in the future. It seems like it would be a very good general-use enhancement to pydicom/deid
  1. Apply VR filtering to tag selection.
  • Thoughts:
    • Initially my mind went to using a custom function for this, but if we move forward with enhancement 1, maybe it makes more sense to incorporate this into the tuple used to identify the tags. The syntax used by DCMTK is:
      # The tag value may take one of two forms:
      #   (gggg,"CREATOR",ee)
      #   (gggg,"CREATOR",eeee) [eeee >= 1000]
      
      If we're implementing this, it seems like a minimal additional lift to add a fourth (optional) element to the tag identifying tuple that would be the list of included VRs.
      #  (gggg,"CREATOR",ee,[VR])
      #  (gggg,"CREATOR",eeee,[VR])
      
  1. Create a TCIA recipe. Create a recipe based on the actions taken by the TCIA deidentification script - defined in Table 1 on this page.

wetzelj avatar Jul 17 '25 13:07 wetzelj

I experimented with this a bit and it was actually very easy to allow specifying private tags by private creator - see here:

https://github.com/lassoan/deid/commit/2fd2027208c5fb88502ad1a30b9ad17ac774d7d4

I've tried to remove all private tags by using REMOVE with a custom is_private function but keep ones that are explicitly specified as KEEP.

def is_private(dicom, value, field, item):
    return field.element.is_private and (field.element.private_creator is not None)

This experiment failed: the private field that I specified to keep ended up being removed. The "remove private attributes" flag was disabled, and if I did not include REMOVE ALL func:is_private then the private field was kept. According to the documentation it seemed to me that specifying both REMOVE and KEEP for an attribute results in preserving the attribute, so I'm not sure if the documentation was inaccurate or there is some other problem.

lassoan avatar Jul 17 '25 23:07 lassoan

I would interact through python and put an IPython.embed() here and walk through the logic to understand what is going on (that is what I would try).

https://github.com/pydicom/deid/blob/fa4731f79d7002d51e31cbec6f2586d87afb479d/deid/dicom/parser.py#L205

Also remember that order matters - if you remove first there is nothing left to keep.

vsoch avatar Jul 17 '25 23:07 vsoch

@vsoch, the experiment from @lassoan leads me to believe that there is perhaps a discrepancy in functionality. In one of our prior PRs, I specifically tested the interactions of multiple actions on the same fields, documented the results of those interactions, and added unit tests to ensure that future changes maintain those interactions. While order does matter, KEEP is supposed to take precedence over REMOVE - regardless of order.

While I added unit tests which validate the interactions, I did not write unit tests (or thoroughly validate) the interactions when using custom functions and how they interact. I suspect that what @lassoan experienced stems from some difference in how custom functions are applied to the header compared to how direct actions are applied. I suspect that the remove as a result of a function (REMOVE ALL func:is_private) is applied differently than a direct field reference remove (REMOVE PatientID) and therefore not following the same interaction patterns.

@lassoan, can you please re-run your test, but with the following commands:

REMOVE (1129,"Eigen, Inc",08)
KEEP (1129,"Eigen, Inc",08)

If my suspicion is correct, the KEEP will override the REMOVE...

wetzelj avatar Jul 18 '25 03:07 wetzelj

KEEP+REMOVE preserves a standard tag, but removes a private tag. It seems that REMOVE, ADD, REPLACE actions work the same way for standard and private tags, but KEEP does not appear to have any effect on private tags. @Simlomb will help with investigate this further and will provide more details soon.

lassoan avatar Jul 18 '25 14:07 lassoan

@lassoan @vsoch @wetzelj I added some tests to investigate the issue (https://github.com/Simlomb/deid/commit/19827c5b96f4d38fde36d4ae7f262556e4d54ea9). I also tested it on your latest master (https://github.com/pydicom/deid/commit/fa4731f79d7002d51e31cbec6f2586d87afb479d) and the findings are the same. Here my findings on this investigation:

  1. Running the test test_keep_remove_private_tags_should_be_original_value with the field variable specified with this formatting 00110001, produced a failure.
  2. Running the test test_keep_remove_standard_tags_should_be_original_value with the field variable specified with this formatting 00100010, produced a failure. I also noticed that if I used the format '0x00100010' , accepted in pydicom, for the tag, this makes the tests a success, because it is not actually considered by deid and doesn't perform any action on the tag. However it doesn't produce any error on the "wrong formatting".
  3. Running the test test_keep_remove_standard_tags_from_configfile_should_be_original_value where I load a test deid recipe from a config file with the KEEP/REMOVE action on a standard tag, produced a failure. For this specific case I made several attempts by modifying the format of the tag in the recipe, the only format that worked was, e.g., KEEP 00100010. The KEEP/REMOVE or REMOVE/KEEP logic failed. The KEEP/REMOVE logic worked when instead of the tag number I used the "PatientName" field (as demonstrated by the already existing tests). Regardeless of private or standard tags the KEEP/REMOVE logic fails when the field is a tag. Do you have any guess on what could be causing this or how to fix it?

I was also thinking on a more general modification to handle the private tag. What do you think about keeping private tags if remove_private=True but there is a KEEP rule for a private tag in the deid recipe? In this case the code would first check if there are KEEP actions for private tags in the recipe and then remove all the other private tags that are not found in the recipe.

Simlomb avatar Jul 19 '25 11:07 Simlomb

My suspicion is that we aren't including the private tags in the higher level item that we parse from, and private tags are handled holistically - e.g., removed all at once here. I would try putting an IPython embed in this function and having it trigger on a private tag name. I would suspect either the tag won't be in the list of items, or referenced in the wrong way.

I was also thinking on a more general modification to handle the private tag. What do you think about keeping private tags if remove_private=True but there is a KEEP rule for a private tag in the deid recipe? In this case the code would first check if there are KEEP actions for private tags in the recipe and then remove all the other private tags that are not found in the recipe.

A KEEP rule should preserve a field, I agree, including private tags. In this function we try to remove using the dicom (pydicom) upstream function and fall back to a manual deletion, however if we can't filter with the first means we might want to stick to the default, but with the customization you described above.

vsoch avatar Jul 19 '25 19:07 vsoch

My suspicion is that we aren't including the private tags in the higher level item that we parse from, and private tags are handled holistically - e.g., removed all at once here. I would try putting an IPython embed in this function and having it trigger on a private tag name. I would suspect either the tag won't be in the list of items, or referenced in the wrong way.

I added more tests to dig deeper (https://github.com/Simlomb/deid/commit/ed9ca8a79621f872e747cfd2653b6b72917d119f):

  1. The KEEP/REMOVE logic fails (in test_keep_remove_standard_tags_should_be_original_value) even with the standard tags specified in the recipe with the format:
KEEP 00100010
REMOVE 00100010

So it seems to be a more general problem on how the action interaction is handled when the field is a tag (regardless of standard or private). Do you have any guess on what could be causing this or how to fix it?

  1. As it fails for the standard case, it fails also for the private tags case (in test_keep_remove_private_tags_should_be_original_value) with the new format introduced in the commit
KEEP (1129,"Eigen, Inc",08)
REMOVE (1129,"Eigen, Inc",08)
  1. I tested the different ways that tags can be specified in the recipe and accepted by deid, and found that the formats like REMOVE 00100010, or REMOVE PatientName, pass the test, but the format REMOVE (0010,0010) fails. The private tags format also passes the tests with the new format e.g., REMOVE (1129,"Eigen, Inc",08)). These tests are at https://github.com/Simlomb/deid/blob/private-creator-experiments/deid/tests/test_remove_action.py

A KEEP rule should preserve a field, I agree, including private tags. In this function we try to remove using the dicom (pydicom) upstream function and fall back to a manual deletion, however if we can't filter with the first means we might want to stick to the default, but with the customization you described above.

Ok I guess we agree that keeping private tags if remove_private=True but there is a KEEP rule for a private tag in the deid recipe could be a good approach.

Simlomb avatar Jul 20 '25 18:07 Simlomb

Do you have any guess on what could be causing this or how to fix it?

To get these insights you need to interactively look at what the code is doing. I suggest:

print("Identify location here")
import IPython
IPython.embed()

at different points in the execution, and look at the variables and test functionality. Your tests are very good but won't give these "why" insights. I don't have a good sense of why things are happening unless I do that.

vsoch avatar Jul 20 '25 19:07 vsoch