mne-bids
When anonymizing, allow to remove participant age too
Converting a small-N study from @SophieHerbst to BIDS and passing anonymize to write_raw_bids(), we found that participants.tsv still contained the participants' ages. This is probably intentional, but it can cause trouble in studies with small numbers of participants, where age could be used to link the data to a particular person after the fact.
Therefore, anonymization should optionally drop age from participants.tsv as well. I'm not sure about the API though, as currently, write_raw_bids()'s anonymize parameter accepts a dictionary that is then directly passed to Raw.anonymize(). I wonder if we could add an additional dictionary key, keep_age=False, in alignment with the keep_his=False that we currently have.
If we remove age from the FIF files, we should remove it from participants.tsv too; we should be consistent. See the keep_his parameter.
The FIF only stores the date of birth, if I'm not mistaken
Yes but to me if we remove date of birth we should remove age from participants.tsv
Does it make sense?
@agramfort
> Yes but to me if we remove date of birth we should remove age from participants.tsv
Yes and no.
I believe it was a conscious decision not to remove the age, because age is sometimes required / extremely useful for certain analyses – even if all other personal identifying information (PII) has been dropped from the data.
Imagine the research question of "brain age" vs "calendar age" that @dengemann is working on. Here, it could be imperative to retain participants' age, even when sharing otherwise anonymized data.
Therefore I believe we should allow participants to anonymize the data while retaining age, even though the date of birth gets removed.
Thoughts?
ping @sappelhoff
I think it should be optional. The question is if someone can easily match the participant with an external description file. If the goal is to anonymize the data, age must be of course removed.
How would you imagine the workflow to be?
Say you've measured 50 participants, and calendar age plays a central role in the analysis you published in Nature. Of course you want to make the data publicly available. So you anonymize it – including removal of age. But now no-one will be able to replicate your published analysis anymore, because the important variable "age" is now missing. Are they supposed to get in touch with you and ask for the age associated with each participant ID?
maybe add just a new function in mne_bids to a posteriori remove age?
or as anonymize can be a dict we can add a valid key to say if age should be written
anonymize=dict(write_age=False)
> or as anonymize can be a dict we can add a valid key to say if age should be written
that sounds good to me, re-using our existing API, adding one more option with a sensible default (remove age)
Ok. Current API is:
anonymize : dict | None
    If `None` (default), no anonymization is performed.
    If a dictionary, data will be anonymized depending on the dictionary
    keys: ``daysback`` is a required key, ``keep_his`` is optional.

    ``daysback`` : int
        Number of days by which to move back the recording date in time.
        In studies with multiple subjects the relative recording date
        differences between subjects can be kept by using the same number
        of ``daysback`` for all subject anonymizations. ``daysback`` should
        be great enough to shift the date prior to 1925 to conform with
        BIDS anonymization rules.
    ``keep_his`` : bool
        If ``False`` (default), all subject information next to the
        recording date will be overwritten as well. If True, keep subject
        information apart from the recording date.
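As an aside on the ``daysback`` requirement quoted above, here is a minimal sketch of how one might compute a shift that is large enough. The helper name `min_daysback` and the example dates are made up for illustration and are not part of MNE-BIDS; the only thing taken from the docstring is the "prior to 1925" rule and the advice to reuse one value across subjects.

```python
from datetime import date, timedelta

# The docstring asks for anonymized dates to fall before 1925.
CUTOFF = date(1925, 1, 1)


def min_daysback(recording_date: date) -> int:
    """Smallest shift (in days) that moves ``recording_date`` before 1925.

    Hypothetical helper for illustration; not part of MNE-BIDS.
    """
    last_allowed = CUTOFF - timedelta(days=1)  # 1924-12-31
    return max(0, (recording_date - last_allowed).days)


# Made-up recording dates for two subjects.
recording_dates = [date(2019, 3, 14), date(2019, 5, 2)]

# Use one shift for every subject so relative date differences survive.
daysback = max(min_daysback(d) for d in recording_dates)
shifted = [d - timedelta(days=daysback) for d in recording_dates]
```

Taking the maximum over all subjects guarantees that every shifted date lands before the cutoff while the gap between recordings stays the same.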
My proposal:
anonymize : dict | None
    If `None` (default), no anonymization is performed.
    If a dictionary, data will be anonymized depending on the dictionary
    keys: ``daysback`` is a required key, ``keep_his`` and ``keep_age`` are optional.
    ...
    ``keep_age`` : bool
        Whether to retain age information even when ``keep_his=False``. This can be used
        to remove the date of birth and all other personal identifying information from the data,
        while still keeping the age in ``participants.tsv``. If ``False`` (default), remove age when
        ``keep_his=False``. If ``True``, retain age.
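To make the proposal concrete, here is a rough sketch of how the extra key could be separated from the options that are forwarded to Raw.anonymize(). The function `split_anonymize_options` is hypothetical, and `keep_age` is only the key proposed in this thread; the sketch assumes age is dropped only when both `keep_age` and `keep_his` are false, as the proposed docstring describes.

```python
def split_anonymize_options(anonymize):
    """Separate the proposed ``keep_age`` key from the Raw.anonymize() options.

    Hypothetical helper; ``keep_age`` is only a proposal in this thread.
    Returns ``(raw_kwargs, write_age)``.
    """
    if anonymize is None:
        return None, True  # no anonymization requested, so age stays

    if "daysback" not in anonymize:
        raise ValueError("`daysback` is a required key of `anonymize`")

    raw_kwargs = dict(anonymize)  # copy so the caller's dict is untouched
    keep_age = raw_kwargs.pop("keep_age", False)  # proposed default: drop age
    raw_kwargs.setdefault("keep_his", False)

    # Per the proposal, age is only removed when ``keep_his=False``.
    write_age = keep_age or raw_kwargs["keep_his"]
    return raw_kwargs, write_age


raw_kwargs, write_age = split_anonymize_options(dict(daysback=123, keep_age=True))
```

Popping `keep_age` before forwarding matters because Raw.anonymize() only knows `daysback` and `keep_his`, so an unknown key would otherwise be passed along.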
LGTM. Just a small voice in my head wondering whether we'll have a discussion about "remove everything EXCEPT <insert some other HIS aspect here>", so whether keep_age=True should be a keep=["age", "...."].
Or does that fall under YAGNI? :)
both would work for me.
+0.5 on keep_age key
> so whether keep_age=True should be a keep=["age", "...."].
I imagine this being a little annoying for users:
write_raw_bids(..., anonymize=dict(daysback=123, keep=['age']))
seems a little complex for what we're trying to do here
Just one fly-by comment. I think it's more common to keep the age rather than not keep it. So, I would name the argument drop instead of keep so most users don't have to specify it
> So, I would name the argument drop instead of keep so most users don't have to specify it
But MNE has this keep_his thing, so I'd like to call the param keep_* for consistency. We can default it to True, though!
> But MNE has this keep_his thing, so I'd like to call the param keep_* for consistency. We can default it to True, though!
Unless we change the MNE-BIDS API:
anonymize = dict(daysback=123, drop_pii=True, drop_age=False)
(no-one knows what his means, do they???)
In fact, we wouldn't even need drop_pii or keep_his, because if a user requests to anonymize, of course they want to remove personal identifying info too, no? *scratches head*
Okay yes indeed, we can make the default True so the user doesn't have to specify it! The anonymize dict was made so that you could pass it to mne.anonymize_info.
Regarding his, see here: https://github.com/mne-tools/mne-matlab/blob/master/matlab/fiff_define_constants.m#L241
Also see this: https://mne-cpp.github.io/pages/documentation/anonymize.html
I've been thinking about this and I'd like to change our anonymization-related API.
Currently, we have this anonymize parameter in write_raw_bids(), which is supposed to be a dictionary whose key-value pairs will be passed to Raw.anonymize().
I don't think this is very intuitive for several reasons:
- If I read an imperative verb like anonymize, I'd expect a boolean: True to anonymize, False to not anonymize
- If I want to anonymize the data, I want the data to be … anonymized. I don't want to and shouldn't need to think about this keep_his thing; it should always be False, as I see no reason for it not to be False if a user wants to anonymize their data
- Considering that keep_his is superfluous, sticking with the current write_raw_bids() signature would leave us with anonymize=dict(daysback=123), which is not great. I'd prefer to have a separate parameter to specify the "days back", e.g., anonymize_daysback
Now that I have typed this out, I'm wondering whether we could simply drop anonymize and add anonymize_daysback: None | int. If None, don't anonymize. And if we do anonymize, also erase the his and the participant's age.
WDYT?
I thought about this some more and I think I've changed my mind and would like to keep anonymize as a dict, but it should accept the following keys:
- daysback: int (like it does already)
- age: bool = True (control whether age should be dropped or not)
WDYT?
yes for camcan you anon but you keep age. you think keep_his is too cryptic? how about gender?
> yes for camcan you anon but you keep age. you think keep_his is too cryptic? how about gender?
Not sure I understand your question. keep_his refers to an ID from the hospital information system (if I'm not mistaken), and MNE can remove the his_id from the info, which I assume we should always do when anonymizing.
Gender/sex is a good point, and reminded me that even handedness might be an issue for small-N studies. So my proposal:
anonymize = dict(daysback: int, age: bool = True, sex: bool = True, hand: bool = True)
Thoughts?
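For illustration, the dict proposed above could be applied to a single participants.tsv row roughly as follows. This is only a sketch under assumptions: `scrub_participant` and `DEFAULTS` are hypothetical names, `True` is read as "keep the field", and dropped values are written as "n/a", the usual BIDS marker for missing values in tabular files.

```python
# Assumption: True means "keep this field in participants.tsv".
DEFAULTS = {"age": True, "sex": True, "hand": True}


def scrub_participant(row, anonymize):
    """Return a copy of ``row`` with the de-selected fields set to "n/a".

    Hypothetical helper; the ``age``/``sex``/``hand`` keys are only proposed
    in this thread.
    """
    unknown = set(anonymize) - set(DEFAULTS) - {"daysback"}
    if unknown:
        raise ValueError(f"Unexpected anonymize keys: {sorted(unknown)}")

    options = {**DEFAULTS, **anonymize}  # user values override the defaults
    return {
        key: ("n/a" if key in DEFAULTS and not options[key] else value)
        for key, value in row.items()
    }


# Made-up example row.
row = {"participant_id": "sub-01", "age": "31", "sex": "F", "hand": "L"}
scrubbed = scrub_participant(row, dict(daysback=123, age=False, hand=False))
```

With all three flags defaulting to `True`, a user who only passes `daysback` keeps the current behaviour, and only small-N studies need to opt in to the stricter scrubbing.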
I am just a bit worried to deviate from mne-python API
> I am just a bit worried to deviate from mne-python API
Yes, but we'll have to do that anyway, as we want to optionally allow to keep age, handedness, sex, …
Just chiming in here with a random question I had related to mne-bids anonymizing: is it common to require age, handedness and sex to be scrubbed to be "anonymized"? I know in the USA that's not considered PHI (vs birthdate, recording date).
Is the reason to just provide an extra degree of anonymity?
> maybe add just a new function in mne_bids to a posteriori remove age?
I like the a-posteriori idea too FWIW because sometimes you write stuff to BIDS and then want to add this extra layer of anonymization, and end up either having to do it manually, or rewrite.
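A standalone sketch of what such an a-posteriori step might look like, using only the standard library. The function name `drop_participant_columns` is hypothetical (MNE-BIDS does not ship it), and the choice to overwrite values with "n/a" rather than delete the column is an assumption made here so the file layout stays unchanged.

```python
import csv
from pathlib import Path


def drop_participant_columns(tsv_path, columns=("age",)):
    """Overwrite the given columns of a participants.tsv with "n/a".

    Hypothetical helper for illustration; not part of MNE-BIDS.
    """
    tsv_path = Path(tsv_path)

    # Read the whole table first so the file can be rewritten in place.
    with tsv_path.open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t")
        fieldnames = reader.fieldnames
        rows = list(reader)

    for row in rows:
        for column in columns:
            if column in row:  # silently skip columns the file lacks
                row[column] = "n/a"

    with tsv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(
            f, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)
```

Calling `drop_participant_columns("participants.tsv", columns=("age", "hand"))` on an existing dataset would then add the extra layer of anonymization without rewriting the recordings.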
> Is the reason to just provide an extra degree of anonymity?
Imagine a small-N study you conduct among your colleagues, and maybe only one of them is left-handed or one is much younger or older than the others... then those data could easily be used to deanonymize things.
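The small-N concern can be checked mechanically. Below is a minimal sketch, with made-up data and a hypothetical helper `unique_profiles`, that lists participants whose combination of age, sex and handedness occurs only once in the sample and who could therefore be singled out.

```python
from collections import Counter


def unique_profiles(participants, keys=("age", "sex", "hand")):
    """Return the IDs whose combination of ``keys`` occurs only once."""

    def profile(participant):
        return tuple(participant[k] for k in keys)

    counts = Counter(profile(p) for p in participants)
    return [p["participant_id"] for p in participants if counts[profile(p)] == 1]


# Made-up example: one left-hander among colleagues of the same age and sex.
participants = [
    {"participant_id": "sub-01", "age": 30, "sex": "F", "hand": "R"},
    {"participant_id": "sub-02", "age": 30, "sex": "F", "hand": "R"},
    {"participant_id": "sub-03", "age": 30, "sex": "F", "hand": "L"},
]
at_risk = unique_profiles(participants)
```

Here `at_risk` contains only the left-handed participant; dropping `hand` from `keys` makes all three profiles identical, which is the effect the optional scrubbing is meant to achieve.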