
Age at session

Open ghisvail opened this issue 2 years ago • 10 comments

More of a question first, which may lead to a proposal later.

Is there a consensus on how to represent the concept of an age at session in the BIDS metadata hierarchy?

The context concerns longitudinal studies where imaging visits span a large period of time (several years typically), and the age at session is necessary for modeling disease progression.

On first read, I thought this attribute could go in sub-<label>_sessions.tsv#age, which would override participants.tsv#age as per the inheritance rule. But that contradicts the rule in the sessions file section, which mandates distinct attributes between participants and sessions tabular data.

A BIDS-compliant alternative could be to introduce an age_at_session attribute in the sessions file.

What are your thoughts?

ghisvail avatar Mar 02 '22 11:03 ghisvail

Quick suggestion that is suboptimal regarding data anonymisation:

  • have the year of birth in the participants.tsv
  • use the acquisition time info in the scans.tsv or sessions.tsv to compute the age
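A minimal sketch of that computation, assuming a hypothetical `year_of_birth` column in participants.tsv (not a column defined by the spec) and an ISO 8601 `acq_time` value. With only the birth year available, the result can be off by up to a year:

```python
from datetime import datetime


def approximate_age(year_of_birth: int, acq_time: str) -> int:
    """Approximate age in years at acquisition, using only the birth year."""
    acquired = datetime.fromisoformat(acq_time)
    # Without month and day of birth, this is only accurate to within a year.
    return acquired.year - year_of_birth


# Example values are made up for illustration.
age = approximate_age(1950, "2022-03-02T11:03:00")
print(age)
```

Note that this only works if `acq_time` has not been shifted or scrubbed for anonymisation.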

Remi-Gau avatar Mar 02 '22 12:03 Remi-Gau

in general, this is true not just of age but of any column in the phenotype directory that is replicated elsewhere. age, sex, and other demographics are often part of many common assessments. hence, if a person has multiple assessments at different sessions, there is currently no allowance for participant-specific phenotypic information to exist inside a sessions directory.

for this specific age problem there are a couple of options:

  1. age is recommended and not required in the participants tsv file. so if indeed age changes from session to session, one can use the sessions tsv file to reflect the age and not put it in the participants.tsv.

  2. one can also include a variable in participants.tsv (age_at_first_session), and then have temporal offsets of session times in the sessions.tsv.
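option 2 could be sketched as follows. the column names `age_at_first_session` and `days_from_first_session` are hypothetical, as are the example values; this only illustrates how the age at each session would be derived from a baseline plus an offset:

```python
import csv
import io

# Hypothetical file contents; neither column name is defined by the spec.
PARTICIPANTS_TSV = "participant_id\tage_at_first_session\nsub-01\t64\n"
SESSIONS_TSV = "session_id\tdays_from_first_session\nses-01\t0\nses-02\t730\n"


def read_tsv(text):
    """Parse tab-separated text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def ages_at_sessions(baseline_age_years, sessions):
    """Add each session's offset (in days) to the baseline age (in years)."""
    return {
        row["session_id"]: baseline_age_years
        + int(row["days_from_first_session"]) / 365.25
        for row in sessions
    }


participant = read_tsv(PARTICIPANTS_TSV)[0]
ages = ages_at_sessions(
    float(participant["age_at_first_session"]), read_tsv(SESSIONS_TSV)
)
print(ages)
```

relative offsets avoid storing absolute dates, at the cost of an extra computation for anyone who needs the age at a given session.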

satra avatar Mar 02 '22 12:03 satra

@Remi-Gau @satra thank you both for your suggestions.

Here is my attempt at summarizing your proposals in my own words and their pros and cons:

  1. Store year_of_birth in participants metadata and acq_time in sessions or scans metadata. Age at session cannot be queried directly, but is computed subsequently from both attributes.

  2. Store age in the session metadata level and drop age from the subject level. Age at session can be queried directly as a result, but age (at recruitment) requires access to the session-level metadata and knowledge as to which session_id corresponds to the baseline.

  3. Store age at baseline in subject-level metadata and temporal offsets from baseline in session-level. Imo, this proposal shares the same drawbacks as in 1, whilst being easier to interpret since age is explicit.

Please correct me if I am wrong.

On a side note, what is the rationale for forbidding inheritance for tabular metadata but allowing it for JSON metadata? I can't help but think that inheritance would have solved this in an elegant way.

ghisvail avatar Mar 02 '22 13:03 ghisvail

  • have the year of birth in the participants.tsv
  • use the acquisition time info in the scans.tsv or sessions.tsv to compute the age

Date/time might need to be scrubbed/anonymized. I like @satra's suggestion:

one can also include a variable in participants.tsv (age_at_first_session), and then have temporal offsets of session times in the sessions.tsv.

yarikoptic avatar Mar 02 '22 15:03 yarikoptic

Chiming in since I'm dealing with constantly changing age values while working with multi-session infant data. Overall, I think that avoiding a start value + offset calculation to get session-level metadata would help readability and reduce potential errors.

On a side note, what is the rationale for forbidding inheritance for tabular metadata but allowing it for JSON metadata? I can't help but think that inheritance would have solved this in an elegant way.

I would also like to know, as my first thought was the sessions.tsv could override conflicting values present in the participants.tsv.

Also, is sessions.json (a sidecar similar to participants.json) a BIDS valid file? I don't see any mention currently in the spec but there must be a place to define non-standard sessions.tsv columns.

mgxd avatar Mar 10 '22 16:03 mgxd

Also, is sessions.json (a sidecar similar to participants.json) a BIDS valid file? I don't see any mention currently in the spec but there must be a place to define non-standard sessions.tsv columns.

Yes, sub-<label>_sessions.json does exist and serves a similar purpose to the participants sidecar, applied to session-level tabular data.
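As an illustration, here is a sketch that builds such a sidecar for a hypothetical `age_at_session` column. The column name is the proposal from this thread, not a term defined by the spec; `Description` and `Units` are the usual fields for describing tabular columns:

```python
import json

# Hypothetical sidecar content for a sub-01_sessions.json file.
# `age_at_session` is an assumed, non-standard column name.
sessions_sidecar = {
    "age_at_session": {
        "Description": "Age of the participant at the time of the session",
        "Units": "year",
    }
}

sidecar_text = json.dumps(sessions_sidecar, indent=2)
print(sidecar_text)
```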

ghisvail avatar Mar 10 '22 16:03 ghisvail

Chiming in since I'm dealing with constantly changing age values while working with multi-session infant data. Overall, I think that avoiding a start value + offset calculation to get session-level metadata would help readability and reduce potential errors.

Sounds brutal to deal with. I'd say in your case, you'd want to store age at the session level and have a separate age_at_recruitment at the participant level.

What solution have you adopted so far?

ghisvail avatar Mar 10 '22 16:03 ghisvail

The current solution is to just process 1 session at a time, specifying the age (in months) through a flag. This isn't very scalable though - I was thinking of something along these lines:

  1. Check sessions.tsv for age
  2. If not found, check participants.tsv
  3. If still not found, require flag
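A rough sketch of that lookup order, not the actual nibabies implementation. The helper names, the `flag_age` parameter, and the example data are all hypothetical, and it assumes both files express age in the same unit:

```python
import csv
import io


def lookup(tsv_text, key_column, key, column):
    """Return `column` as a float for the row matching `key`, or None."""
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        value = row.get(column)
        # Treat a missing column, an empty cell, and "n/a" as "not found".
        if row.get(key_column) == key and value not in (None, "", "n/a"):
            return float(value)
    return None


def resolve_age(sessions_tsv, participants_tsv, participant_id, session_id,
                flag_age=None):
    """Check sessions.tsv, then participants.tsv, then the explicit flag."""
    age = lookup(sessions_tsv, "session_id", session_id, "age")
    if age is None:
        age = lookup(participants_tsv, "participant_id", participant_id, "age")
    if age is None:
        age = flag_age
    if age is None:
        raise ValueError("age not found in either file; pass it explicitly")
    return age


# Made-up example: ses-02 has no usable age, so participants.tsv is used.
sessions = "session_id\tage\nses-01\t3\nses-02\tn/a\n"
participants = "participant_id\tage\nsub-01\t2\n"
print(resolve_age(sessions, participants, "sub-01", "ses-01"))
print(resolve_age(sessions, participants, "sub-01", "ses-02"))
```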

(link to initial issue https://github.com/nipreps/nibabies/issues/75#issuecomment-865199930)

mgxd avatar Mar 10 '22 16:03 mgxd

I came across the following paragraph in the spec:

Age SHOULD be given as the number of years since birth at the time of scanning (or first scan in case of multi session datasets). Using higher accuracy (weeks) should in general be avoided due to privacy protection, unless when appropriate given the study goals, for example, when scanning babies.

So it appears there is an official recommendation to keep age at the participant level even in a multi-session dataset. In this context, I guess a separate age_at_session would make sense.

ghisvail avatar Mar 17 '22 19:03 ghisvail