pybids
pybids copied to clipboard
Variable meta-data is restricted for normal variables, but not "regressors"
I pass an ignore directing to BIDSLayout in order to not index fmriprep confounds:
layout = BIDSLayout('/work/HealthyBrainNetwork/preproc/fmriprep',
derivatives=['/work/HealthyBrainNetwork/neuroscout-bundles/MzXpw', '/work/HealthyBrainNetwork/preproc/fmriprep'],
ignore=[re.compile('.*desc-confounds_regressors.tsv')])
However, when I make my analysis, and ask for the design matrix meta-data:
analysis = Analysis(layout, model)
analysis.setup()
analysis.steps[0].get_design_matrix()[0][2]
I only get the following back:
{'datatype': 'func',
'subject': 'NDARAC853DTE',
'suffix': 'bold',
'task': 'movieDM'}
My guess is that somehow including files that are are real and should be indexed normally somehow deletes this meta-data.
If I change the ignore directive to be something that doesn't match anything:
layout = BIDSLayout('/work/HealthyBrainNetwork/preproc/fmriprep',
derivatives=['/work/HealthyBrainNetwork/neuroscout-bundles/MzXpw', '/work/HealthyBrainNetwork/preproc/fmriprep'],
ignore=[re.compile('.*some-nonexistant-file.tsv')])
then analysis.steps[0].get_design_matrix()[0][2] actually returns the full meta-data:
{'RepetitionTime': 0.8,
'datatype': 'func',
'desc': 'preproc',
'extension': 'nii.gz',
'space': 'MNI152NLin2009cAsym',
'subject': 'NDARAC853DTE',
'suffix': 'bold',
'task': 'movieDM'}
I've tested this on master and 0.9.1. This was not an issue in pybids 0.8, so it must have been something introduced on the 0.9 change.
Progress update, it looks like this has less to do with ignore, and more with the way my entities are being added to the variables I include in the other derivatives.
When the confounds are (successfully) ignored, their entities (which include RT) are no longer attached to the collection, which includes variables I included in another derivative folder.
Two questions come to mind:
-
Shouldn't meta-data that is defined in
bids_dirhierarchically apply to files that match those entities in derivatives? -
If a collection is made with variables that differ on entities, it looks like the union of entities is being taken, rather than the intersection. I think I'd expect the latter?
If this behavior (which did change from 0.8), doesn't seem like I bug, I'm happy to close. I can easily fix this by including a meta-data file (with RT) in the neuroscout bundle, for entities such as RepetitionTime.
-
Depends on the specifics... not all potential inheritance patterns are valid. Can you give filename examples?
-
I think it should be the union. We don't know what the user might want to do downstream, and I don't see a good reason to stipulate that you can only work with entities that are common to all variables in the collection.
Actually, it looks like I'm already including that meta-data in the bundle, so there may still be a bug here.... Let me look.
Conclusion is that task-movieDM_bold.json would not match sub-NDARAC904DMU_task-movieDM_events.tsv.
Instead, we need: task-movieDM_events.tsv for proper inheritance.
And I need to change the code in the celery worker to make bundles that will work with fitlins 0.5.0.
Spoke too soon. Looks like even with renaming that .json file I still have problems. Looks like its because even though the _events file now gets the appropriate entities, the "variable" itself does not.
And it has the following entities:
{'suffix': 'bold',
'subject': 'NDARZY668NMV',
'datatype': 'func',
'task': 'movieDM'}
Looks like somewhere along the pipeline of load_events the RepetitionTime entity get lost.
Can you check if the same happens in 0.9.1? Could be a consequence of #488.
Now that I'm looking at the code, it looks like RepetitionTime could never be an entity that is associaed with a variable, as it limits it to ALL_VARIABLES which is defined as:
BASE_ENTITIES = ['subject', 'session', 'task', 'run']
ALL_ENTITIES = BASE_ENTITIES + ['datatype', 'suffix', 'acquisition']
My guess is that since this does not happen for regressor io, and this code has only been tested on datasets with fmriprep regressors (and as I demonstrated earlier, due to the union of entities, you'd see RepetitionTime for the entire collection if you include that).
yep, pretty sure that's it. When I comment out the line that limits entities to ALL_ENTITIES , I get RepetitionTime as a meta-data entity:
In [16]: layout.get_collections(level='run', subject='NDARZY668NMV')[0].entities
Out[16]:
{'RepetitionTime': 0.8,
'datatype': 'func',
'desc': 'preproc',
'extension': 'nii.gz',
'space': 'MNI152NLin2009cAsym',
'subject': 'NDARZY668NMV',
'suffix': 'bold',
'task': 'movieDM'}
so I guess we need to decide which entities can be associated with variables.
to me it seems pretty messy to allow any meta-data entity to be associated with a variable, and the way we were doing thing before (which was to get the RepetitionTime via the _bold file , rather than directly through the _event) seems less messy.
The difficulty with disallowing metadata as entities is it means users may have to do a lot of work downstream to recover this information. Variables often get merged at some point (e.g., multiple runs can be stored in a single SparseRunVariable), so the entity info is stored in DF columns.
I'm a bit hesitant to fix this right now; there will be some fairly near-term changes to io.py (related to #500) that will probably abstract a lot of the data-reading code into the BIDSFile hierarchy, and simplify things. So maybe figure out a way around this in the short term, and we can revisit later.
Is there a reason that entities for events are only allowed for ALL_ENTITIES then?
This can definitely be worked around in fitlins. And as far as I'm concerned I'm fine with reverting to fitlins 0.4.0, but I'll re-open this issue and re-name it so we don't forget about this.