ConfigError: Multiple path templates for 't1w' component using generate_inputs
Description:
Hi,
I'm trying to use generate_inputs() to load my BIDS data with the following code:
```python
inputs = generate_inputs(
    bids_dir=bids_dir,
    pybids_inputs=config["pybids_inputs"],
    validate=False,
)
```
Here is the relevant portion of my config:
```yaml
pybids_inputs:
  t1w:
    filters:
      suffix: "T1w"
      extension: ".nii.gz"
      datatype: "anat"
      invalid_filters: "allow"
    wildcards:
      - subject
      - session
      - acquisition
      - run
      - desc
      - part
      - mt
```
However, I encountered the following error when running `snakemake -np`:
```
ConfigError in file /home/bic/eyang/Documents/workflows/pipelines/ironfist/Snakefile, line 44:
Multiple path templates for one component. Use --filter_t1w to narrow your search or --wildcards_t1w to make the template more generic.
component = 't1w'
path_templates = [
    '/data_/mica3/BIDS_brainscores/rawdata/sub-{subject}/ses-{session}/anat/sub-{subject}_ses-{session}_acq-{acq}_run-{run}_T1w.nii.gz',
    '/data_/mica3/BIDS_brainscores/rawdata/sub-{subject}/ses-{session}/anat/sub-{subject}_ses-{session}_acq-{acq}_T1w.nii.gz',
    '/data_/mica3/BIDS_brainscores/rawdata/sub-{subject}/anat/sub-{subject}_T1w.nii.gz',
    '/data_/mica3/BIDS_brainscores/rawdata/sub-{subject}/ses-{session}/anat/sub-{subject}_ses-{session}_acq-{acq}_part-{part}_T1w.nii.gz'
]
```
Environment:
```
$ conda list snakebids
# Name       Version
snakebids    0.14.0
```
Directory structure:
```
├── bids_validator_output.txt
├── CITATION.cff
├── dataset_description.json
├── participants_7t2bids.tsv
├── participants.json
├── participants.tsv
├── README
├── sub-10058
│   └── anat
│       ├── sub-10058_desc-iron.nii.gz
│       ├── sub-10058_desc-MTR.nii.gz
│       └── sub-10058_T1w.nii.gz
├── sub-mpn20250310
│   ├── ses-v1
│   │   ├── anat
│   │   │   ├── sub-mpn20250310_ses-v1_acq-mtw_mt-off_MTR.json
│   │   │   ├── sub-mpn20250310_ses-v1_acq-mtw_mt-off_MTR.nii.gz
│   │   │   ├── sub-mpn20250310_ses-v1_acq-mtw_mt-off_part-phase_MTR.json
│   │   │   ├── sub-mpn20250310_ses-v1_acq-mtw_mt-off_part-phase_MTR.nii.gz
│   │   │   ├── sub-mpn20250310_ses-v1_acq-mtw_mt-on_MTR.json
│   │   │   ├── sub-mpn20250310_ses-v1_acq-mtw_mt-on_MTR.nii.gz
│   │   │   ├── sub-mpn20250310_ses-v1_acq-mtw_mt-on_part-phase_MTR.json
│   │   │   ├── sub-mpn20250310_ses-v1_acq-mtw_mt-on_part-phase_MTR.nii.gz
│   │   │   ├── sub-mpn20250310_ses-v1_acq-mtw_part-phase_T1w.json
│   │   │   ├── sub-mpn20250310_ses-v1_acq-mtw_part-phase_T1w.nii.gz
│   │   │   ├── sub-mpn20250310_ses-v1_acq-mtw_T1w.json
│   │   │   ├── sub-mpn20250310_ses-v1_acq-mtw_T1w.nii.gz
│   │   │   ├── sub-mpn20250310_ses-v1_acq-neuromelaninMTw_run-1_T1w.json
│   │   │   ├── sub-mpn20250310_ses-v1_acq-neuromelaninMTw_run-1_T1w.nii.gz
│   │   │   ├── sub-mpn20250310_ses-v1_acq-neuromelaninMTw_run-2_T1w.json
│   │   │   ├── sub-mpn20250310_ses-v1_acq-neuromelaninMTw_run-2_T1w.nii.gz
│   │   │   ├── sub-mpn20250310_ses-v1_acq-neuromelaninMTw_run-3_T1w.json
│   │   │   ├── sub-mpn20250310_ses-v1_acq-neuromelaninMTw_run-3_T1w.nii.gz
│   │   │   ├── sub-mpn20250310_ses-v1_acq-neuromelaninMTw_run-4_T1w.json
│   │   │   └── sub-mpn20250310_ses-v1_acq-neuromelaninMTw_run-4_T1w.nii.gz
│   │   ├── dwi
│   │   ├── fmap
│   │   │   ├── sub-mpn20250310_ses-v1_acq-anat_TB1TFL.json
│   │   │   ├── sub-mpn20250310_ses-v1_acq-anat_TB1TFL.nii.gz
│   │   │   ├── sub-mpn20250310_ses-v1_acq-sfam_TB1TFL.json
│   │   │   └── sub-mpn20250310_ses-v1_acq-sfam_TB1TFL.nii.gz
│   │   └── func
│   └── sub-mpn20250310_sessions.tsv
└── sub-mx250204
    └── anat
        ├── sub-mx250204_FLAIR.nii.gz
        └── sub-mx250204_T1w.nii.gz
```
Question:
Given the file heterogeneity, what is the correct way to specify filters or wildcards to avoid this `Multiple path templates` error for `t1w`? Is there a best practice for handling mixed file patterns like this?
Thanks in advance!
Unfortunately, this specific type of heterogeneity in a single BIDS component (entities missing in some files, e.g. some with a session and some without) is not currently supported by snakebids. There have been recent discussions about addressing this limitation in an API overhaul, but it hasn't been tackled yet.
One way you could currently deal with this heterogeneity in the workflow is to define multiple components and parse all of them. You would then have to handle them separately in downstream rules, which will look different depending on whether you want to retain the wildcards throughout the workflow into the final targets, or discard them (e.g. by picking one T1w image per subject).
Right now it seems you have 4 variants, so you could make 4 bids components: `t1w`, `t1w_ses_acq`, `t1w_ses_acq_run`, `t1w_ses_acq_part`.
e.g.
```yaml
pybids_inputs:
  t1w:
    filters:
      suffix: "T1w"
      extension: ".nii.gz"
      datatype: "anat"
      acquisition: false
      session: false
      run: false
      part: false
      invalid_filters: "allow"
    wildcards:
      - subject
  t1w_ses_acq:
    filters:
      suffix: "T1w"
      extension: ".nii.gz"
      datatype: "anat"
      acquisition: true
      session: true
      run: false
      part: false
      invalid_filters: "allow"
    wildcards:
      - subject
      - session
      - acquisition
```
and so on. This could get tedious if you have a lot of variations though. Thoughts?
btw here is the reference for what kind of filters you can put on the inputs: https://snakebids.readthedocs.io/en/stable/api/internals.html#snakebids.types.InputConfig
We can live with the session issue, but I think we need a bit more help parsing inputs. We have the following in the `snakebids.yml`:
```yaml
pybids_inputs:
  t1w:
    filters:
      suffix: T1w
      extension: nii.gz
    wildcards:
      - subject
      - session
      - run
```
which, in one complex example, returns the following:
```
Multiple path templates for one component. Use --filter_t1w to narrow your search or --wildcards_t1w to make the template more generic.
component = 't1w'
path_templates = [
    '/data/mica3/BIDS_MPN/rawdata/sub-{subject}/ses-{session}/anat/sub-{subject}_ses-{session}_acq-neuromelaninMTw_T1w.nii.gz',
    '/data/mica3/BIDS_MPN/rawdata/sub-{subject}/ses-{session}/anat/sub-{subject}_ses-{session}_acq-neuromelaninMTw_run-{run}_T1w.nii.gz',
    '/data/mica3/BIDS_MPN/rawdata/sub-{subject}/ses-{session}/anat/sub-{subject}_ses-{session}_T1w.nii.gz',
    '/data/mica3/BIDS_MPN/rawdata/sub-{subject}/ses-{session}/anat/sub-{subject}_ses-{session}_acq-mtw_T1w.nii.gz',
    '/data/mica3/BIDS_MPN/rawdata/sub-{subject}/ses-{session}/anat/sub-{subject}_ses-{session}_acq-mtw_part-phase_T1w.nii.gz'
]
```
Let's say we want to pull the middle file (the one with no acquisition). Basically we want to disallow some filters. I've tried setting a few things like `--filter_t1w acquisition=None` or `allow_invalid=False`, but still can't seem to grab that file. Any advice for this?
To clarify a bit further, this works:
```yaml
pybids_inputs:
  t1w:
    filters:
      suffix: T1w
      extension: nii.gz
      acquisition: false
    wildcards:
      - subject
      - session
      - run
```
But I cannot seem to specify this via the CLI.
Ah yes, there is special CLI syntax for this (since `ENTITY=VALUE` is always interpreted as an exact match).
Excerpt from here: https://snakebids.readthedocs.io/en/stable/api/plugins.html#snakebids.plugins.ComponentEdit
Filters are specified on the CLI using `ENTITY[:METHOD][=VALUE]`, as follows:
- `ENTITY=VALUE` selects paths based on an exact value match.
- `ENTITY:match=REGEX` and `ENTITY:search=REGEX` select paths using regex with [re.match()](https://docs.python.org/3/library/re.html#re.match) and [re.search()](https://docs.python.org/3/library/re.html#re.search) respectively. This syntax can be used to select multiple values (e.g. `'session:match=01|02'`).
- `ENTITY:required` selects all paths with the entity, regardless of value.
- `ENTITY:none` selects all paths without the entity.
- `ENTITY:any` removes filters for the entity.
E.g., if you want the equivalent of `acquisition: false` (i.e. to drop any paths with an acquisition entity), you can use:
```
--filter-t1w acquisition:none
```
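For reference, the `:match`/`:search` distinction above mirrors Python's `re` module and can be checked directly. This is plain Python, unrelated to snakebids itself, and the strings are arbitrary examples rather than real entity values:

```python
# :match anchors the regex at the start of the string, while :search
# matches anywhere in it, mirroring re.match vs re.search.
import re

print(bool(re.match(r"01|02", "01")))       # True: anchored match at start
print(bool(re.match(r"01|02", "ses-01")))   # False: "01" is not at the start
print(bool(re.search(r"01|02", "ses-01")))  # True: found anywhere in string
```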
perfect! thanks
Just thinking more about the issue above with session: we sometimes have datasets where `ses`, `run`, or other entities exist for one subject but not for others. These still technically pass the BIDS spec, even though they're kind of against the spirit of BIDS. In these cases we can rename the input files to be more consistent, but for other people's data it's better to be as hands-off as possible. What about an optional plugin that creates a symlink filename in the input directory for any wildcards that are present in at least one case but missing for others? The scope of this plugin could be within micapipe (which is what we're working on now), or it could be part of snakebids. Thoughts?
Perhaps that could work, though you might run into downstream issues if the BIDS dataset internally refers to files, e.g. inside JSON sidecars (like the IntendedFor fields for field maps). Also, making symlinks in the input folder could make BIDS parsing fail for other apps, so you might want to keep the modified version separate.
Note it's not really against the spirit of BIDS to not use all the same entities for all the files; it just reflects whatever heterogeneity is in the dataset. This limitation of ours is inherent to snakemake/snakebids (and not pybids), since we use a single path format string, which is then formatted with the entity values when it is used in snakemake.
I thought that snakebids uses pybids to parse inputs? (Though it's been a long time since I've really looked into it.) Would it be possible within snakebids to use multiple path format strings? I suppose that's what you suggested by having `t1w`, `t1w_ses_acq`, `t1w_ses_acq_run`, `t1w_ses_acq_part`, etc. Perhaps a plugin could procedurally generate all combinations of these allowable wildcards, and then when we run our first import rules we iterate over the relevant `pybids_inputs`.
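To make the combinatorial idea concrete, here is a rough sketch of how such a plugin might enumerate one component config per subset of optional entities. Everything here (`build_components`, its arguments, the naming scheme) is hypothetical illustration, not snakebids API:

```python
# Sketch: build one pybids_inputs-style component config per subset of
# optional entities. Entities outside the subset get a `false` filter
# (entity must be absent); entities inside it become wildcards.
from itertools import combinations

def build_components(base_filters, optional_entities, base_wildcards=("subject",)):
    """Hypothetical helper returning {component_name: config} for every subset."""
    components = {}
    for r in range(len(optional_entities) + 1):
        for subset in combinations(optional_entities, r):
            name = "t1w" + "".join(f"_{e}" for e in subset)
            filters = dict(base_filters)
            for entity in optional_entities:
                filters[entity] = entity in subset  # True = required, False = absent
            components[name] = {
                "filters": filters,
                "wildcards": list(base_wildcards) + list(subset),
            }
    return components

configs = build_components(
    {"suffix": "T1w", "extension": ".nii.gz", "datatype": "anat"},
    ("session", "acquisition", "run", "part"),
)
print(len(configs))  # 16: every subset of the 4 optional entities
```

With four optional entities this already yields 16 components, which illustrates why doing this by hand gets tedious quickly.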
We'd really like to get something that will work for micapipe, ideally in the short-term to include as the CLI for its v1.0 release. I can also imagine a plugin that creates temporary symlinks and removes them after (though that does still include some risk for other apps running on the BIDS dataset simultaneously).
Based on your experience, which of these two plugin options (or something else) might make a good way forward for now? I'm leaning towards the former.
Also note that each micapipe module is still a bash script that parses its own input/output filenames. Eventually we'll make it fully snakebids, but we need something that will work in the relatively short-term and doing the initial parsing will set the CLI.
Yes, snakebids uses pybids under the hood, but after parsing with pybids it builds its snakemake-friendly structure (with path and wildcards), and it is the latter that has these limitations. No, it is not currently possible for one snakebids component to have multiple different paths. But you can make as many snakebids components as you want, though you might need to catch some exceptions if some don't capture any files and error out.
Yes, updating your import rules to account for the various input wildcards encountered (e.g. either making one rule for each, or using an input function to pick out the files and the appropriate component) is one way to deal with this, but then you do have to conform all your outputs to have the same kind of entities. You could also perhaps define a wrapper function and use that instead of the snakebids components directly, and that wrapper would pick the component to use based on the wildcards present.
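A minimal sketch of that wrapper idea (the function and the `SimpleNamespace` stand-ins are hypothetical glue, not snakebids API; real snakebids components expose a `wildcards` dict, which is all this relies on):

```python
# Hypothetical wrapper: pick the component whose wildcard set matches
# the wildcards actually available for a given rule instance.
from types import SimpleNamespace

def pick_component(components, available_wildcards):
    """Return (name, component) for the component whose wildcards equal the given set."""
    want = set(available_wildcards)
    for name, comp in components.items():
        if set(comp.wildcards) == want:
            return name, comp
    raise KeyError(f"no component matches wildcards {sorted(want)}")

# Minimal stand-ins for real snakebids components:
components = {
    "t1w": SimpleNamespace(wildcards={"subject": "{subject}"}),
    "t1w_ses": SimpleNamespace(wildcards={"subject": "{subject}", "session": "{session}"}),
}
name, _ = pick_component(components, {"subject", "session"})
print(name)  # t1w_ses
```

Downstream rules could then call the wrapper with whatever wildcards a given subject actually has, instead of hard-coding one component.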
Sorry I can't help much directly on this at the moment; I just have too many other things on my plate right now.
I was also thinking that the combinatorial logic would take place in a wrapper/plugin, and would only include entities that actually appear at least once. I could actually imagine this combinatorial process as a snakebids feature, where any missing entities (i.e. those that are present at least once but missing in other cases) are infilled with `entity-null`, or perhaps even more safely `entity-snakenull`.
I know you don't have a lot of time to commit, but maybe just some advice: should I open a snakebids branch to create such a feature? If it works well then we could consider merging it; if not, it could remain micapipe-specific.
I don't see right now how this would be a general snakebids plugin, but it doesn't actually have to be; e.g. you can define and load any snakebids plugin from a .py file (see our atlas.py plugin in hippunfold, which is a snakebids plugin but not part of snakebids).
Though I would first see if the logic can simply be built into your workflow, as plugins are really designed for adapting the CLI, not the workflow itself.
Here is a potential fix for this issue: https://github.com/khanlab/snakebids/pull/470
Note: I feel this is general to snakebids, since it comes up for micapipe but could just as easily come up for other BIDS apps built with snakebids (such as hippunfold); it just hasn't yet.