
ConfigError: Multiple path templates for 't1w' component using generate_inputs

Open zihuaihuai opened this issue 5 months ago • 13 comments

Description:

Hi,

I'm trying to use generate_inputs() to load my BIDS data with the following code:

inputs = generate_inputs(
    bids_dir=bids_dir,
    pybids_inputs=config["pybids_inputs"],
    validate=False,
)

Here is the relevant portion of my config:

pybids_inputs:
  t1w:
    filters:
      suffix: "T1w"
      extension: ".nii.gz"
      datatype: "anat"
      invalid_filters: "allow"
    wildcards:
      - subject
      - session
      - acquisition
      - run
      - desc
      - part
      - mt

However, I encountered the following error when running snakemake -np:

ConfigError in file /home/bic/eyang/Documents/workflows/pipelines/ironfist/Snakefile, line 44:
Multiple path templates for one component. Use --filter_t1w to narrow your search or --wildcards_t1w to make the template more generic.
    component = 't1w'
    path_templates = [
        '/data_/mica3/BIDS_brainscores/rawdata/sub-{subject}/ses-{session}/anat/sub-{subject}_ses-{session}_acq-{acq}_run-{run}_T1w.nii.gz',
        '/data_/mica3/BIDS_brainscores/rawdata/sub-{subject}/ses-{session}/anat/sub-{subject}_ses-{session}_acq-{acq}_T1w.nii.gz',
        '/data_/mica3/BIDS_brainscores/rawdata/sub-{subject}/anat/sub-{subject}_T1w.nii.gz',
        '/data_/mica3/BIDS_brainscores/rawdata/sub-{subject}/ses-{session}/anat/sub-{subject}_ses-{session}_acq-{acq}_part-{part}_T1w.nii.gz'
    ]

Environment:

$ conda list snakebids
# Name                    Version
snakebids                 0.14.0

Directory structure:

├── bids_validator_output.txt
├── CITATION.cff
├── dataset_description.json
├── participants_7t2bids.tsv
├── participants.json
├── participants.tsv
├── README
├── sub-10058
│   └── anat
│       ├── sub-10058_desc-iron.nii.gz
│       ├── sub-10058_desc-MTR.nii.gz
│       └── sub-10058_T1w.nii.gz
├── sub-mpn20250310
│   ├── ses-v1
│   │   ├── anat
│   │   │   ├── sub-mpn20250310_ses-v1_acq-mtw_mt-off_MTR.json
│   │   │   ├── sub-mpn20250310_ses-v1_acq-mtw_mt-off_MTR.nii.gz
│   │   │   ├── sub-mpn20250310_ses-v1_acq-mtw_mt-off_part-phase_MTR.json
│   │   │   ├── sub-mpn20250310_ses-v1_acq-mtw_mt-off_part-phase_MTR.nii.gz
│   │   │   ├── sub-mpn20250310_ses-v1_acq-mtw_mt-on_MTR.json
│   │   │   ├── sub-mpn20250310_ses-v1_acq-mtw_mt-on_MTR.nii.gz
│   │   │   ├── sub-mpn20250310_ses-v1_acq-mtw_mt-on_part-phase_MTR.json
│   │   │   ├── sub-mpn20250310_ses-v1_acq-mtw_mt-on_part-phase_MTR.nii.gz
│   │   │   ├── sub-mpn20250310_ses-v1_acq-mtw_part-phase_T1w.json
│   │   │   ├── sub-mpn20250310_ses-v1_acq-mtw_part-phase_T1w.nii.gz
│   │   │   ├── sub-mpn20250310_ses-v1_acq-mtw_T1w.json
│   │   │   ├── sub-mpn20250310_ses-v1_acq-mtw_T1w.nii.gz
│   │   │   ├── sub-mpn20250310_ses-v1_acq-neuromelaninMTw_run-1_T1w.json
│   │   │   ├── sub-mpn20250310_ses-v1_acq-neuromelaninMTw_run-1_T1w.nii.gz
│   │   │   ├── sub-mpn20250310_ses-v1_acq-neuromelaninMTw_run-2_T1w.json
│   │   │   ├── sub-mpn20250310_ses-v1_acq-neuromelaninMTw_run-2_T1w.nii.gz
│   │   │   ├── sub-mpn20250310_ses-v1_acq-neuromelaninMTw_run-3_T1w.json
│   │   │   ├── sub-mpn20250310_ses-v1_acq-neuromelaninMTw_run-3_T1w.nii.gz
│   │   │   ├── sub-mpn20250310_ses-v1_acq-neuromelaninMTw_run-4_T1w.json
│   │   │   └── sub-mpn20250310_ses-v1_acq-neuromelaninMTw_run-4_T1w.nii.gz
│   │   ├── dwi
│   │   ├── fmap
│   │   │   ├── sub-mpn20250310_ses-v1_acq-anat_TB1TFL.json
│   │   │   ├── sub-mpn20250310_ses-v1_acq-anat_TB1TFL.nii.gz
│   │   │   ├── sub-mpn20250310_ses-v1_acq-sfam_TB1TFL.json
│   │   │   └── sub-mpn20250310_ses-v1_acq-sfam_TB1TFL.nii.gz
│   │   └── func
│   └── sub-mpn20250310_sessions.tsv
└── sub-mx250204
    └── anat
        ├── sub-mx250204_FLAIR.nii.gz
        └── sub-mx250204_T1w.nii.gz

Question:

Given the file heterogeneity, what is the correct way to specify filters or wildcards to avoid this Multiple path templates error for t1w? Is there a best practice to handle mixed file patterns like this?

Thanks in advance!

zihuaihuai avatar Jul 14 '25 18:07 zihuaihuai

Unfortunately, this specific type of heterogeneity in a single bids component (entities missing in some files, e.g. some with session, some without session) is not currently supported by snakebids. There have been recent discussions about addressing this limitation in an API overhaul, but this hasn't been tackled yet.

One way you could try to deal with this heterogeneity in the workflow currently is to define multiple components, and then parse all of them. You'd then also have to deal with them separately in downstream rules, and how you do that depends on whether you want to retain the wildcards throughout the workflow into the final targets, or discard them (e.g. by picking one T1w image per subject).

Right now it seems you have 4 variants, so you could make 4 bids components:

t1w t1w_ses_acq t1w_ses_acq_run t1w_ses_acq_part

e.g.

pybids_inputs:
  t1w:
    filters:
      suffix: "T1w"
      extension: ".nii.gz"
      datatype: "anat"
      acquisition: false
      session: false
      run: false
      part: false
      invalid_filters: "allow"
    wildcards:
      - subject


  t1w_ses_acq:
    filters:
      suffix: "T1w"
      extension: ".nii.gz"
      datatype: "anat"
      acquisition: true
      session: true
      run: false
      part: false
      invalid_filters: "allow"
    wildcards:
      - subject
      - session
      - acquisition

and so on.

This could get tedious if you have a lot of variations, though. Thoughts?
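To cut down on the tedium, the four entries could be generated rather than written by hand. This is only a minimal sketch that builds the same kind of dictionary as the YAML above. The names `BASE_FILTERS`, `OPTIONAL`, `VARIANTS` and `make_component` are made up for illustration and are not part of snakebids, and it assumes the boolean filters behave as in the example (true = entity must be present, false = entity must be absent).

```python
# Hypothetical helper, not part of snakebids: build one pybids_inputs
# entry per filename variant instead of writing each one by hand.
BASE_FILTERS = {"suffix": "T1w", "extension": ".nii.gz", "datatype": "anat"}
OPTIONAL = ["session", "acquisition", "run", "part"]

# Component name -> optional entities that must be present in that variant.
VARIANTS = {
    "t1w": [],
    "t1w_ses_acq": ["session", "acquisition"],
    "t1w_ses_acq_run": ["session", "acquisition", "run"],
    "t1w_ses_acq_part": ["session", "acquisition", "part"],
}


def make_component(present):
    """Require the entities in `present` and exclude every other optional one."""
    filters = dict(BASE_FILTERS)
    for entity in OPTIONAL:
        filters[entity] = entity in present  # True = required, False = absent
    wildcards = ["subject"] + [e for e in OPTIONAL if e in present]
    return {"filters": filters, "wildcards": wildcards}


pybids_inputs = {name: make_component(present) for name, present in VARIANTS.items()}
```

The resulting dictionary has the same shape as the `pybids_inputs` section of the config, so it could be dumped to YAML or merged into the config before the inputs are generated.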

akhanf avatar Jul 23 '25 13:07 akhanf

btw here is the reference for what kind of filters you can put on the inputs: https://snakebids.readthedocs.io/en/stable/api/internals.html#snakebids.types.InputConfig

akhanf avatar Jul 23 '25 13:07 akhanf

We can live with the session issue, but I think we need a bit more help parsing inputs. We have the following in the snakebids.yml:

pybids_inputs:
  t1w:
    filters:
      suffix: T1w
      extension: nii.gz

    wildcards:
      - subject
      - session
      - run

which, in one complex example, returns the following:

Multiple path templates for one component. Use --filter_t1w to narrow your search or --wildcards_t1w to make the template more generic.
    component = 't1w'
    path_templates = [
        '/data/mica3/BIDS_MPN/rawdata/sub-{subject}/ses-{session}/anat/sub-{subject}_ses-{session}_acq-neuromelaninMTw_T1w.nii.gz',
        '/data/mica3/BIDS_MPN/rawdata/sub-{subject}/ses-{session}/anat/sub-{subject}_ses-{session}_acq-neuromelaninMTw_run-{run}_T1w.nii.gz',
        '/data/mica3/BIDS_MPN/rawdata/sub-{subject}/ses-{session}/anat/sub-{subject}_ses-{session}_T1w.nii.gz',
        '/data/mica3/BIDS_MPN/rawdata/sub-{subject}/ses-{session}/anat/sub-{subject}_ses-{session}_acq-mtw_T1w.nii.gz',
        '/data/mica3/BIDS_MPN/rawdata/sub-{subject}/ses-{session}/anat/sub-{subject}_ses-{session}_acq-mtw_part-phase_T1w.nii.gz'
    ]

Let's say we want to pull the middle file (no acquisition). Basically we want to disallow some filters. I've tried setting a few things like --filter_t1w acquisition=None or allow_invalid=False, but can't seem to grab that file still. Any advice for this?

jordandekraker avatar Aug 14 '25 14:08 jordandekraker

To clarify a bit further, this works:

pybids_inputs:
  t1w:
    filters:
      suffix: T1w
      extension: nii.gz
      acquisition: false

    wildcards:
      - subject
      - session
      - run

But I cannot seem to specify this via the CLI

jordandekraker avatar Aug 14 '25 14:08 jordandekraker

Ah yes there is special CLI syntax for this (since acquisition=value is always interpreted as an exact match).

Excerpt from here: https://snakebids.readthedocs.io/en/stable/api/plugins.html#snakebids.plugins.ComponentEdit

Filters are specified on the CLI using ENTITY[:METHOD][=VALUE], as follows:

    ENTITY=VALUE selects paths based on an exact value match.

    ENTITY:match=REGEX and ENTITY:search=REGEX selects paths using regex with [re.match()](https://docs.python.org/3/library/re.html#re.match) and [re.search()](https://docs.python.org/3/library/re.html#re.search) respectively. This syntax can be used to select multiple values (e.g. 'session:match=01|02').

    ENTITY:required selects all paths with the entity, regardless of value.

    ENTITY:none selects all paths without the entity.

    ENTITY:any removes filters for the entity.

For example, if you want the CLI equivalent of acquisition: false (i.e. to drop any paths that have an acquisition entity), you can use:

--filter-t1w acquisition:none
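To illustrate why `acquisition=None` did not work, here is a rough sketch of how a spec in the ENTITY[:METHOD][=VALUE] form splits apart. This is not the parser snakebids actually uses; `parse_filter` is a hypothetical function written only to show the grammar quoted above.

```python
import re

# Hypothetical illustration of the ENTITY[:METHOD][=VALUE] grammar;
# this is NOT the snakebids implementation.
_SPEC = re.compile(r"(?P<entity>[^:=]+)(?::(?P<method>[^=]+))?(?:=(?P<value>.*))?")


def parse_filter(spec):
    """Split a CLI filter spec into (entity, method, value); missing parts are None."""
    m = _SPEC.fullmatch(spec)
    if m is None:
        raise ValueError(f"not a valid filter spec: {spec!r}")
    return m["entity"], m["method"], m["value"]


print(parse_filter("acquisition:none"))     # ('acquisition', 'none', None)
print(parse_filter("session:match=01|02"))  # ('session', 'match', '01|02')
print(parse_filter("acquisition=None"))     # ('acquisition', None, 'None')
```

In the last case the text after `=` is just a value, so under the rules above it is an exact match against the literal string "None" rather than a request for "no acquisition entity".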

akhanf avatar Aug 14 '25 16:08 akhanf

perfect! thanks

jordandekraker avatar Aug 14 '25 16:08 jordandekraker

Just thinking more about the issue above with session: we sometimes have datasets where ses, run, or other things exist for one subject but not for others. These still technically pass the BIDS spec, even though they're kind of against the spirit of BIDS. In these cases, we can rename the input files to be more consistent, but for other people's data it's better to be as hands-off as possible. What about an optional plugin that creates a symlink filename in the input directory for any wildcards that are present in at least one case but missing for others? The scope of this plugin could be within micapipe (which is what we're working on now), or it could be a part of snakebids. Thoughts?

jordandekraker avatar Aug 14 '25 18:08 jordandekraker

That could perhaps work, though you might run into downstream issues if the bids dataset refers to files internally, e.g. inside json sidecars (like the IntendedFor fields for field maps). Also, making symlinks in the input folder could make bids parsing fail for other apps, so you might want to keep the modified version separate.

Note it's not really against the spirit of BIDS to not use all the same entities for all the files; it just reflects whatever heterogeneity is in the dataset. This limitation is inherent to snakemake/snakebids (and not pybids), since we use a single path format string, which is then formatted with the entity values when it is used in snakemake.
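A tiny illustration of that limitation, using plain Python string formatting as a stand-in for what happens with a component's path template (the template below is shortened from the ones in the error message, with the dataset root dropped):

```python
# One path template per component: every wildcard must get a value.
template = "sub-{subject}/ses-{session}/anat/sub-{subject}_ses-{session}_T1w.nii.gz"

with_session = template.format(subject="mpn20250310", session="v1")
print(with_session)

# There is no value of `session` that produces the session-less layout
# used by sub-10058; an empty string leaves dangling "ses-" pieces.
without_session = template.format(subject="10058", session="")
print(without_session)
print(without_session == "sub-10058/anat/sub-10058_T1w.nii.gz")
```

The session-less files need a different template altogether, which is why they end up in a separate component.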

akhanf avatar Aug 15 '25 03:08 akhanf

I thought that snakebids uses pybids to parse inputs? (Though it's been a long time since I've really looked into it.) Would it be possible within snakebids to use multiple path format strings? I suppose that's what you suggested by having t1w, t1w_ses_acq, t1w_ses_acq_run, t1w_ses_acq_part, etc. Perhaps a plugin could procedurally generate all combinations of these allowable wildcards, and then when we run our first import rules we iterate over the relevant pybids_inputs.

We'd really like to get something that will work for micapipe, ideally in the short-term to include as the CLI for its v1.0 release. I can also imagine a plugin that creates temporary symlinks and removes them after (though that does still include some risk for other apps running on the BIDS dataset simultaneously).

Based on your experience, which of these two plugin options (or something else) might make a good way forward for now? I'm leaning towards the former.

Also note that each micapipe module is still a bash script that parses its own input/output filenames. Eventually we'll make it fully snakebids, but we need something that will work in the relatively short-term and doing the initial parsing will set the CLI.

jordandekraker avatar Aug 15 '25 12:08 jordandekraker

Yes, snakebids uses pybids under the hood, but after parsing with pybids it builds its own snakemake-friendly structure (with path and wildcards), and it is the latter that has these limitations. No, it is not currently possible for one snakebids component to have multiple different paths. But you can make as many snakebids components as you want, though you might need to catch some exceptions if some of them don't capture any files and error out.

Yes, just updating your import rules to account for the various input wildcards encountered (e.g. either making one rule for each, or using an input function to pick out the files and the appropriate component) is one way to deal with this, but then you do have to conform all your outputs to have the same kind of entities. You could also perhaps define a wrapper function and use that instead of the snakebids components directly; that wrapper would pick the component to use based on the wildcards present.

Sorry I can't help much directly on this at the moment, I just have too many other things on my plate right now.
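Roughly, the wrapper idea could look like the sketch below. It uses plain format strings as stand-ins for the components, so `COMPONENTS`, `fields` and `resolve` are all hypothetical names and none of this is snakebids API; a real version would look things up on the actual component objects.

```python
from string import Formatter

# Stand-ins for two components: name -> path template (dataset root omitted).
COMPONENTS = {
    "t1w": "sub-{subject}/anat/sub-{subject}_T1w.nii.gz",
    "t1w_ses_acq": (
        "sub-{subject}/ses-{session}/anat/"
        "sub-{subject}_ses-{session}_acq-{acquisition}_T1w.nii.gz"
    ),
}


def fields(template):
    """Return the set of wildcard names used in a format string."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def resolve(entities):
    """Pick the component whose wildcards match exactly, then fill in its path."""
    for name, template in COMPONENTS.items():
        if fields(template) == set(entities):
            return name, template.format(**entities)
    raise KeyError(f"no component matches wildcards {sorted(entities)}")


print(resolve({"subject": "10058"}))
print(resolve({"subject": "mpn20250310", "session": "v1", "acquisition": "mtw"}))
```

In a workflow, something like `resolve` would sit inside an input function, so the import rule can accept whichever wildcard combination a given file has while the outputs are still written with one consistent set of entities.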

akhanf avatar Aug 15 '25 13:08 akhanf

I was also thinking that the combinatorial logic would take place in a wrapper / plugin, and would only include entities that actually appear at least once. I could actually imagine this combinatorial process as a snakebids feature, where any missing entities (i.e. those that are present at least once but missing in other cases) are infilled with entity-null or, perhaps even more safely, entity-snakenull.
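As a rough sketch of the "only combinations that actually appear" part, the files could be grouped by the set of entity keys in their names. The filename parsing here is deliberately naive (split on `_` and `-`), the helper `entities_of` is made up for this example, and it is not meant to describe how any snakebids change would do it.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def entities_of(path):
    """Naively parse key-value entities from a BIDS filename (suffix dropped)."""
    stem = PurePosixPath(path).name.split(".", 1)[0]
    pairs = stem.split("_")[:-1]  # last chunk is the suffix, e.g. "T1w"
    return dict(pair.split("-", 1) for pair in pairs)


# A few of the T1w files from the directory listing at the top of the thread.
files = [
    "sub-10058/anat/sub-10058_T1w.nii.gz",
    "sub-mpn20250310/ses-v1/anat/sub-mpn20250310_ses-v1_acq-mtw_T1w.nii.gz",
    "sub-mpn20250310/ses-v1/anat/sub-mpn20250310_ses-v1_acq-mtw_part-phase_T1w.nii.gz",
    "sub-mpn20250310/ses-v1/anat/sub-mpn20250310_ses-v1_acq-neuromelaninMTw_run-1_T1w.nii.gz",
    "sub-mpn20250310/ses-v1/anat/sub-mpn20250310_ses-v1_acq-neuromelaninMTw_run-2_T1w.nii.gz",
]

# Group by which entity keys are present; each group shares one path template.
groups = defaultdict(list)
for f in files:
    groups[frozenset(entities_of(f))].append(f)

for keys, members in groups.items():
    print(sorted(keys), len(members))
```

Each group corresponds to one of the path templates in the error message, so only the combinations that really occur in the dataset would need a component (or an infilled placeholder).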

I know you don't have a lot of time to commit, but maybe just some advice: should I open a snakebids branch to create such a feature? If it works well then we could consider merging, but if not it could be just for micapipe

jordandekraker avatar Aug 15 '25 13:08 jordandekraker

I don't see right now how this would be a general snakebids plugin, but it doesn't actually have to be, e.g. you can define and load any snakebids plugin from a py file (e.g. see our atlas.py plugin in hippunfold, which is a snakebids plugin but not part of snakebids).

Though I would first see if the logic can simply be built into your workflow, as the plugins are really designed for adapting the CLI, not the workflow itself.

akhanf avatar Aug 15 '25 14:08 akhanf

Here is a potential fix for this issue: https://github.com/khanlab/snakebids/pull/470

Note I feel this is general to snakebids since it comes up for micapipe, but it could just as easily come up for other bidsapps built with snakebids (such as hippunfold); it just hasn't yet.

jordandekraker avatar Aug 18 '25 19:08 jordandekraker