
`layout.get` not returning subject directory paths for datasets with no session layer

Open · alyssadai opened this issue 2 years ago · 3 comments

Hi there,

I would like to use `layout.get` to retrieve subject-level directory paths for BIDS datasets, but I am getting unexpected results for datasets that have no session layer (but still contain imaging data), e.g. the bids-examples dataset "ds003":

ds003/
├── sub-01/
│   ├── anat/
│   └── func/
└── ...

Commands used to load the dataset and try to fetch the directory path for a specific subject:

>>> from bids import BIDSLayout
>>> layout = BIDSLayout("bids-examples/ds003", validate=True)
>>> layout
BIDS Layout: ...\bagelbids\bids-examples\ds003 | Subjects: 13 | Sessions: 0 | Runs: 0
>>> layout.get(return_type="id", target="subject")  # double check that subject IDs are able to be parsed
['12', '08', '10', '13', '04', '05', '07', '11', '03', '02', '06', '09', '01']
>>> layout.get(subject="01", target="subject", return_type="dir")  # ISSUE: returns an empty list
[]

In the last call, combining target="subject" with return_type="dir" returns an empty list instead of a list of paths, even though pybids recognizes the subject-level data (the subject IDs are parsed correctly). The same happens when subject is not passed to layout.get.

Strangely, not all bids-examples datasets without a session layer show this behaviour. For example, I see it for ds003 and eeg_ds000117, but eeg_cbm returns the subject paths as expected:

>>> layout = BIDSLayout("bids-examples/eeg_cbm", validate=True)
>>> layout.get(subject="cbm001", target="subject", return_type="dir")
['D:\\neurobagel\\bagelbids\\bids-examples\\eeg_cbm\\sub-cbm001']
>>> layout.get(target="subject", return_type="dir")
['D:\\neurobagel\\bagelbids\\bids-examples\\eeg_cbm\\sub-cbm001', 'D:\\neurobagel\\bagelbids\\bids-examples\\eeg_cbm\\sub-cbm002', 'D:\\neurobagel\\bagelbids\\bids-examples\\eeg_cbm\\sub-cbm003', 'D:\\neurobagel\\bagelbids\\bids-examples\\eeg_cbm\\sub-cbm004', 'D:\\neurobagel\\bagelbids\\bids-examples\\eeg_cbm\\sub-cbm005', 'D:\\neurobagel\\bagelbids\\bids-examples\\eeg_cbm\\sub-cbm006', 'D:\\neurobagel\\bagelbids\\bids-examples\\eeg_cbm\\sub-cbm007', 'D:\\neurobagel\\bagelbids\\bids-examples\\eeg_cbm\\sub-cbm008', 'D:\\neurobagel\\bagelbids\\bids-examples\\eeg_cbm\\sub-cbm009', 'D:\\neurobagel\\bagelbids\\bids-examples\\eeg_cbm\\sub-cbm010', 'D:\\neurobagel\\bagelbids\\bids-examples\\eeg_cbm\\sub-cbm011', 'D:\\neurobagel\\bagelbids\\bids-examples\\eeg_cbm\\sub-cbm012', 'D:\\neurobagel\\bagelbids\\bids-examples\\eeg_cbm\\sub-cbm013', 'D:\\neurobagel\\bagelbids\\bids-examples\\eeg_cbm\\sub-cbm014', 'D:\\neurobagel\\bagelbids\\bids-examples\\eeg_cbm\\sub-cbm015', 'D:\\neurobagel\\bagelbids\\bids-examples\\eeg_cbm\\sub-cbm016', 'D:\\neurobagel\\bagelbids\\bids-examples\\eeg_cbm\\sub-cbm017', 'D:\\neurobagel\\bagelbids\\bids-examples\\eeg_cbm\\sub-cbm018', 'D:\\neurobagel\\bagelbids\\bids-examples\\eeg_cbm\\sub-cbm019', 'D:\\neurobagel\\bagelbids\\bids-examples\\eeg_cbm\\sub-cbm020']

I'm not sure whether this is a bug or whether the inconsistent behaviour comes from specific differences in dataset structure. Any help on this would be much appreciated!

alyssadai · Apr 10 '23 04:04

To be honest, I'm surprised anybody uses return_type='dir'. Looking at the code, directories are extracted in a pretty baroque way because we index files, not directories.
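To illustrate what recovering directories from a file index involves, here is a minimal sketch (not pybids' actual implementation) that walks up from each indexed file path to the nearest `sub-<label>` component. The helper name `subject_dirs_from_files` and the example paths are hypothetical; it assumes every file lies under `root`.

```python
from pathlib import Path


def subject_dirs_from_files(root, files):
    """Hypothetical helper: map subject labels to subject directories.

    Each file path is made relative to the dataset root, and its directory
    components are scanned for the first one starting with "sub-". Files
    nested in datatype folders (anat/, func/) therefore still resolve to
    their subject directory. Assumes every file lies under root
    (relative_to raises ValueError otherwise).
    """
    root = Path(root)
    dirs = {}
    for f in files:
        rel = Path(f).relative_to(root)
        # Skip the last part: it is the file name, not a directory
        for i, part in enumerate(rel.parts[:-1]):
            if part.startswith("sub-"):
                dirs[part[len("sub-"):]] = root.joinpath(*rel.parts[: i + 1])
                break
    return dirs


# Hypothetical file index; top-level files such as participants.tsv
# have no subject directory and are ignored
files = [
    "/data/example/sub-01/anat/sub-01_T1w.nii.gz",
    "/data/example/sub-01/func/sub-01_task-rest_bold.nii.gz",
    "/data/example/sub-02/anat/sub-02_T1w.nii.gz",
    "/data/example/participants.tsv",
]
dirs = subject_dirs_from_files("/data/example", files)
print(dirs)
```

With pybids, the inputs could come from `layout.root` and `layout.get(return_type="filename")`.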

I'm not sure that we want to support this long-term, so it might be best to take another approach. What about:

from pathlib import Path
subject_dirs = [Path(layout.root) / f'sub-{subject}' for subject in layout.get_subjects()]
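A self-contained variant of this suggestion, assuming only that subject folders follow the BIDS `sub-<label>` naming. So that the sketch runs without a dataset, `layout.root` and `layout.get_subjects()` are replaced by plain arguments. The helper name `build_subject_dirs` is hypothetical.

```python
import tempfile
from pathlib import Path


def build_subject_dirs(root, subjects):
    """Hypothetical helper: build BIDS subject directory paths.

    Paths are constructed from the naming convention instead of being
    looked up in a file index, so they do not depend on which files sit
    directly inside each subject folder. Labels without a matching
    directory on disk are skipped.
    """
    root = Path(root)
    candidates = (root / f"sub-{label}" for label in sorted(subjects))
    return [d for d in candidates if d.is_dir()]


# Throwaway directory tree standing in for a dataset
with tempfile.TemporaryDirectory() as tmp:
    for label in ["01", "02"]:
        (Path(tmp) / f"sub-{label}" / "anat").mkdir(parents=True)
    # "03" has no directory on disk, so it is skipped
    print(build_subject_dirs(tmp, ["02", "01", "03"]))
```

The `is_dir()` filter is optional: it costs a filesystem check per subject but guards against returning paths that do not exist.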

effigies · Apr 11 '23 00:04

Hi @effigies, thanks again for your advice on this. We ended up going with your suggested method to extract session / subject directories.

Just letting you know that, while experimenting, we also noticed that layout.get(..., return_type="dir") fails to return the path of a session (it returns an empty list) when a subject has exactly one session:

import bids
layout = bids.BIDSLayout("bids-examples/ieeg_motorMiller2007")
layout.get(subject="cc", session="01", target="session", return_type="dir")
Out[]: []

# BUT:
layout.get_sessions(subject="cc")
Out[]: ['01']

This does seem to reinforce your statement that return_type='dir' isn't the most reliable for these use cases.
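The same path-building approach extends to sessions. The sketch below assumes session folders follow the BIDS `sub-<label>/ses-<label>` naming and falls back to the subject directory when a subject has no sessions. The mapping argument stands in for per-subject `layout.get_sessions(subject=...)` calls; the helper name `build_session_dirs` is hypothetical.

```python
from pathlib import Path


def build_session_dirs(root, sessions_by_subject):
    """Hypothetical helper: build session-level directory paths.

    sessions_by_subject maps each subject label to a list of session
    labels. A subject with an empty list maps to its subject directory,
    since its data sits directly under sub-<label>/.
    """
    root = Path(root)
    dirs = []
    for sub in sorted(sessions_by_subject):
        sub_dir = root / f"sub-{sub}"
        sessions = sessions_by_subject[sub]
        if sessions:
            dirs.extend(sub_dir / f"ses-{ses}" for ses in sorted(sessions))
        else:
            dirs.append(sub_dir)
    return dirs


# With pybids, the mapping might be built as
# {s: layout.get_sessions(subject=s) for s in layout.get_subjects()}
dirs = build_session_dirs("/data/example", {"01": ["02", "01"], "02": []})
print(dirs)
```

Because the paths come from labels rather than from matching indexed files, a subject with a single session (like `cc` above) is handled the same way as any other.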

In light of this, would it make sense to update the docs (https://bids-standard.github.io/pybids/examples/pybids_tutorial.html#other-return-type-values) to either remove the reference to this option or add a warning about its use?

I can also open a separate issue for the docs update if that would be helpful.

alyssadai · Jun 19 '23 16:06

Yes, I think it would be a good idea to discourage use of this option.

effigies · Jun 19 '23 21:06