pybids
pybids copied to clipboard
derivatives=True breaks BIDSLayout
from pathlib import Path
from bids import BIDSLayout
import bids
for i in range(10):
layout = BIDSLayout(Path(bids.__file__).parent / "tests" / "data" / "7t_trt", database_path=(Path.home()),
derivatives=True, reset_database=True)
print(layout.entities["subject"].count())
gives something like
0
0
10
10
0
10
10
0
10
10
Removing the derivatives=True
from the call to BIDSLayout, we obtain
10
10
10
10
10
10
10
10
10
10
I can't reproduce your problem using your example:
...: from pathlib import Path
...: from bids import BIDSLayout
...: import bids
...:
...: for i in range(10):
...: layout = BIDSLayout(Path(bids.__file__).parent / "tests" / "data" / "7t_trt", database_path=(Path.home()),
...: derivatives=True, reset_database=True)
...: print(layout.entities["subject"].count())
...:
10
10
10
10
10
10
10
10
10
10
Tried it on ds005 and I'm seeing something similar. I'll investigate.
I tried this with the database off, and this error persists, so at least we can rule out the loading/saving as being the problem.
...: for i in range(10):
...: layout = BIDSLayout(Path(bids.__file__).parent / "tests" / "data" / "ds005", derivatives=True)
...: print(layout.entities["subject"].count())
...:
1
16
1
1
1
1
1
1
1
16
The correct value here should be 16.
When it returns one, the following still works:
In [106]: layout.get_subjects()
Out[106]:
['01',
'02',
'03',
'04',
'05',
'06',
'07',
'08',
'09',
'10',
'11',
'12',
'13',
'14',
'15',
'16']
even though:
In [107]: print(layout.entities["subject"].unique())
['01']
Ah, heh, I see what's going on. When you request layout.entities['subject']
with multiple layouts in scope (i.e., base and derivatives), the results are accumulated in a single dict. So, depending on the order layouts are returned (which will be random), you're actually calling count()
on a different Entity
object. The discrepancy in numbers above is because the derivative dataset has fewer subjects than the base dataset.
This will likely be a bit of a pain to solve, because the natural semantics here (count unique subjects over all layouts) don't mesh well with the internal implementation.
EDIT: I suppose a cheap "solution" would be have layout.entities["subject"]
return a list of Entity
objects, and then when you call .count()
it would fail, and it's your responsibility to either iterate, or call layout.get_entities(scope='bids')
to make sure you only get the base dataset's Entity
.
Alternatively, we could make it so that when you access entities through the .entities
property, rather than via get_entities()
, you always get back only the base layout, and never any derivatives. That also seems pretty reasonable. Thoughts?
I think just going off the base layout is a reasonable compromise.
Isn't this what we decided on for the __repr__
also?