BEP044: Within-stimuli conditions
The `stim_file` column in event files allows users to specify which stimulus files are associated with an event onset:

> `stim_file` | OPTIONAL. Represents the location of the stimulus file (image, video, sound etc.) presented at the given onset time. ...
However, what this does not allow for is the specification of sub-conditions that occur during a long-running stimulus.
For example, in ds001545 a video file is presented which spans the entirety of the run. However, within each run/video there are 6 distinct conditions.
For example:
onset | duration | trial_type | stim_file |
---|---|---|---|
6 | 90 | Intact A | cond1_run-01.mp4 |
105 | 90 | Scramble Fix C | cond1_run-01.mp4 |
204 | 90 | Scramble Rnd B V1 | cond1_run-01.mp4 |
303 | 90 | Scramble Fix C | cond1_run-01.mp4 |
402 | 90 | Intact A | cond1_run-01.mp4 |
501 | 90 | Scramble Rnd B V2 | cond1_run-01.mp4 |
IMO, the above example is invalid, as the `stim_file` only has a single onset.
The following is an events file which has all the necessary information (note I'm having to guess when the onset of the `stim_file` is; it could actually be 0).
onset | duration | trial_type | stim_file |
---|---|---|---|
6 | 540 | n/a | cond1_run-01.mp4 |
6 | 90 | Intact A | n/a |
105 | 90 | Scramble Fix C | n/a |
204 | 90 | Scramble Rnd B V1 | n/a |
303 | 90 | Scramble Fix C | n/a |
402 | 90 | Intact A | n/a |
501 | 90 | Scramble Rnd B V2 | n/a |
However, this is ambiguous as the conditions are only implied to occur during stimulus presentation due to the duration of the first row.
@tyarkoni suggests adding optional but strongly encouraged `stim_onset` and `stim_offset` columns. These would denote onsets within a stimulus.
I would have made it:
onset | duration | trial_type | stim_file |
---|---|---|---|
6 | 540 | Movie starts | cond1_run-01.mp4 |
6 | 90 | Intact A | cond1_run-01.mp4 |
105 | 90 | Scramble Fix C | cond1_run-01.mp4 |
204 | 90 | Scramble Rnd B V1 | cond1_run-01.mp4 |
303 | 90 | Scramble Fix C | cond1_run-01.mp4 |
402 | 90 | Intact A | cond1_run-01.mp4 |
501 | 90 | Scramble Rnd B V2 | cond1_run-01.mp4 |
stim_onset/stim_offset - I guess these could be added, but they would carry redundant information that could be computed (and validated to not extend beyond the stimulus duration) from the "Movie starts" row for that stimulus and the corresponding onset and duration. And we all know what happens when there is redundancy ;)
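To illustrate the redundancy argument, here is a minimal sketch (not part of any BIDS tooling; the numbers mirror the example table above) of deriving within-stimulus onsets from the "Movie starts" row, so that separate `stim_onset`/`stim_offset` columns would not be needed:

```python
# Sketch: derive within-stimulus onsets/offsets from the "Movie starts" row,
# instead of storing redundant stim_onset/stim_offset columns.
# Rows mirror the example table above: (onset, duration, trial_type).
events = [
    (6, 540, "Movie starts"),
    (6, 90, "Intact A"),
    (105, 90, "Scramble Fix C"),
    (204, 90, "Scramble Rnd B V1"),
    (303, 90, "Scramble Fix C"),
    (402, 90, "Intact A"),
    (501, 90, "Scramble Rnd B V2"),
]

stim_onset, stim_duration = next(
    (onset, dur) for onset, dur, tt in events if tt == "Movie starts"
)

derived = []
for onset, duration, trial_type in events:
    if trial_type == "Movie starts":
        continue
    within_onset = onset - stim_onset  # onset inside the movie
    derived.append((trial_type, within_onset, within_onset + duration))

# Validation pass: flag sub-conditions that would extend past the stimulus.
flagged = [tt for tt, _, off in derived if off > stim_duration]
```

The validation list is exactly the kind of consistency check a validator could perform if the values stayed derived rather than duplicated.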
As for the hierarchical description of events -- isn't there https://bids-specification.readthedocs.io/en/latest/99-appendices/03-hed.html ? (never used it myself though)
I'm not crazy about either of the solutions proposed above because, while both are compliant with the current spec, neither one eliminates the fundamental ambiguity here, which is that you don't know which part of the clip is being presented. It's also kind of problematic from a BIDS-StatsModel standpoint, because it will cause almost all users to have to drop a `Filter` transformation into their model just to weed out the first row, since nobody is going to want that in their model.
The benefit of having optional `stim_onset` and `stim_offset` columns is that those would eliminate the ambiguity in question without making most model specifications more complex. What I don't like about this proposal is that the extra columns are essentially metadata; there's virtually no situation under which they would be treated like other non-mandatory columns (i.e., as containing design-relevant information).
The more I think about this, the more I lean towards maybe keeping the current approach and not codifying this at all in the `_events.tsv` files. Maybe the solution is to require a supplementary metadata file for the stimulus files that contains the onsets. I.e., `cond1_run-01.mp4` would have to have a `cond1_run-01.json` file that has fields `PresentationOnset` and `PresentationOffset`. But even that isn't sufficient, because presentation onset/offset can vary not just by stimulus, but also by event...
Should we just say this is in the 20% (really more like 1%) and not worry about it?
BTW, ... would they actually need to filter them out? Why wouldn't you want them to model that entire "super" condition as well? If there are different movie cuts, you might want them explicitly in the model, even if only to absorb the transition (if it is visible) between different stimuli. If there is only one big one for the entire run, well, it will largely be your constant. If there is a design imbalance and the stimulus files have subtle unique features (differently trimmed, color scheme, audio volume level), having them modeled might save us from one more possible retraction.
The only problem I see is if all the trials follow each other in such a way that the model becomes degenerate if the whole stim-file condition is present too. So, overall, it might be specific to the design.
The only con is that those stimulus onsets and durations may actually be of interest to other tools, not just the linear model, so they would need to recompute them as well. But it shouldn't be too hard.
As for extra unused metadata, I would say the more the merrier. My main concern is the fear of it being redundant, thus requiring "manual" recomputation if I find that, e.g., I need to fix an onset. Then I will forget, and the stimulus onset value will no longer be valid.
I would agree this probably falls into the 1%, as the majority of experiments don't have sub-conditions within a stimulus. And so in 90% of cases, the mention of a stimulus indicates a complete presentation; this is such a rare situation it's probably not worth putting in the spec itself.
I still think it might be worth clarifying that including a stimulus in `stim_file` does not necessarily indicate that the stimulus is played from the beginning (which is what I thought on first read).
I'm not sure this is 1%. In many standard experiments, there are sub-conditions. For example, in experiments that involve showing faces/objects there are often sub-categories: emotions, types of objects, types of faces (human faces/animal faces). In fact, the modified Hariri task is a perfect example of this, and gets used by emotion/mood researchers a lot.
I don't think we should reinvent ontologies of stimuli (e.g., paradigms - https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3682219/, audio - https://research.google.com/audioset/ontology/index.html, images - https://bioportal.bioontology.org/ontologies/BIM), but we should provide a way for stimulus properties to be encoded appropriately.
> The more I think about this, the more I lean towards maybe keeping the current approach and not codifying this at all in the `_events.tsv` files. Maybe the solution is to require a supplementary metadata file for the stimulus files that contains the onsets. I.e., `cond1_run-01.mp4` would have to have a `cond1_run-01.json` file that has fields `PresentationOnset` and `PresentationOffset`.
I like the idea of a JSON going alongside a stimulus file, but this JSON should be able to reflect timed objects inside it.
Just to follow up:

- So in the case of the Hariri task, `trial_type` can represent the most dominant trial type (faces or objects, for example, or for mood researchers neutral/angry/etc.). Then stimulus properties could somehow represent not only details like cropping/full frame, colorspace, etc., but also ontological objects like "this is an image/video of a face".
- So in the case of a movie, `events.tsv` could simply say "I showed this clip for 240 s". The stimulus event should have a JSON file that can encode many different types of extracted events within the clip.
- Another option is to allow multiple events files, and any model has to refer to a specific events file (maybe we allow composition of event files).
@satra by "sub-conditions" here we're not talking about hierarchical organization, we're talking about a temporal subset of a single file. Codifying hierarchical structures is IMO not in scope, but in any case presents no particular challenge from an `events.tsv` perspective, because you can just put the filename for each event in the `stim_file` column, and the analyst is welcome to do whatever they want with that. The case we're talking about is where you have, say, an 8-minute movie file identified as the `stim_file`, but the presentation starts halfway through that clip. In such cases the analyst needs to have some way to know that the onset of the presentation isn't synced with the onset of the event. But this seems like an edge case (indeed, I'm pretty sure this is the first BIDS dataset we've run into where it's an issue), so the proposal is to just let it be.
> then stimulus properties could somehow represent not only details like cropping/full frame, colorspace, etc.
I think this is analogous to the movie example, but I still think it's an edge case. Situations where researchers dynamically crop images are likely to be pretty rare; in most cases, the cropping will have been done in advance, and what's in the `stimuli/` folder will be what was presented to the subject.
I think a reasonable way to update the spec is to strongly encourage users to provide files in `stimuli/` that are as close as possible to the ones participants actually experienced. That means temporally or spatially cropping movies and images if needed. But I agree with @adelavega that we should also explicitly say that there is no actual guarantee that the contents of `stimuli/` map perfectly onto what participants experienced.
@tyarkoni - sorry, I misunderstood the within-stimuli conditions, so please ignore the ontological variations (although see the last paragraph below).
For movies, I'm thinking of things like commercial clips that are shown, and I'm sure that certain clips cannot be shared.
For movies, as an example, are you saying I can extract faces, then specific emotions on those faces, and then encode both face and face+emotion in the events file, kind of a redundant stimulus list? All possible events in `trial_type`, and then the analyst figures out which trials are of interest? For many of our tasks, that would work pretty well.
> For movies, I'm thinking of things like commercial clips that are shown, and I'm sure that certain clips cannot be shared.
I don't know that we can do anything about this, short of asking people to provide a description of where/how to obtain stimuli that can't be publicly shared. I don't think it's worth trying to codify this—there's too much variability in what that procurement process could look like.
> For movies, as an example, are you saying I can extract faces, then specific emotions on those faces, and then encode both face and face+emotion in the events file, kind of a redundant stimulus list?
Sure, you can create arbitrary columns in `events.tsv` that code anything you like. Aside from `stim_file`, you could add columns for `face_id`, `face_gender`, `face_age`, `face_emotion_rater1`, `face_emotion_rater2`, `face_emotion_avg`, and anything else you like. The expectation is that you then put descriptions of the columns in the data dictionary in the JSON sidecar, though I believe this is non-mandatory right now.
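As a sketch, such a data dictionary could be generated like this (`Description`, `Levels`, and `Units` are standard BIDS data-dictionary fields, but the column names here are the purely illustrative ones from the comment above):

```python
import json

# Hypothetical data dictionary for the custom columns suggested above;
# the column names are illustrative, not defined by the BIDS spec.
sidecar = {
    "face_id": {"Description": "Identifier of the face shown during the event."},
    "face_gender": {
        "Description": "Gender of the face.",
        "Levels": {"F": "female face", "M": "male face"},
    },
    "face_emotion_avg": {
        "Description": "Mean emotion rating across raters.",
    },
}

# This string would be written to the task's _events.json sidecar.
text = json.dumps(sidecar, indent=2)
```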
This is an old one.
I wonder if HED tags can help with such an issue. @VisLab do you have some opinion on this?
As it turns out the HED Working Group has been discussing this very issue and some of our members will weigh in shortly with a concrete proposal --- @neuromechanist @dorahermes @tpatpa @dungscout96 @monique2208 @makeig
Yes, I agree that HED tags can come in useful here and probably tackle this issue. Working through an example, it seems this may be a relatively large contribution, with some added machine-readable files in the `stimuli/` folder. When starting to work on a visual image and movie example with @neuromechanist, it seems there would be a need for community input for review, and for other examples, such as auditory, motor, electrical stimulation, etc., as well. This perhaps goes to the scope of a potential BEP. Should we open a separate GitHub issue to discuss whether to open a BEP, or continue here?
@neuromechanist could share a preliminary google doc (not BEP yet, just the examples we were working through) if that would help give an idea?
Tagging some people who previously contributed to this discussion for input: @adelavega @tyarkoni @yarikoptic @satra @Remi-Gau
if we are talking about a BEP to help organize stimuli then there is overlap with : https://github.com/bids-standard/bids-specification/issues/751
Reading this thread and #751 resonates closely with the challenges we are exploring for including image and movie annotations in a couple of massive datasets we are working on. @dorahermes and @tpatpa are working on the annotation of the Natural Scene Dataset, and @smakeig, @dungscout96, and I are working toward Healthy Brain Network's movie annotation.
In both projects, we see the need for top-level annotation files that would be used in the downstream `*_events.tsv`.
In this Google Doc, we are exploring the possibility of a file such as `stimuli/stimuli.tsv` to hold a list of the stimulus files and possible annotations (`stimuli/stimuli.tsv` is very similar to the `stims.tsv` discussed in #751).
A sample `stimuli.tsv` file would look like this:
stim_file | type | NSD_id | COCO_id | first_COCO_description | HED |
---|---|---|---|---|---|
nsd02951.png | still_image | 2951 | 262145 | "an open market full of people and piles of vegetables." | ((Item-count, High), Ingestible-object)), (Background-view, ((Human, Body, Agent-trait/Adult), Outdoors, Furnishing, Natural-feature/Sky, Urban, Man-made-object)) |
If the stimulus file has a time-varying context (such as a movie), a separate `*_stimulus.tsv` will hold the annotations. The structure of `*_stimulus.tsv` would be very similar to `*_events.tsv`, with `onset` and `duration` fields, etc.
In any case, including the `stim_file` name in the `*_events.tsv`'s `stim_file` column would link the task events (`*_events.tsv`) and the stimulus annotations (`stimuli.tsv` and `*_stimulus.tsv`).
We believe this method will make the annotation of stimulus files more reusable; researchers can reuse the stimulus files and select the `stimuli.tsv` rows (and `*_stimulus.tsv` files) of their choice for their new studies.
Also, reusing the dataset with alternate annotations for the same stimulus files would be as straightforward as adding a column to `*_stimulus.tsv` or replacing the whole file with a new one.
We appreciate your thoughts and comments on the Google Doc, as well as here. Our use cases are limited to a couple of visual and audiovisual stimuli. Many other stimulation types may require other arrangements. We appreciate that you also include examples of other stimulus types, if possible.
@bids-standard/maintainers would be great to hear your thoughts on whether this is worthy of a small BEP, thank you!
Maybe not a BEP but several small orthogonal pull requests?
I can try to bring it up at the next maintainers meeting.
Following https://github.com/hed-standard/hed-python/issues/810, it seems that expanding the `_events.tsv` files with what was called sub-conditions in the first post of this issue is a remodeler issue. Nevertheless, the remodeler would require rules and guidelines to remodel the `_events.tsv` with the contents of the `stimuli/` directory.
As described in the HED issue above, and also in the GDoc we are drafting for this issue, there could be two variations of this issue:

- Column-only extension for still stimuli, so that only specific columns (and annotations) would be added to the `_events.tsv`.
- Row extension with the possibility of column extension, in which the contents of a specific stimulus file will be merged with the contents of the `_events.tsv`.
A working example for the second case, which is the main focus of this issue, is the following scenario: In the CMI Healthy Brain Network project, subjects watch the Present movie during fMRI and EEG sessions, among other tasks (see a sample of the EEG-BIDS dataset).
The events for the Present movie are limited to the start and stop of the video:
onset | duration | sample | value | event_code |
---|---|---|---|---|
0.000 | 0.002 | 0 | 9999 | 9999 |
2.034 | 0.002 | 1017 | video_start | 84 |
205.098 | 0.002 | 102549 | video_stop | 104 |
However, it is clear that a movie contains far more events, and researchers would want to provide their own annotations based on their application. As a straightforward example, we identified the shot transition events and quantified the Log Luminance Ratio of each shot transition. The file is included in the dataset as `stimuli/the_present_stimulus-LogLumRatio.tsv`:
onset | duration | shot_number | LLR |
---|---|---|---|
0 | n/a | video_start | video_start |
0 | 7.25 | 1 | n/a |
7.25 | 3.542 | 2 | -1.557820733 |
10.792 | 5.208 | 3 | 0.3358234903 |
16 | 5 | 4 | -0.03306866929 |
21 | 4.208 | 5 | -0.2070276568 |
... | ... | ... | ... |
165.25 | 6.667 | 55 | -0.2270603551 |
171.917 | 31.292 | 56 | 0.1188704433 |
203.208 | n/a | video_stop | video_stop |
To merge the `_stimulus.tsv` into the `_events.tsv` after the initial import process (i.e., remodeling the events table) into EEGLAB, I have made a function that:

- gets the `EEG` structure, the `_stimulus.tsv`, and the names of the columns for extension,
- finds the common event names (here, `video_start` and `video_stop`) between the `value` column and the mentioned columns for extension,
- compares/corrects the timelines of the common events,
- merges the events of the `_stimulus.tsv`,
- recreates the `EEG.event` structure.
This implementation is far from perfect, but it could serve as a working example of the implications of this mechanism for large and very large datasets. The Healthy Brain Network Project spans over 7000 subjects with EEG and fMRI, and this mechanism will help dynamically use event annotations based on the research's use case.
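The core alignment step, anchoring stimulus-internal onsets to the recording clock via the shared `video_start` event, might be sketched like this (illustrative numbers from the tables above; this is not the EEGLAB function itself):

```python
# Sketch of the alignment step: anchor the stimulus-internal timeline to the
# recording clock via the shared video_start event, then merge annotations.
# Numbers come from the example tables above; this is not the EEGLAB code.
events = [
    {"onset": 0.000, "value": "9999"},
    {"onset": 2.034, "value": "video_start"},
    {"onset": 205.098, "value": "video_stop"},
]
stimulus_rows = [
    {"onset": 0.0, "shot_number": "video_start"},
    {"onset": 7.25, "shot_number": "2"},
    {"onset": 10.792, "shot_number": "3"},
]

# Offset between the recording clock and the stimulus-internal clock.
rec_start = next(e["onset"] for e in events if e["value"] == "video_start")
stim_start = next(
    r["onset"] for r in stimulus_rows if r["shot_number"] == "video_start"
)
offset = rec_start - stim_start

# Shift each annotation onto the recording timeline and merge.
merged = events + [
    {"onset": round(r["onset"] + offset, 3), "value": "shot_" + r["shot_number"]}
    for r in stimulus_rows
    if r["shot_number"] != "video_start"
]
merged.sort(key=lambda e: e["onset"])
```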
I haven't had time to look at the entire proposal in detail, but overall the concept of annotating stimuli separately from the `_events.tsv` file seems like a reasonable proposal, as it allows for the inclusion of detailed stimulus annotations without fundamentally changing the way `_events.tsv` works.
Following the 4/12 conversations with @Remi-Gau, @adelavega, @yarikoptic, @arnodelorme, and @dungscout96, there is quite an enthusiasm for providing structure for the `stimuli/` directory.
@yarikoptic and I jotted on the Google Doc to modify the suggestions to a (directory-less) BIDS naming structure, which also follows the ideas in #751.
Based on the Google Doc example, here is a draft suggestion:

```Text
stim-present_???.mp4|mkv|jpg|png
stim-present_???.json
[stim-present_annot-loglum_events.tsv]
[stim-present_annot-loglum_events.json]
…
stimuli.tsv
stimuli.json
```
- The `stim-` prefix distinguishes the files from the `sub-` files, indicating that the stimulus files are independent/separate from the subjects. The `???` suffix follows the common principles rule but needs to be decided.
- The `events.tsv` file accommodates annotating time-varying stimulus files (that is, within-stimuli conditions).
- The `annot-` entity provides the opportunity to have different annotations (and `events.tsv` files) per single stimulus file.
- Similar to `participants.tsv`, the `stimuli.tsv` contains a list of the stimulus files, with optional columns.
- Similar to `participant_id`, a `stim_id` points to unique stimulus files. It is up to the user/tools to decide which annotations should be used for the respective `stim_id`.
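As an illustration of how such names would decompose (the `stim-`/`annot-` entities are the ones proposed above; the parsing helper itself is hypothetical, not an existing tool):

```python
import re

# Hypothetical parser for the proposed stim-/annot- naming scheme above;
# labels are assumed alphanumeric, per BIDS common principles.
# Pattern: stim-<label>[_annot-<label>]_<suffix>.<extension>
NAME_RE = re.compile(
    r"^stim-(?P<stim>[a-zA-Z0-9]+)"
    r"(?:_annot-(?P<annot>[a-zA-Z0-9]+))?"
    r"_(?P<suffix>[a-zA-Z0-9]+)\.(?P<ext>[a-z0-9.]+)$"
)

def parse_stimulus_name(name):
    """Return the entities of a proposed stimulus filename, or None."""
    match = NAME_RE.match(name)
    return match.groupdict() if match else None

parsed = parse_stimulus_name("stim-present_annot-loglum_events.tsv")
```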
TODO:

- [x] PR to add `annot` and `stim` entities.
- [x] Decide on the `???` suffix. (media?!)
- [x] Create an example `stimuli/` directory with the suggested structure.
- [ ] PR to suggest the `stimuli/` directory structure (potentially as a continuation of the #751 BEP, a new BEP, or an ENH).
CC @VisLab, @dorahermes, and @monique2208 for comment.
Looks good, but I'm concerned that mandating stimuli have a specific name would make this backwards incompatible w/ existing datasets (which name stimulus files whatever they want, and just refer to them in the `_events.tsv` files).
It's a minor concern, but it just seems slightly out of scope to mandate a new way to name stimulus files. Would this be required overall, even if you do not have annotations?
Seems like there was discussion regarding the top-level `stim-` prefix here: https://github.com/bids-standard/bids-specification/issues/751
> Looks good, but I'm concerned that mandating stimuli have a specific name would make this backwards incompatible w/ existing datasets (which name stimulus files whatever they want, and just refer to them in the `_events.tsv` files)
Not sure the proposal has to be backwards incompatible:
Now: an `events.tsv` with `stim_file` column value `xxx/yyy.zzz` implies a file in `./stimuli/xxx/yyy.zzz`.

Potential proposal: the above stays the same... but...

In the `./stimuli/stimuli.tsv` file, the row for this file has first column value `./stimuli/xxx/yyy.zzz`, and other columns can appear as defined in the `./stimuli/stimuli.json` file.
Suppose that the stimulus file is a movie with annotations; then in the `./stimuli/xxx` directory there can be a `yyy_arbitrarystuff_annot.tsv` and a `yyy_arbitrarystuff_annot.json` that are interpreted as annotations for `yyy.zzz`. (Multiple raters may be available.)

The directory structure within the `./stimuli` folder can be arbitrary, as it is now.
Current contenders for the stimuli modality suffix include:

- `_stimulus` (example: `stim-the-present_stimulus.mp4`)
- `_media` (example: `stim-the-present_media.mp4`)
- `_stream` (example: `stim-the-present_stream.mp4`)
Feel free to let me know if you have any other suggestions and which one you prefer, so I can update the list.
`_stimulus` seems oddly redundant with the `stim-` prefix; otherwise I slightly prefer `_media`, but have no strong opinions.
In the spirit of the future BIDS 2.0:

- per https://github.com/bids-standard/bids-2-devel/issues/54, we can have `stim-` as long as we have such an entity
- and in the spirit of https://github.com/bids-standard/bids-2-devel/issues/59, we can already state in BIDS 1.x that `stimuli/` must follow the new convention only if there is a `stimuli/dataset_description.json`, thus leaving original datasets with arbitrary `stimuli/` as kosher. Not sure yet how well the schema/bids-validator people would react to that ;-)
Ok, sounds great. It seems that proposing the `stim` and `annot` entities has good support. I'll make a pull request for them.
The suffix may need more consideration. Currently, `_media` seems to have the most appeal.
Just a note that there is already a `_stim` suffix for individual stimulus files, defined under the `physio` data type. But I believe that these two use cases have little relation to each other.
Also, should we convert this issue to a BEP? Converting to BEP hopefully makes the enhancements more visible and maintainable (although, it will also require more work).
Talking to @yarikoptic and @dorahermes, they both seem to support a BEP for this issue.
Added PR #1814 to add the stimulus and annotation entities and the `stim_id` column.
The next steps would require input on:

- [ ] suffix (`_media`)
- [ ] multi-track stimuli (choose one of the currently available `part-`, `chunk-`, or `split-` entities)
- [ ] contents of the `stimuli.tsv`
- [ ] whether this should be a series of PRs or a consolidated BEP
It would be great to have this formalized! We have a large number of datasets where we present the same short movie as a localizer. Having one general annotation file which could apply to all of these datasets would really help with the analysis; it would remove a lot of redundancy in the event files, and I think it would provide something interesting to share on its own.