bids-examples Example of the HED schema library for SCORE implementation

This PR includes examples of the HED schema library for the Standardized Computer-based Organized Reporting of EEG (SCORE) implementation in several EEG and invasive EEG datasets in the two ways to annotate HED in BIDS.

BIDS dataset was validated with [email protected] and HED annotations were separately validated with the example jupyter notebook and the hedtools that will be incorporated in the bids-validator soon.

Jun 26 '22 19:06 tpatpa

Before this example can be merged, this other pull request in the bids-specificartion that adds the use of an HED schema library bids-specification pull request needs to be merged first. Then the dataset_description.json can be updated to include the SCORE library schema.

Jul 08 '22 18:07 dorahermes

crosslinking, this PR needs a spec PR and a following validator PR to be merged first:

spec: https://github.com/bids-standard/bids-specification/pull/1106
validator: https://github.com/bids-standard/bids-validator/pull/1496/

Jul 22 '22 05:07 sappelhoff

The PRs mentioned in https://github.com/bids-standard/bids-examples/pull/324#issuecomment-1192212450 have meanwhile been merged, so I think you can address some of the reviewer comments above @tpatpa

Sep 27 '22 09:09 sappelhoff

I made minor edits to match the latest HED-SCORE library schema. Validation is still failing because the validator is validating against a previous version of the HED-SCORE library schema. We are in the final stages before releasing the schema's latest version. Once released this will be validated and could be merged as well.

Sep 27 '22 22:09 tpatpa

We just released HED-SCORE V1.0.0. If you follow the link make sure to click on "Show another version" to browse. The remaining question for this BIDS-HED-SCORE example is on how to list a set of channels affected by an event. It seems like the discussion in the electrophysiology derivatives was going for a Pythonic list of strings, eg ['C3','F3'] or ['C3-F3','P3-C3'] in a separate channels column. Any preference for these examples? @sappelhoff @robertoostenveld @VisLab @guiomar @CPernet

Feb 08 '23 21:02 dorahermes

congrats Dora, Tal, et al!

It seems like the discussion in the electrophysiology derivatives was going for a Pythonic list of strings

yes, this should be a link to the specific discussion thread: gdoc discussion

My preference is indeed the "pythonic list of strings" (clarified more in the above linked thread) but apart from Robert and me, nobody seems to have offered an opinion yet.

Feb 09 '23 09:02 sappelhoff

+1 list of strings

Feb 09 '23 09:02 CPernet

It seems like the discussion in the electrophysiology derivatives was going for a Pythonic list of strings, eg ['C3','F3'] or ['C3-F3','P3-C3'] in a separate channels column.

Could you clarify what would be meant by: ['C3-F3','P3-C3']? If it means a group of channels as ordered by the channels.tsv, I don't think this should be allowed. Which channels are in the data and how they are ordered can change during computation, so this is particularly risky for derivatives. The same comment applies to channel numbers.

Also, given that quotes are usually not used in events files, would [C3, F3] be preferred?

One further option, which perhaps the HED working group should discuss separately, is how HED might be used to define named groups of channels which could then be referred to by name. This does not preclude the need for deciding on a format for a channels column in the events files and their derivatives.

Feb 09 '23 12:02 VisLab

Let me copy the discussion thread from the Gdoc here, perhaps that clarifies some things:

@sappelhoff :

I think a "pythonic" way to specify channels would be nice.

A list of strings where each string is an affected channel

An empty list means no channel is affected

"n/a" means the interpretation of "channel" is not applicable or the data is not available

We could additionally think about a shorthand to indicate "all channels are affected". Perhaps the string "all" would work for that ... if by chance a channel is named "all" this would still not be ambiguous, because specifying individual channels needs to be in a list like ["all"]

@robertoostenveld :

I don't care too mush, but think that we should not be restrictive in formatting. And to note: channels in the annotations file do not have to correspond to channels in the data. E.g., how would you document an interictal spike that is observed in a bipolar refreferenced channel ("C3-F3"), whereas the data is stored and shared with a common reference ("C3-REF", or probably just "C3"). Annotations could also refer to electrodes rather than channels, such as "video inspection revealed that electrode C3 fell off at 10 minutes into the recording".

@sappelhoff:

Thanks for these valid examples. Regarding the first one, one could argue that if an interictal spike at "C3-F3" is observed, and the dataset curator wants to annotate this event, then they should include that channel in the data.

Regarding the "electrode" example: We could have an additional column called "electrode" that follows the same formatting as "channel" (the column we are discussing here), but strictly referring to electrodes.

Curators could then for each event specify any number of channels and/or electrodes that are related to the event.

Feb 09 '23 12:02 sappelhoff

@sappelhoff :

I think a "pythonic" way to specify channels would be nice.

A list of strings where each string is an affected channel

An empty list means no channel is affected

"n/a" means the interpretation of "channel" is not applicable or the data is not available

We could additionally think about a shorthand to indicate "all channels are affected". Perhaps the string "all" would work for that ... if by chance a channel is named "all" this would still not be ambiguous, because specifying individual channels needs to be in a list like ["all"]

I am in favor of this format.

I realize that this discussion pertains to derivatives, however, it might also be relevant to events in the main BIDS specification.

Right now the BIDS specification of events.json indicates two options for describing column values (discounting HED): Levels and Units.

If Levels are given, the column is assumed to contain categorical values.

If Units are given, the column is assumed to contain numeric values expressed in the specified units.

Would this new format of column value require a new List type -- list of numbers or list of strings?

Feb 09 '23 13:02 VisLab

Right now the BIDS specification of events.json indicates two options for describing column values (discounting HED): Levels and Units.

Maybe we can use IntendedFor in the corresponding JSON file to link to the relevant channels file similar to how fieldmap data is linked to a specific scan?

Feb 15 '23 15:02 tpatpa

One further option, which perhaps the HED working group should discuss separately, is how HED might be used to define named groups of channels which could then be referred to by name. This does not preclude the need for deciding on a format for a channels column in the events files and their derivatives.

I agree that developing an "HED way" to deal with this problem would be very nice in addition to a "straight forward - non HED" BIDS solution.

Right now the BIDS specification of events.json indicates two options for describing column values (discounting HED): Levels and Units.

you are raising a very important point here, thanks -- I have not considered that my proposal would introduce a new "data type" for rows in TSV files apart from those defined by Levels and Units. It wouldn't make sense to consider the lists of strings (or the [], "all", and "n/a" values that'd be possible) as "Levels", because there'd simply be too many levels without providing a lot of insight.

We should:

think about alternative solutions
think whether introducing a List type makes sense in a more general sense (beyond our present problem), because I would be hesitant to introduce a concept that is only sensible / meaningful for one particular application
beyond the potential List data type, the present proposal would also permit "all" as a shorthand for a full list of channels and the standard "n/a". While permitting "n/a" is easy and needed anyhow, ... what do you think of the "all" shorthand? Needed? Superfluous? ...

In any case, we need some more people to chime in :-)

Maybe we can use IntendedFor in the corresponding JSON file to link to the relevant channels file similar to how fieldmap data is linked to a specific scan?

Not all JSON files have an IntendedFor metadata field, but I like the general direction here ...

Feb 15 '23 16:02 sappelhoff

One other possibility would be just to use a string for the channel list rather than an actual list. Then BIDS doesn't have to be concerned about validating necessarily, although it could.

"[Cz7, O1, O2]"

Downstream tools could then do their own thing to interpret a channels column, parse, and make sure that these are real channels.

Feb 23 '23 21:02 VisLab

Right now the BIDS specification of events.json indicates two options for describing column values (discounting HED): Levels and Units.

you are raising a very important point here, thanks -- I have not considered that my proposal would introduce a new "data type" for rows in TSV files apart from those defined by Levels and Units. It wouldn't make sense to consider the lists of strings (or the [], "all", and "n/a" values that'd be possible) as "Levels", because there'd simply be too many levels without providing a lot of insight.

We should:

think about alternative solutions

think whether introducing a List type makes sense in a more general sense (beyond our present problem), because I would be hesitant to introduce a concept that is only sensible / meaningful for one particular application

beyond the potential List data type, the present proposal would also permit "all" as a shorthand for a full list of channels and the standard "n/a". While permitting "n/a" is easy and needed anyhow, ... what do you think of the "all" shorthand? Needed? Superfluous? ...

In any case, we need some more people to chime in :-)

I think a List data type would help. This may also be helpful beyond the present problem for other ephys cases where multiple channels or electrodes could be listed and I can imaging other cases where multiple items may to be indexed beyond channels or electrodes.

Maybe we can use IntendedFor in the corresponding JSON file to link to the relevant channels file similar to how fieldmap data is linked to a specific scan?

Not all JSON files have an IntendedFor metadata field, but I like the general direction here ...

This may be similar or comparable to a case in the BIDS-connectivity-BEP, for channel-to-channel connectivity metrics where we also want to interpret a column with respect to an _channels.tsv or _electrodes.tsv file (discussing there whether this should be a sourceAtlas, but it is not an actual atlas...).

Feb 28 '23 20:02 dorahermes

For the format of this field a "pythonic list" has the downside of encouraging developers to do an eval which is not a safe practice. A comma separated list (or a single item) would work effectively as a representation. Brackets aren't strictly necessary as these files are a TSV.

CCing @christinerogers for EEGnet

Mar 03 '23 18:03 Andesha

For the format of this field a "pythonic list" has the downside of encouraging developers to do an eval which is not a safe practice. A comma separated list (or a single item) would work effectively as a representation. Brackets aren't strictly necessary as these files are a TSV.

I agree with you -- I guess I proposed it this way to distinguish n/a from a potential channel name n/a and to have a convenient shorthand all that is different from a channel potentially called all. But we can probably come up with better ways.

Mar 03 '23 22:03 sappelhoff

Super, thanks @sappelhoff -- I'll raise this at the BIDS maintainers meeting, since @Andesha 's request for code-agnostic (consistent, comma-separated instead of pythonic) lists helps those making BIDS-supporting data platforms (like EEGNet which is webfacing) across modalities.

Has this been discussed elsewhere in the BIDS specs, i.e. is there an implementation cost concern? cc @CPernet

Does this make sense to you @arnodelorme @VisLab ? thoughts, @dorahermes @tpatpa ?

Mar 09 '23 16:03 christinerogers

Discussing this with a few maintainers we settled on allowing json arrays whose only values are strings as an acceptable solution. The json specification allows json documents to start with a square bracket so any languages implementation of the json parser should be sufficient.

We should also be able to represent this in the schema well enough to tip the validators off that elements of the column may need to be parsed.

@effigies did I misrepresent anything here?

Mar 09 '23 20:03 rwblair

Hi @rwblair , My takeaway from this maintainers meeting was that it wasn't decided, and will be raised again next time after some community consultation (@sappelhoff). There's definitely concern about the embedded JSON suggestion on our side -- e.g. within a TSV it's not only hard to read but also can't be validated easily by those preparing data files, so significantly more error-prone.

Mar 09 '23 20:03 christinerogers

@rwblair nothing misrepresented, but here's a more extended version:

We discussed two possible approaches:

Comma-separated values: a,b,c
JSON arrays: ["a","b","c"]

Advantages to CSV:

Simple. Just use split(val, ","), which will come in every language and is easy to write your own if forced.
You can eyeball it.

Advantages to JSON arrays:

Typed. The schema could declare a value as having type array[str] (actually something uglier, but you get it) and the validator can check it.
Proximal. If a tool is working with BIDS data, it should have access to a JSON parser. No eval needed.
Escapable. JSON supports unicode escapes, in case someone needs a tab or newline character inside their strings, which TSV with a CSV insert would interpret as an end-of-column or end-of-row. (I hope people don't do this, but in the extreme...)

Mar 09 '23 21:03 effigies

Hi @rwblair , My takeaway from this maintainers meeting was that it wasn't decided, and will be raised again next time after some community consultation (@sappelhoff). There's definitely concern about the embedded JSON suggestion on our side -- e.g. within a TSV it's not only hard to read but also can't be validated easily by those preparing data files, so significantly more error-prone.

My apologies for the ambiguity and finality in tone of my post. This was from a chat after the maintainers meeting you presented at. I completely agree with you that it needs more discussion here and in the next maintainers meeting.

Mar 09 '23 22:03 rwblair

thanks @rwblair - let's continue this detailed point in a more general/accessible place than this channel. cc @sappelhoff happy to follow your lead here.

Mar 10 '23 18:03 christinerogers

Let's continue the discussion in this dedicated issue, where I summarized the present situation:

https://github.com/bids-standard/bids-specification/issues/1446

this present PR is

buried in the examples repo
intended to achieve something else

Mar 16 '23 12:03 sappelhoff

Per discussion during the bids-derivatives meeting, bipolar channels in this case remain in the raw data (there is no previous BIDS dataset, so therefore these data are not derived). The current channel names (e.g. FP1-F7) in _channels.tsv is therefore correct.

unrelated note - the dataset_description can contain SourceDatasets which point to the TUH corpus (which is non-BIDS).

Jun 22 '23 15:06 dorahermes

Looks good....

Aug 07 '23 11:08 VisLab

Looks good to me, ok after it is validated!

Aug 14 '23 19:08 dorahermes

@tpatpa I rebased your branch on bids-standard/bids-examples@master. You will have to run the following command locally before you continue to work on this: git fetch --all and then git reset --hard origin/master

NOTE if you have uncommitted local changes on your branch, do not run the above ☝️ ... just comment here in that case :-)

And in the future, it would be good if you could make a new git branch before submitting a pull request. Submitting pull requests directly from your master branch makes it difficult for you to stay up to date with the upstream repository (bids-standard/bids-examples@master).

Oct 27 '23 08:10 sappelhoff

We're having a chicken and egg problem here... The HED version must be a released version, but SCORE 1.1.1 has not been released yet due to unresolved issues. @dorahermes can you review 1.1.1 and we can discuss the issues offline and move towards release.

Oct 27 '23 12:10 VisLab