
[ENH] extension for electromyography (EMG) - BEP042

Open drammock opened this issue 11 months ago • 26 comments

This is a very early WIP implementation to add EMG support. CIs are not expected to pass yet.

cc @neuromechanist @jwelzel @larsoner @arnodelorme @robertoostenveld feel free to push directly to this branch, I'll add you as repo collaborators on my fork

[!Note]

We meet regularly to discuss this BEP

Next meeting: 18 Dec 2024 on https://ucsd.zoom.us/j/96433382377

Communication channel on github repo / matrix / slack / discord : https://github.com/bids-standard/bids-specification/issues/1371

drammock avatar Dec 06 '24 23:12 drammock

cc @agramfort

drammock avatar Dec 18 '24 17:12 drammock

Hi, @neuromechanist pointed me to this PR and I would like to share some thoughts. This seems to be pretty advanced in terms of sensor placement description which was not very well defined in the motion BEP :)

  • .json EMGPlacementScheme field : could be more restrictive with keywords? For instance in case of absence of a common process one MUST write "channel-specific". Keywords "visual reference", "palpation", "functional localization" ... can be explicitly recommended rather than having people use different keywords for describing the same thing (e.g., "visual inspection", "pressing on the skin"... ). They may even use multiple of those methods at the same time and in that case they can separate them with some designated delimiter (that can be prescribed too) for easy parsing. This depends of course on how well-categorized these processes are but since you are allowing unprescribed keywords for names of external schemes anyway (like SENIAM) it would be okay to not be comprehensive.

  • In the example on the website draft I read "EMGPlacementScheme": "midpoint between cubital fossa and radial styloid process", : this seems to contradict the description that says NOT to give the target muscle description

  • .json EMGReference : similarly to EMGPlacementScheme field, you may simply have them choose between 1) a specific name, 2) keyword "channel-specific", or 3) "bipolar". Mix of bipolar and other references would then be a case of "channel-specific".

  • .json SkinPreparation : might this be channel-specific as well? For instance in EEG we would use the abrasive gel only for EOG and not for other electrodes. Then having this as a column in channels.tsv with description of keywords in channels.json can be helpful

sjeung avatar Feb 26 '25 15:02 sjeung

Hi @sjeung, thanks for the feedback / ideas.

  • .json EMGPlacementScheme field : could be more restrictive with keywords?

done in e84cadc

In the example on the website draft I read "EMGPlacementScheme": "midpoint between cubital fossa and radial styloid process", : this seems to contradict the description that says NOT to give the target muscle description

Those are skeletal landmarks, not muscles. But we've reworked EMGPlacementScheme to be an enum now, so that example will need to change anyway.

.json EMGReference : similarly to EMGPlacementScheme field, you may simply have them choose between 1) a specific name, 2) keyword "channel-specific", or 3) "bipolar". Mix of bipolar and other references would then be a case of "channel-specific".

This was the intent, perhaps it's just not worded clearly enough? Suggestions for clarification are welcome.

.json SkinPreparation : might this be channel-specific as well?

For EEG, I think abrasive gel isn't used because of possible damage to hair. According to @neuromechanist it would be odd to use a different skin prep for different EMG sites in the same session, so we'll probably leave this as-is.

drammock avatar Feb 26 '25 23:02 drammock

I think this BEP is ready for a thorough review by the team: @robertoostenveld @larsoner @neuromechanist @arnodelorme @JuliusWelzel @tjeerdboonstra

cc @agramfort

drammock avatar Feb 28 '25 23:02 drammock

The document looks comprehensive https://bids-specification--1998.org.readthedocs.build/en/1998/modality-specific-files/electromyography.html. A few comments:

  • Sampling Frequency Specification: The sampling frequency is expected to be the same for all electrodes right?

  • "EMGPlacementScheme" is set to "midpoint between cubital fossa and radial styloid process" in the example, but the specification says it should be "Measured", "Other", or "ChannelSpecific"

  • For channels.tsv, maybe "reference" should be "reference_electrode" to mirror the column "signal_electrode" and make it clearer to users

  • EMGCoordinateSystem must be one of "Others" (or maybe the other keywords are missing). This is not accurate since the other coordinate systems seem allowed https://bids-specification--1998.org.readthedocs.build/en/1998/appendices/coordinate-systems.html

arnodelorme avatar Mar 10 '25 22:03 arnodelorme

  • Sampling Frequency Specification: The sampling frequency is expected to be the same for all electrodes right?

In the majority of cases yes. But not necessarily, if there are e.g. some grid devices and some bipolar devices at different spots on the body, recording into separate amplifiers / data files but acquired simultaneously. These would get different values for the acq- entity.
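A minimal sketch of that naming, assuming the usual BIDS entity order (sub, task, acq) and the proposed `_emg` suffix. The labels `grid` and `bipolar` and both sampling frequencies are made up for illustration; only the `acq-` entity and the `jumping` task label come from this thread.

```python
def emg_filename(sub, task, acq=None, ext="edf"):
    """Build a BIDS-style EMG data filename (entity order: sub, task, acq)."""
    parts = [f"sub-{sub}", f"task-{task}"]
    if acq is not None:
        parts.append(f"acq-{acq}")
    return "_".join(parts) + f"_emg.{ext}"


# Two devices recorded simultaneously into separate files,
# each with its own (hypothetical) sampling frequency in Hz.
recordings = {
    "grid": 2048.0,     # hypothetical high-density grid amplifier
    "bipolar": 1000.0,  # hypothetical bipolar amplifier
}

filenames = {acq: emg_filename("01", "jumping", acq=acq) for acq in recordings}
for acq, name in filenames.items():
    print(name, recordings[acq], "Hz")
```

Each file would then carry its own sidecar, so the sampling frequency only needs to be uniform within one file, not across the whole session.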

  • "EMGPlacementScheme" is set to "midpoint between cubital fossa and radial styloid process" in the example, but the specification says it should be "Measured", "Other", or "ChannelSpecific"

good catch. That should be Other and the text should be in EMGPlacementSchemeDescription --- which is missing from the list of *_emg.json fields.
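A rough sketch of what the corrected example could look like, together with a small consistency check. The three allowed values are the ones quoted above; the rule that "Other" needs a non-empty EMGPlacementSchemeDescription is an assumption for illustration, not something the draft is known to enforce, and `check_placement` is a hypothetical helper.

```python
import json

# Enum values as quoted from the draft in this thread.
ALLOWED_SCHEMES = {"Measured", "Other", "ChannelSpecific"}

sidecar = {
    "EMGPlacementScheme": "Other",
    "EMGPlacementSchemeDescription": (
        "midpoint between cubital fossa and radial styloid process"
    ),
}


def check_placement(meta):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    scheme = meta.get("EMGPlacementScheme")
    if scheme not in ALLOWED_SCHEMES:
        problems.append(f"EMGPlacementScheme {scheme!r} is not an allowed value")
    # Assumed rule: free text belongs in the description field when "Other".
    if scheme == "Other" and not meta.get("EMGPlacementSchemeDescription"):
        problems.append("EMGPlacementSchemeDescription missing for 'Other'")
    return problems


print(json.dumps(sidecar, indent=2))
print(check_placement(sidecar))
```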

  • For channels.tsv, maybe "reference" should be "reference_electrode" to mirror the column "signal_electrode" and make it clearer to users

I went back and forth on that question. reference already exists as a defined column for EEG datasets, so it was easier / more consistent to re-use it... but I agree that it would be good if the two column names were more parallel. I think @neuromechanist and I agreed that calling the other column just "signal" was too ambiguous, so maybe calling it reference_electrode (and thus breaking the similarity with other modalities) is the best way forward.

  • EMGCoordinateSystem must be one of "Others" (or maybe the other keywords are missing). This is not accurate since the other coordinate systems seem allowed https://bids-specification--1998.org.readthedocs.build/en/1998/appendices/coordinate-systems.html

I think this is actually correct as-is. Other coordinate systems are allowed for other modalities, but we're making the assumption that things like CTF, NeuromagElektaMEGIN, CapTrak, etc are not relevant for the vast majority of EMG datasets, and that coordinate systems for EMG datasets will almost always be "custom" (AKA, will define their own origin and XYZ directions, based on anatomical landmarks not on the skull).

drammock avatar Mar 11 '25 22:03 drammock

Hello,

thanks very much for the great progress! I have some remarks listed below:

  1. The BIDS definition for the acq-label reads as follows:

"Definition: The acq-

In EMG BIDS the acq-label is used to differentiate between recording systems, not parameters. While I do understand it is simple to use, maybe BIDS in general could introduce a sys label for filenames for this purpose. For motion data we introduced a new tracksys label for the same use case. Probably should have named that sys :D

Also the acq-label is explained again with the coordinate_system.json. I would move the explanation further up.

  1. Provide some more information on how the acq_time is to be formatted (e.g. similar to MOTION-BIDS):

In the scans.tsv file, date-time information MUST be expressed as indicated in Units, which allows sub-millisecond precision.

  1. In the description of the sensor locations, are the example structures a MUST? E.g. if landmarks are digitized with a Polhemus ... coordinates of an electrode MUST be given with x,y,z coordinates?

  2. The table of Hardware information has a nan row at the 5th position

  3. The example *_emg.json is not a valid .json file due to the last comma (after "jumping").

  4. For the channel description it is stated: "Channels SHOULD appear in the table in the same order they do in the EMG data file". Are headers a MUST in the EMG data file? If not, how can the channels be matched, if not by order? I would propose to make this a MUST.

  5. The restricted keyword list for the channels.json seems a little counterintuitive. EMG is, at least to my limited knowledge, not often sampled through the same amplifier as Eye-Tracking. I would exclude EYEGAZE and PUPIL from this list. Probably include POS channels from motion data, as e.g. Vicon offers EMG integration with optical motion capture.

JuliusWelzel avatar Mar 13 '25 15:03 JuliusWelzel

  1. The BIDS definition for the acq-label reads as follows [...] In EMG BIDS the acq-label is used to differentiate between recording systems, not parameters. While I do understand it is simple to use, maybe BIDS in general could introduce a sys label for filenames for this purpose.

That is an attractive idea. You are right that we chose acq mostly for convenience. It would also be possible to expand the definition of acq (e.g., "different set of parameters or devices used for acquiring the same modality").

Also the acq-label is explained again with the coordinate_system.json. I would move the explanation further up.

it is explained at the end of the initial "EMG Data" section (just before the "Terminology: electrodes vs channels" subsection). It comes up again when discussing coordsystems, and then again when discussing photos. I couldn't see a good way to avoid talking about it in multiple places. In light of that, do you still think it needs to move / change?

  1. Provide some more information on how the acq_time is to be formatted (e.g. similar to MOTION-BIDS)

Are you specifically asking to add the "sub-millisecond precision" bit? (if so, no objection). If not, can you clarify what you think is lacking here?

  1. In the description of the sensor locations, are the example structures a MUST? E.g. if landmarks are digitized with a Polhemus ... coordinates of an electrode MUST be given with x,y,z coordinates?

what is MUST, SHOULD, or MAY is open to discussion. There are also likely some more rules to be added, e.g., to make some optional fields required depending on the values in other fields. Regarding specifically the Polhemus case, I would agree that digitized locations MUST include x,y,z based on my experience using Polhemus for digitizing EEG electrode locations. Is there a case where one would use a Polhemus (or similar spatial digitizer) and not provide coordinates in 3D?

  1. The table of Hardware information has a nan row at the 5th position

thanks, fixed. It was asking for AmplifierType, which wasn't defined elsewhere. IIRC we decided that wasn't needed, but I can add it back in if folks disagree.

  1. The example *_emg.json is not a valid .json file due to the last comma (after "jumping").

thanks, fixed.
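For reference, a strict JSON parser rejects a trailing comma, which is how this kind of error can be caught before publishing an example. The snippet below is a minimal sketch using Python's standard `json` module; the `TaskName` key is an assumption, since only the value "jumping" is mentioned above.

```python
import json

bad = '{"TaskName": "jumping",}'   # trailing comma after the last member
good = '{"TaskName": "jumping"}'


def is_valid_json(text):
    """Return True if `text` parses as strict JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(is_valid_json(bad), is_valid_json(good))
```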

  1. For the channel description it is stated: "Channels SHOULD appear in the table in the same order they do in the EMG data file". Are headers a MUST in the EMG data file? If not, how can the channels be matched, if not by order? I would propose to make this a MUST.

EDF/BDF necessarily have channel names in the file (which I think is what you mean by "headers" right?). There are also guidelines on what the format of such channel names should look like (modality-space-identifier, i.e., EEG Cz or MEG 1441 or EMG 002). I suppose it would be conceivable to have an EDF/BDF file where the channel names were non-unique (which IMO would be a degenerate case), but I don't think they can be missing.

  1. The restricted keyword list for the channels.json seems a little counterintuitive. EMG is, at least to my limited knowledge, not often sampled through the same amplifier as Eye-Tracking. I would exclude EYEGAZE and PUPIL from this list. Probably include POS channels from motion data, as e.g. Vicon offers EMG integration with optical motion capture.

This was originally copy-pasted from EEG, then pruned. I agree it needs refinement... I had a code comment in there for a while saying as much, until I realized almost nobody was reading the source :) For now I'll remove PUPIL, EYEGAZE, ADC, DAC, and OTHER, and add POS.

drammock avatar Mar 13 '25 17:03 drammock

Thanks, @JuliusWelzel, very insightful comments,

  1. Re acq-<label>, I am in favor of expanding the definition of acquisition mostly to avoid introducing yet another entity to BIDS. Also, it might be good to consider recording-<label>, which is defined entity although its definition needs to be expanded:

This entity is commonly applied when continuous recordings have different sampling frequencies or start times. For example, physiological recordings with different sampling frequencies may be distinguished using labels like recording-100Hz and recording-500Hz.

IMHO, acq-<label> is more meaningful as it would indicate separate acquisitions.

  1. acq_time in the scans.tsv is quite clear IMO. We briefly discussed accommodating a LATENCY channel, if the data has multiple recordings. Probably, we should add it to the list of reserved channel types? Here is the description of the LATENCY channel:

LATENCY | Latency of samples in seconds from recording onset (see acq_time column of the respective *_scans.tsv file). MUST be in form of s[.000000], where s reflects whole seconds, and .000000 reflects OPTIONAL fractional seconds.

And the description of how to use it:

In case a tracking system provides time information with every recorded sample, these times information MAY be stored in form of latencies to recording onset (first sample) in the *_motion.tsv file. If a system has uneven sampling rate behavior, the LATENCY channel can be used to share this information.

  1. +1 that the relation between the data and channels should be a MUST, this is also problematic for the relation between channels and electrodes (see the detailed discussion here: #2041).
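To make the LATENCY format quoted above (`s[.000000]`) concrete, here is a small sketch that generates latencies from recording onset and checks strings against the pattern. It assumes "optional fractional seconds" means one to six digits after the decimal point and uses a 2 kHz sampling frequency purely as an example; both helper names are hypothetical.

```python
import re

# Whole seconds, optionally followed by 1-6 fractional digits (assumed reading
# of "s[.000000]").
LATENCY_RE = re.compile(r"\d+(\.\d{1,6})?")


def is_valid_latency(value):
    """Check that a string matches the assumed s[.000000] pattern."""
    return LATENCY_RE.fullmatch(value) is not None


def latencies(n_samples, sfreq):
    """Latency of each sample from recording onset, in seconds, as strings."""
    return [f"{i / sfreq:.6f}" for i in range(n_samples)]


first = latencies(3, 2000.0)
print(first)
print(all(is_valid_latency(v) for v in first))
```

For evenly sampled data these values are redundant with the sampling frequency; the channel is mainly useful when the device reports per-sample timestamps or samples unevenly.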

neuromechanist avatar Mar 13 '25 18:03 neuromechanist

it is explained at the end of the initial "EMG Data" section (just before the "Terminology: electrodes vs channels" subsection). It comes up again when discussing coordsystems, and then again when discussing photos. I couldn't see a good way to avoid talking about it in multiple places. In light of that, do you still think it needs to move / change?

Good point, I think it can stay as it is. Maybe it is worth adding a detailed explanation for the reasoning in the paper.

Are you specifically asking to add the "sub-millisecond precision" bit? (if so, no objection). If not, can you clarify what you think is lacking here?

Yes, sub-millisecond precision is imo worth mentioning as EMG usually has a high srate. This time resolution is important for good synchronization with other modalities.
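As a quick illustration of why this matters, the sketch below assumes a 2 kHz recording and a made-up start time. It formats the timestamp once with microsecond and once with millisecond precision using Python's standard `datetime` module.

```python
from datetime import datetime, timedelta

sfreq = 2000.0  # Hz, assumed for illustration
sample_interval = timedelta(microseconds=1_000_000 / sfreq)  # 500 microseconds

start = datetime(2025, 3, 14, 8, 30, 0, 123456)  # hypothetical recording onset

acq_time = start.isoformat(timespec="microseconds")
ms_only = start.isoformat(timespec="milliseconds")  # truncates, does not round

print(acq_time)
print(ms_only)
print(sample_interval)
```

With millisecond precision the 456 microseconds are dropped, which at this sampling frequency is close to one full sample of misalignment against another modality.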

what is MUST, SHOULD, or MAY is open to discussion. There are also likely some more rules to be added, e.g., to make some optional fields required depending on the values in other fields. Regarding specifically the Polhemus case, I would agree that digitized locations MUST include x,y,z based on my experience using Polhemus for digitizing EEG electrode locations. Is there a case where one would use a Polhemus (or similar spatial digitizer) and not provide coordinates in 3D?

I am not aware of any case where it is not provided in x,y,z.

EDF/BDF necessarily have channel names in the file (which I think is what you mean by "headers" right?). There are also guidelines on what the format of such channel names should look like (modality-space-identifier, i.e., EEG Cz or MEG 1441 or EMG 002). I suppose it would be conceivable to have an EDF/BDF file where the channel names were non-unique (which IMO would be a degenerate case), but I don't think they can be missing.

True, sorry. But maybe it can be pointed out that the names in the 'channels.tsv' MUST match the names in the BDF/EDF file?
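A sketch of what such a check could look like. It assumes the channel names have already been read from both sources (for example with an EDF reader on one side and a TSV parser on the other); `compare_channel_names` and the example names are hypothetical, and the report keys are not part of any spec.

```python
def compare_channel_names(tsv_names, file_names):
    """Compare channel names from channels.tsv with those in the data file."""
    tsv_names = list(tsv_names)
    file_names = list(file_names)
    return {
        # Names present in the data file but absent from channels.tsv.
        "missing_in_tsv": [n for n in file_names if n not in tsv_names],
        # Names listed in channels.tsv but absent from the data file.
        "missing_in_file": [n for n in tsv_names if n not in file_names],
        # True only if both lists are identical, including order.
        "same_order": tsv_names == file_names,
        # Non-unique names in the data file (the degenerate case noted above).
        "duplicates_in_file": sorted(
            {n for n in file_names if file_names.count(n) > 1}
        ),
    }


file_names = ["EMG 001", "EMG 002", "EMG 003"]
tsv_names = ["EMG 001", "EMG 003", "EMG 002"]
report = compare_channel_names(tsv_names, file_names)
print(report)
```

In this example the names match but the order differs, so matching by name still works where matching by row position would silently mislabel two channels.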

JuliusWelzel avatar Mar 14 '25 08:03 JuliusWelzel

IMHO, acq-<label> is more meaningful as it would indicate separate acquisitions.

Agreed, maybe a PR can be opened to extend the definition for the acq label as @drammock suggested?

  1. acq_time in the scans.tsv is quite clear IMO. We briefly discussed accommodating a LATENCY channel, if the data has multiple recordings. Probably, we should add it to the list of reserved channel types?

Good idea, I would be in favor of adding LATENCY to the channel types. The scans.tsv file will also be replaced with a recordings.tsv file in BIDS 2.0.

JuliusWelzel avatar Mar 14 '25 08:03 JuliusWelzel

ping @robertoostenveld and @tjeerdboonstra. I think we're about ready to open this up to public comment; do you want a chance to go through it again first?

drammock avatar Mar 21 '25 22:03 drammock

maybe a PR can be opened to extend the definition for the acq label as @drammock suggested?

done in #2090

drammock avatar Mar 24 '25 16:03 drammock

sys entity feels analogous (so can replace or be replaced with) to

  • #2027

idea. So to parallel it exactly, we should get systems.{json,tsv}? But then I would prefer devices.{json,tsv} as more descriptive, since systems could be abstract ("coordinate system" etc).

yarikoptic avatar Mar 31 '25 13:03 yarikoptic

sys entity feels analogous (so can replace or be replaced with) to

idea. So if to parallel exactly, should get systems.{json,tsv}?

Yes! As far as I understand how devices should be used, this is what we wanted to achieve with the tracksys entity for MOTION-BIDS. In the Paper we define a "tracking-system" as:

We define a tracking system as a group of channels that synchronously sample motion data from one or multiple tracked points. To be grouped as a single tracking system, channels MUST share the core parameters of sampling (namely the sampling rate and the duration) as well as hardware and software properties, resulting in the same number of samples and, if available, a single latency channel associated with the rest of the channels.

This resulted in a REQUIRED tracksys-<label> per motion.tsv file. I think it is important to specify if users MUST define the sys/acq/dev label or if this is optional. We made it required, even though the majority of motion datasets record data using only a single device. Should BIDS 2.0 remove the tracksys label for the motion data and streamline with whatever is decided in this and similar BEPs?

But then I would prefer devices.{json,tsv} as more descriptive, since systems could be abstract ("coordinate system" etc).

As for the terminology, adopting devices.{json,tsv} is preferable over systems.{json,tsv} to avoid confusion with other abstract concepts like coordinate systems. The term "devices" more accurately reflects the physical equipment used in data acquisition, leading to clearer documentation and understanding.

JuliusWelzel avatar Mar 31 '25 16:03 JuliusWelzel

The term "devices" more accurately reflects the physical equipment used in data acquisition, leading to clearer documentation and understanding.

agreed, dev / device is semantically a better entity name than acq (or sys or recording) for what we're grappling with in EMG.

drammock avatar Mar 31 '25 18:03 drammock

Thanks very much @oesteban for your thorough review, I greatly appreciate it.

  1. New Modality: .... new modality suffixes can fragment BIDS, and a clear justification for why _physio is insufficient is necessary.

Our proposal is not just for a new modality, but for a new data type as well. We followed the current specification format adopted by EEG-, iEEG-, and MEG-BIDS. EEG/iEEG/MEG do not provide any justification for why they should be their own modalities/datatypes, nor, AFAIK, does Physio provide any justification or clarification of when to use the _physio modality rather than embedding the physiological data in other modalities (like EEG), as those modalities already accommodate including physiological data as separate channels within their data files. To be clear, I am not arguing against _physio.

This can be a broader conversation as to what are the thresholds of having a specific datatype and/or modality rather than an umbrella, which could end up in a new BEP.

As to why EMG should have its own modality and data type, and not fall under Physio, there have been discussions at #1371, as well as in-person meetings. Some that I remember off the top of my head are:

  1. Physio does not currently have a data type (being addressed in BEP045, #1675). Searching the currently shared data, there is an abundance of EMG data shared as standalone datasets. There is also considerable interest in analyzing EMG data as standalone data, suggesting that EMG can benefit from its own data type and modality.
  2. EMG data is often very high-dimensional, easily going >200 channels with 2 kHz+ sampling frequency. Using spreadsheet-style formats like TSV and compressed TSV for both long and wide data could be inefficient and (for the compressed format) not transparent. These constraints are less likely for other physiological data.
  3. EMG can be collected from any site, and can target one or multiple muscles. The current Physio spec as well as the proposed changes in BEP20 do not address electrode/sensor placement nor mapping signals to different muscles. These two features are mostly unique to EMG, and other physiological recording may not benefit from specific terms and standards used for EMG sensor placement and mapping.
  4. Current and emerging EMG research directly derives/estimates neural discharges from EMG signals. This might be a distinguishing factor compared to BEP045, which aims to address "non-neuronal physiological" data.

  1. Multiple Formats & Format Policy ... Additionally, general policy recommendations on adding new formats should be discussed separately and not within modality-specific BEPs.

EDF/+ is a widely used and adopted data standard for physiological recordings. It also includes some necessary metadata such as channel names, sampling frequency, signal range, recording date, etc. The specifications as well as converters are open (see the discussion above for more details). BDF/+ is a simple extension of EDF in which the only change is that the data is recorded in 24-bit resolution, rather than EDF's 16-bit resolution. We are not providing any general policy recommendations. It is all within EMG-BIDS. Probably the language should be more specific.
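To put the 16-bit versus 24-bit difference in numbers, here is a back-of-the-envelope sketch. It assumes the full digital range is mapped linearly onto a physical range of ±5000 µV, which is a made-up value; real files declare their own physical and digital minima and maxima per channel.

```python
def quantization_step(physical_min, physical_max, n_bits):
    """Smallest representable change when the full digital range is used."""
    return (physical_max - physical_min) / (2 ** n_bits - 1)


phys_min, phys_max = -5000.0, 5000.0  # hypothetical physical range in microvolts

edf_step = quantization_step(phys_min, phys_max, 16)  # EDF: 16-bit samples
bdf_step = quantization_step(phys_min, phys_max, 24)  # BDF: 24-bit samples

print(f"EDF step: {edf_step:.6f} uV")
print(f"BDF step: {bdf_step:.6f} uV")
print(f"ratio: {edf_step / bdf_step:.1f}")
```

For the same physical range, the 24-bit format resolves steps roughly 256 times finer, which is the practical motivation for offering BDF alongside EDF.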

  1. Electrode Placement Pictures (_photo.jpg):

Agreed. Photos are an efficient way to convey to a human reader how the system is set up and placed. However, it poses potential ethical risks and may not be as precise, accurate, and machine readable as sensor placement description in electrodes.tsv.

neuromechanist avatar Apr 28 '25 17:04 neuromechanist

EEG/iEEG/MEG do not provide any justification for why they should be their own modalities/datatypes, nor, AFAIK, does Physio provide any justification or clarification of when to use the _physio modality rather than embedding the physiological data in other modalities (like EEG), as those modalities already accommodate including physiological data as separate channels within their data files. To be clear, I am not arguing against _physio.

As I mentioned above, I don't advocate for having explicit explanations within the specs. However, the policy about what can derive a new datatype should be agreed upon before BEPs start sprawling the datatype level. While I certainly do not disqualify EMG as a neural signal, I think (i)EEG and MEG are brain signals, while EMG is generally not. To me, it makes sense those brain measurements have their own datatype directories and all other neuroscience-relevant data go within those directories (or beh/). Will elaborate more on this later.

This can be a broader conversation as to what are the thresholds of having a specific datatype and/or modality rather than an umbrella, which could end up in a new BEP

Exactly. I'm just arguing that we can't advance on this BEP (and any other BEP proposing new datatype folders) until we have had this conversation. Conversely, our approach in BEP020 does not require this conversation because it works on the foundation of _physio, which is already in the spec.

As to why EMG should have its own modality and data type, and not fall under Physio, there have been discussions at #1371, as well as in-person meetings.

Likewise, under the umbrella of BEP020, we had the very same conversation, but the outcome was different because the people involved in the conversation were different. Since the same conversation is being had in different contexts in parallel, this signifies a point where BIDS requires a general policy to be defined so that BEPs do not diverge and are consistent.

  1. Physio does not currently have a data type

Agreed. Two comments on this:

  • If the problem is that "Physio does not currently have a data type" this BEP does not resolve the problem, at most it only solves it for EMG.
  • In the context of BEP020, we agreed that eye-tracking only datasets (which do exist) would write the eye tracking recordings under the beh/ (behavioral) data type. I am not aware whether that would suffice for EMG and other physiological signals, but if not, the argument goes to the previous point.

Resolving the problem specifically for EMG (or for eye tracking, or for other non-brain recordings) perpetuates the issue as you first stated it and increases the fragmentation of the general spec.

  1. EMG data is often very high-dimensional, easily going >200 channels with 2 kHz+ sampling frequency. Using spreadsheet-style formats like TSV and compressed TSV for both long and wide data could be inefficient and (for the compressed format) not transparent. These constraints are less likely for other physiological data.

Eye-tracking is also high-dimensional and dense. TSV is definitely not the solution (current specs disallow it for _physio, btw), but TSVGZ does fit the bill. The argument that compression is not transparent, when contrasted with binary formats such as EDF/BDF does not hold for me. If we are going to use a binary format, then I'd advocate for something like Parquet (don't know much about it, but totally trust @effigies that it is a really good option).

This does not mention something that BEP020 does solve - when devices generate more than just data recordings (e.g., when they generate signals and status messages, etc.). EDF and BDF address this with the + version, which mixes up data and metadata together (something BIDS definitely would like to avoid). Instead, BEP020's _physioevents.[tsv|tsv.gz] files resolve this problem (for all physio recordings).

3. EMG can be collected from any site, and can target one or multiple muscles.

I did not criticize this part of the proposal and I think it is extremely valuable. My point is that all these specific metadata can be encoded nicely (and implemented in the BIDS Validator) following the approach of BEP020 and without discontinuing _physio.

4. Current and emerging EMG research directly derives/estimates neural discharges from EMG signals. This might be a distinguishing factor compared to BEP045, which aims to address "non-neuronal physiological" data.

While these signals are neural, it doesn't seem to me EMG records brain signals. This is why I see it best suited within _physio. Please note, this thinking SHOULD NOT undermine the representation of EMG data. I am positive that following BEP020's approach, all that this BEP proposal prescribes can be equally achieved.

We are not providing any general policy recommendations.

Yes, it is scoped within EMG, but there is language stating what formats could be added and what are the requirements. IMHO that language does not fit this (nor any other) BEP (with the exception of a specific BEP to establish these policies across the spec).

EDF/+ is a widely used and adopted data standard for physiological recordings.

Sure, I'm not attacking the format---if you all experts decided in favor of them after such a comprehensive conversation as the one above, I'm absolutely convinced that the four EDF/BDF/+ are excellent formats. Please refer to my point on upstream-looking vs. downstream-looking formats above (https://github.com/bids-standard/bids-specification/pull/1998#discussion_r2063927930). Please also note the above comment regarding metadata (intertwined within a single file in the case of the "plus" versions of EDF and BDF).

However, it poses potential ethical risks and may not be as precise, accurate, and machine readable as sensor placement description in electrodes.tsv.

Like above---from my ignorance, the proposal of electrodes.tsv seems necessary and critical for EMG data representation so I really trust BIDS is better with such a concept for EMG. That said, I still think those metadata would be more consistently implemented with the approach of BEP020.

oesteban avatar Apr 28 '25 21:04 oesteban

Hi all, contributing my thoughts to the file format discussion for BIDS-EMG.

While my primary research hasn't been solely focused on EMG, I bring experience working with data analysis across several related modalities (EEG, ECG, EOG, MEG, fMRI, and currently working in an fNIRS/ExG company). This gives me a fairly broad view of common practices and data handling needs in these domains, also from the perspective of other users, including those who are not tech-savvy as most of us are.

A core principle of BIDS is enhancing data sharing for the purpose of reuse and analysis. Therefore, the practical usability of the chosen format within the target community seems crucial. How easily can researchers integrate BIDS-compliant EMG data into their existing analysis workflows?

This brings me to the suggestion of compressed .tsv. From my perspective, this format doesn't seem to have established traction within the EMG research community or widespread support in commonly used analysis tools. In previous BIDS extensions (like BIDS-EEG), the selected formats (e.g., EDF, BrainVision) were largely chosen based on existing community adoption, open specifications, and tool support – prioritizing practicality.

Introducing a less common format like compressed .tsv would necessitate an extra data conversion step for many users. This requires developing and maintaining specific conversion tools, which can be a barrier, especially for researchers who aren't primarily software developers and rely on established toolboxes.

Conversely, focusing on file formats already prevalent in the EMG community, particularly those that are open and supported by major software packages (like FieldTrip, MNE, etc.), appears more aligned with BIDS' goal of reducing friction in data sharing and analysis.

While I appreciate the need to consider future-proof formats (and I have argued for adding the BV format elsewhere), the primary standard should arguably reflect what the community currently uses effectively.

Therefore, I think we should prioritize the currently proposed data formats with demonstrable, widespread use and robust tool support within the EMG field to maximize the immediate utility and adoption of BIDS-EMG.

Horschig avatar Apr 29 '25 06:04 Horschig

@oesteban, I moved the conversation regarding data type and modality to #2108, with a summary of what was discussed here. Please consider expanding the discussion there. I believe that this discussion is very important (and overdue), and deserves independent attention. I hope that the discussion results in clear guidelines and policy that help us toward transparent and unambiguous data sharing 😊.

neuromechanist avatar Apr 29 '25 15:04 neuromechanist

I moved the conversation regarding data type and modality to #2108,

Thanks! I'll make sure to bring this thread to the upcoming BIDS maintainers meeting in the context of BEP020 and this one :)

Let's continue that conversation there.

oesteban avatar Apr 29 '25 18:04 oesteban

Thanks all for the lively discussion. I'm going to try to summarize what I see as the points of contention, in hopes that it will move the discussion forward.

One point regarding photos seems to amount to "what you're doing here is fine, but we should have a broader discussion about photos too", so I won't comment further here. The other two points of contention:

  1. Should EMG be a separate datatype? or an emg modality under another datatype? or should it fall under physio modality?

    • current state of this PR proposes emg modality within emg datatype
    • if it's emg modality, presumably it goes under beh datatype (or physio, pending BEP045 landing first)
    • Discussion of general criteria for adding new datatypes/modalities has moved to #2108; was supposed to be discussed at 01 May maintainer meeting, but the notes suggest that didn't happen

    My opinion: we propose a new modality and datatype primarily because of the many parallels between EEG and EMG. The main differences are (1) is the electrode on the scalp or somewhere else on the body, (2) how do you describe sensor placement information, and (3) how to handle the electrode/channel distinction for "integrated bipolar" EMG devices. To my mind, point (1) is immaterial; I don't see the value in restricting "first-class citizens" (AKA datatypes) to only those measurements that target the brain. Points (2) and (3) also don't strike me as disqualifying for making EMG its own datatype; rather, they are nuances that need to be pondered and addressed, but are ultimately addressable within existing data structures (coordsystem.json, channels.tsv, etc).

  2. What filetype(s) are appropriate for EMG?

    • current state of this PR proposes EDF(+) and BDF(+)
    • if EMG data goes under physio modality (see above), then the choice is already made (tsv.gz)
    • Parquet has also been suggested as a possibility
    • There are questions about "extensibility" and whether the BEP should state criteria for future addition of other file formats

    My opinion: I am in favor of prioritizing the data formats that are already in use by the EMG community, and/or ones that are easy for them to adopt. Secondarily, I have a bias towards file formats that are already well-supported by the existing software tools. I also admit I have a bias against tsv.gz because it feels far too error-prone to store column names and data values in separate files.

    To me, these considerations point to EDF/BDF(+) (because according to @neuromechanist, many EMG device manufacturers already support EDF export), and also to the BrainVision data format (because according to @robertoostenveld there are existing EMG datasets in that format that were recorded on EEG equipment, and according to @Horschig some manufacturers soon will start using it for new data).

    I agree that Parquet is a "nice" data format, well-suited to dataframe-like structures, and easy for downstream data-science-type consumers to ingest. But Parquet is not currently supported in MNE-Python / MNE-BIDS, and the main Python tool for interacting with Parquet files carries a dependency on Pandas, which in MNE-Python we've been trying hard to avoid adding as a dependency for quite some time. This makes the path forward with Parquet a bit unclear, at least for the MNE maintainers. In contrast, we already support EDF and BrainVision formats, and BDF support is not very hard to add and will certainly be added if this BEP ends up including BDF as an allowed format.

    Regarding "extensibility", I think it's being used in two distinct senses in this discussion, that should be separated. One is in the discussion of DICOM, which IIUC allows extra arbitrary metadata fields in its header. EDF+/BDF+ are not like that. The other sense of "extensibility" refers to the BIDS standard itself, and I am quite content to remove the text regarding possible future supported file types and criteria for adopting them.

    One final point about file types regards "mix[ing] up data and metadata together (something BIDS definitely would like to avoid)." While it's true that EDF/BDF+ formats can contain metadata, the same is true of BrainVision (for EEG) and FIF (for MEG). But this doesn't prevent their separation! MNE-BIDS, for example, when writing out BIDS-compliant datasets, will record which channels are marked as "bad" in channels.tsv rather than marking those channels as bad in the FIF data structure itself. Similarly, on reading a BIDS-formatted dataset, it will use channels.tsv (not the FIF metadata) to determine which channels are "bad". A similar point holds for "annotations" in the file metadata and events.tsv. So in sum, I don't think the file format's ability to hold metadata should count as a strike against it; as long as the tools are doing their job when creating the BIDS dataset, the desired data/metadata separation can be achieved.
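To make the reading side of that concrete, here is a minimal sketch (not MNE-BIDS code) of how a tool can take the "bad" channel list from channels.tsv and ignore whatever the data file's own header says. It assumes the standard `name` and `status` columns; the function name and the example table are made up for illustration.

```python
import csv
import io


def bad_channels_from_tsv(tsv_text):
    """Return the names of channels whose `status` column reads 'bad'.

    Hypothetical helper: it looks only at the channels.tsv content and
    never consults metadata embedded in the EDF/BDF/BrainVision/FIF file.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        row["name"]
        for row in reader
        # `status` is optional in channels.tsv, so tolerate a missing column
        if (row.get("status") or "").strip().lower() == "bad"
    ]


# Illustrative channels.tsv content (invented for this sketch)
example = (
    "name\ttype\tunits\tstatus\n"
    "EMG1\tEMG\tuV\tgood\n"
    "EMG2\tEMG\tuV\tbad\n"
)
print(bad_channels_from_tsv(example))
```

A real reader would then apply that list to the in-memory recording, overriding any "bad" flags stored in the file itself.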

drammock avatar May 01 '25 20:05 drammock

Hi all,

We (a Team of researchers from Imperial College London, Umea University, and University of Stuttgart) ran into this BIDS extension proposal while preparing a dataset collection for an EMG-processing benchmark problem and searching for a standardized data format to share that data. We would be highly interested in using BIDS for that purpose.

Using the documentation and the existing data, we have started to convert some datasets into EMG-BIDS format (for example, see https://doi.org/10.7910/DVN/F9GWIW). Our impression from that journey was mostly positive, and we want to share the few issues we struggled with:

  • EMG is often acquired together with additional biomechanical data (e.g., joint torque, joint torque velocity, requested trajectories, ...). Currently, we are treating such time series data as MISC channels and specifying the type of the recorded signal in the description fields. It might be worth thinking whether such channels are frequent enough to associate them with their own channel type keyword.
  • If such bio-mechanical data is generated (e.g., from dynamometer devices), should the hardware description be added to the EMG-sidecar file or should it go into a separate file?
  • In the documentation, task descriptions are also part of the EMG-sidecar file. EMG is typically used in tasks related to motion, and task descriptions can be kind of extensive (even simple isometric tasks can require specification of several joint angles). This makes the EMG-sidecar files (i) potentially crowded and (ii) they could contain a lot of redundant information that is shared across all subjects. Is there a better place for detailed task descriptions (e.g., in the dataset sidecar file) that would allow us to keep the EMG sidecar file more compact and focus on the truly EMG-specific properties?
  • Besides surface EMG, there is also the option of using invasive EMG (there are needle electrodes, fine wire electrodes, thin film electrodes). Should one use the "Description" field to separate these two modalities? Most metadata should be similar between the two modalities, however, it is really hard (maybe even currently not really possible) to keep track of the electrode positions after the insertion. So the coordinate.tsv file would be hard to use in that case.

klotz-t avatar Jun 06 '25 08:06 klotz-t

  • EMG is often acquired together with additional biomechanical data (e.g., joint torque, joint torque velocity, requested trajectories, ...).

Hi @klotz-t,

sounds like a great project. For motion data there is a separate specification and paper. This data would go into a separate folder. If you record motion data that is not yet defined, you can extend the specification to your needs with the MISC channel type as mentioned.

JuliusWelzel avatar Jun 06 '25 09:06 JuliusWelzel

@klotz-t , thanks very much for sharing your thoughts. I am glad that even the draft specifications are somewhat helpful 😊.

It might be worth thinking whether such channels are frequent enough to associate them with their own channel type keyword.

Great problem, some things to consider: Ideally, the aim is to share data as "raw" as possible. If the force, requested trajectory, etc., are being recorded by the same instrument as the EMG (such as experiments using OTB), then appending these channels to the EMG recording is excellent. If not, I'd prefer having them as physio data (see the spec; I am not sure if it accepts formats other than tsv.gz though). Nevertheless, I think adding FORCE (and possibly TORQUE) channel types would be useful, as several EMG instrument manufacturers have integrated force sensors and inputs in their acquisition systems.

If such bio-mechanical data is generated (e.g., from dynamometer devices), should the hardware description be added to the EMG-sidecar file or should it go into a separate file?

Then, it will conveniently go to physio.json

EMG is typically used in tasks related to motion, and task descriptions can be kind of extensive (even simple isometric tasks can require specification of several joint angles). This makes the EMG-sidecar files (i) potentially crowded and (ii) they could contain a lot of redundant information that is shared across all subjects.

(i) Assuming motion using motion data type (= directory) and modality, EMG using emg data type and modality and Force using physio modality (but still inside emg data type), one TASK will have three JSON files, each describing one source of recording. The holistic view can be pulled in post. (ii) BIDS has an Inheritance Principle, so if the same setup is used for all subjects, you just need to put the JSON files in the root (e.g. task-isometric_<modality>.json), and it will be inherited across the dataset.
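As a rough illustration of point (ii), the sketch below merges JSON sidecars from the dataset root down to the directory of the data file, with deeper files overriding shallower ones. It is a simplified reading of the Inheritance Principle, not the validator's implementation: the function name is hypothetical, and it assumes a sidecar applies when all of its entities also appear in the data file's name.

```python
import json
from pathlib import Path


def merged_sidecar(data_file, dataset_root, suffix="emg"):
    """Merge applicable `*_<suffix>.json` files, root first, deepest last.

    Hypothetical helper for illustration; real BIDS tooling applies
    additional rules that are not modelled here.
    """
    data_file, dataset_root = Path(data_file), Path(dataset_root)
    # Entities of the data file, e.g. {"sub-01", "task-isometric"}
    entities = set(data_file.name.split(".")[0].split("_")[:-1])

    # Directories from the dataset root down to the file's own directory
    dirs = [dataset_root]
    for part in data_file.parent.relative_to(dataset_root).parts:
        dirs.append(dirs[-1] / part)

    merged = {}
    for directory in dirs:
        for candidate in sorted(directory.glob(f"*_{suffix}.json")):
            cand_entities = set(candidate.name[: -len(".json")].split("_")[:-1])
            # A sidecar applies only if it names no entity the data file lacks
            if cand_entities <= entities:
                merged.update(json.loads(candidate.read_text()))
    return merged
```

With a `task-isometric_emg.json` at the root and a subject-level `sub-01_task-isometric_emg.json` that only overrides one field, every subject picks up the shared task description while keeping its own overrides.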

Besides surface EMG, there is also the option of using invasive EMG (there are needle electrodes, fine wire electrodes, and thin film electrodes).

Very interesting, although not common. Since we did not have a representation for iEMG, and the use is not very common, we decided not to address potential specific requirements that iEMG may need. Again, I'd imagine if iEMG is recorded on the same instrument, it will go under the MISC channels, and you need to describe the details in the channels.json. This will also allow you to use electrodes.tsv to indicate the electrode-specific specifications and location. If they are separate instruments, you might want to use physio.

Thanks again for this interesting use-case example.

neuromechanist avatar Jun 06 '25 16:06 neuromechanist

@neuromechanist, thanks for your detailed response.

Maybe one remark regarding the use of invasive EMG. It is true that this modality is little used in research settings. Yet, clinical EMG is almost exclusively acquired using concentric needle electrodes (whereby so-called spontaneous activity serves as a diagnostic indicator for various neuromuscular disorders). If such data is within the scope of BIDS, it might be worth considering if it already fits into the proposed framework / what modifications would be needed.

klotz-t avatar Jun 16 '25 12:06 klotz-t

@klotz-t,

If such data is within the scope of BIDS, it might be worth considering if it already fits into the proposed framework / what modifications would be needed.

I am in favor of adding that, if there is a community (that is, a person, hopefully yourself) familiar with the iEMG research. To make the case, I suggest the following:

  1. List some examples of the data shared with iEMG. I know of a couple, including this recent concurrent EMG/iEMG from Bradford et al.
  2. Try to make an example dataset. Look at https://github.com/bids-standard/bids-examples/pull/480, you are welcome to make a PR to my repo, and I will merge. (I am in favor of having concurrent EMG datasets, since it is one of the unique features of EMG data as compared to EEG/MEG 😉)
  3. Identify what metadata needs to be added to the sidecars to accommodate the data.

I think we are at a stage where we can still accommodate this inclusion. But it still depends on someone who is familiar with the research and the required metadata.

neuromechanist avatar Jul 09 '25 19:07 neuromechanist

@neuromechanist

I have some experience in invasive EMG, but for this purpose, I had a chat with a few people who are dealing more frequently with intramuscular EMG. Generally, it seems like intramuscular EMG data can be handled with minor adjustments within the current proposal. Here are some initial thoughts:

  • Should there be a unique keyword (e.g., "IEMG") to distinguish intramuscular recordings from surface recordings (keyword "EMG")?
  • Data format -> 16-bit should be sufficient, a lot of data is stored in .wav format (doctors listen to EMG), and it might be worth considering if that is a permissible file format
  • ElectrodeType should be a required field, and the most important electrode types are (i) concentric needles, (ii) monopolar needles, and (iii) fine wires
  • The surface area of the electrode (e.g., ElectrodeSurfaceArea) does not seem to be a reported field in the sEMG specification and should be a required field for invasive EMG
  • ElectrodeMaterial should be a required field for invasive EMG, and next to the electrode material itself, one should also report the material of the insulator (e.g., in a concentric needle, there is a sensing electrode at the tip of the needle, and the cannula of the needle serves as the reference electrode. They are separated by an insulator)
  • One should report the size of the needle, i.e., diameter x length or gauge (a common unit used for cannula sizes, e.g., reported as 25G)
  • Exact electrode positions are challenging to report. Yet, one should roughly report the insertion position (e.g., X% of the distance between two anatomical reference points), the target muscle (if available, adding information like distal, proximal, ...), the depth of the inserted electrode (recommended/optional), and the insertion angle (recommended/optional)
  • In invasive EMG recordings, it is also common that the needle is moved within the same session. The best practice would be to make a separate recording per position. Otherwise, needle motion needs to be annotated in an events file.

I can try to make two examples.

  1. Traditional clinical recording with a single-channel concentric needle
  2. More advanced research setting using a multi-channel configuration

klotz-t avatar Jul 16 '25 08:07 klotz-t

@klotz-t, sorry for missing this. Most of the metadata you pointed out makes sense to me. However, we might want to consider differentiating between MUST and RECOMMENDED for some of the fields. Given the scarcity of iEMG datasets, I am more in favor of having emg as the modality and using a type key in the emg.json to distinguish between sEMG and iEMG. Even if there are concurrent sEMG and iEMG recordings, we can distinguish them with the recording entity. Please kindly go ahead with the examples you mentioned and add them to https://github.com/bids-standard/bids-examples/pull/480 with the metadata that you see fit, so we can discuss them sooner rather than later.

Thanks

neuromechanist avatar Aug 05 '25 23:08 neuromechanist

Codecov Report

:white_check_mark: All modified and coverable lines are covered by tests. :white_check_mark: Project coverage is 82.83%. Comparing base (373da35) to head (b56863a).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1998   +/-   ##
=======================================
  Coverage   82.83%   82.83%           
=======================================
  Files          20       20           
  Lines        1672     1672           
=======================================
  Hits         1385     1385           
  Misses        287      287           


codecov[bot] avatar Sep 04 '25 17:09 codecov[bot]