bids-specification
[ENH] BEP 020 Eye Tracking
Here are the specifications for BEP 020 on eye tracking.
- It follows the main discussion initiated in a Google document.
- It includes the different modifications the group of maintainers suggested to us during our Zoom meeting.
- It includes the macros as used in other modality-specific extensions.
- It includes links to dataset examples.
[!Note]
We meet regularly and everyone is welcome: next meeting April 3rd 2025, 4pm UTC (EST 11am, PST 8am, CET 5pm, GMT 4pm) on Zoom. Note that if you are considering joining but this time or day doesn't suit you, reach out to me (@mszinte) and I will arrange another appointment.
Chat and discussions are also happening on Matrix.
We are currently drafting a companion paper for this BEP; feel free to participate (GoogleDoc).
[!Tip]
Issues for:
- general todo list: https://github.com/mszinte/bids-specification/issues/1
- conversion discussion: https://github.com/mszinte/bids-specification/issues/4
- list datasets: https://github.com/mszinte/bids-specification/issues/3
- [x] implement macros
- [x] for filename templates?
- [x] for examples?
- [x] for metadata table
- [x] add contributors to the wiki (so they can be added to the contributors page)
- [x] finish documentation
- [x] update examples
- [x] update validator
- [x] update list of contributors via the github wiki
(NOTE: I'll cross-post this message across several BEP threads)
Hi there, just a quick notification that we have just merged https://github.com/bids-standard/bids-specification/pull/918 and it may be interesting to look at the implications for this BEP.
We are introducing "BIDS URIs", which unify the way we refer to and point to files in BIDS datasets (as opposed to "dataset-relative" or "subject-relative" or "file-relative" links).
If the diff and discussion in the PR are unclear, you can also read the rendered version: https://bids-specification.readthedocs.io/en/latest/02-common-principles.html#bids-uri
Perhaps there are things in the BEP that need adjusting now, but perhaps also not -- in any case it's good to be aware of this new feature!
Let me know if there are any questions, comments, or concerns.
@effigies @tsalo Schema question: for this BEP it feels like we are adding a new modality, but one that has no datatype counterpart, at least not in the sense that it will have its own folder.
Should I therefore add an eyetrack modality in src/schema/objects/modalities.yaml
and then update src/schema/rules/modalities.yaml with something like:
```yaml
eyetrack:
  datatypes:
    - beh
    - eeg
    - func
    - ieeg
    - meg
    - nirs
    - perf
    - pet
```
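For reference, a sketch of what the counterpart entry in src/schema/objects/modalities.yaml could look like (the display_name and description wording here is my guess, not settled text):

```yaml
# src/schema/objects/modalities.yaml (sketch; wording not settled)
eyetrack:
  display_name: Eye Tracking
  description: |
    Data acquired with an eye tracker.
```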
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 87.93%. Comparing base (eecc617) to head (e209833).
:exclamation: Current head e209833 differs from pull request most recent head eccb785. Consider uploading reports for the commit eccb785 to get more accurate results.
Additional details and impacted files
```
@@           Coverage Diff            @@
##           master    #1128   +/-   ##
=======================================
  Coverage   87.93%   87.93%
=======================================
  Files          16       16
  Lines        1351     1351
=======================================
  Hits         1188     1188
  Misses        163      163
=======================================
```
Comment not directly related to the content, but just a reminder to update the google doc at https://bids.neuroimaging.io/bep020 to point to this PR, so that folks know to come here for the updated discussion.
I just did it.
Search for EDF in the repo results on hits here , but probably that conversation was resolved. @mszinte would you be so kind to repeat in the main thread comment on why you think .edf (which is "native" to some eye trackers) shouldn't be supported and everyone should harmonize into simple .tsv.gz?
Of course. So .edf in the context of eye tracking refers (I always believed) to "EyeLink Data File". I believe it is not the format related to this discussion.
This .edf format is a proprietary file format developed by SR Research for their EyeLink (one of the most used eye trackers).
One can read the .edf file only with SR Research software (which needs a dongle), or it must be converted using their conversion software called edf2asc, which converts the files to ASCII.
The ASCII output is not a simple file: it contains a lot of information (metadata) and preprocessed data (saccades, fixations, blinks, triggers). It is not standardized, as other types of eye trackers give you data in other formats and structures.
One goal of BEP020 is thus to propose a format for the raw gaze position and pupil size data that can easily be put in a TSV file. However, as eye trackers often run at 1 kHz or above, TSV files are generally large. This is why we propose to gzip them.
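As a rough illustration of that choice, here is a minimal Python sketch (standard library only, with made-up column names; the BEP defines the real headers) showing that a gzipped TSV is written and read exactly like a plain TSV wrapped in gzip:

```python
import csv
import gzip
import io

# Hypothetical column names for illustration only.
header = ["eye1_x_coordinate", "eye1_y_coordinate", "eye1_pupil_size"]
samples = [["512.3", "384.1", "1021"],
           ["513.0", "383.9", "1019"]]

# Write: a _eyetrack.tsv.gz is just a TSV stream wrapped in gzip.
# An in-memory buffer stands in for a file on disk here.
buffer = io.BytesIO()
with gzip.open(buffer, "wt", newline="") as f:
    writer = csv.writer(f, delimiter="\t")
    writer.writerow(header)
    writer.writerows(samples)

# Read it back the same way.
buffer.seek(0)
with gzip.open(buffer, "rt", newline="") as f:
    rows = list(csv.reader(f, delimiter="\t"))
```

At kilohertz sampling rates, gzip typically shrinks such numeric TSV data several-fold, which is the motivation for the .tsv.gz choice.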
oh, I can't believe I fell into the trap of believing that a "respectably old" manufacturer was natively using an open data format! DICOMs, and working with OpenEphys aiming to output NWB, really broke my "moral compass". Sure thing, EyeLink's EDF has nothing to do with the EDF (European Data Format) I had in mind. Thank you for the explanation @mszinte!
Dear BEP 020 devs,
I hope you are doing well. My name is Yahya; I am a postdoc at UCSD, SCCN. I have mostly worked with EEG, EMG, and motion capture on human behavior and locomotion, so I apologize if my questions here are too obvious. We are starting to implement massive eye-tracking data into one of the BIDS-EEG datasets, and I am tasked to lead the effort. I read through the superb work you did, and I have a couple of questions; I would appreciate it if you could answer them:
- I believe that the calibration of the eye tracker is an important step of the trial. As one of the steps prior to the experiment or each trial, it seems that calibration is necessary. I found calibration information defined in the metadata, but I am not sure which ones are required.
- If I understand correctly, the current proposal seeks to format the data in the compressed tabular format
tsv.gz, but eye-tracking data frequency can go as high as 2 kHz. I thought that BIDS wanted to implement a distinction between continuous data and less frequent tabular information such as events. I can understand using tsv.gz for respiratory rate, but eye tracking technically has a greater sampling frequency than most EEG experiments. I wonder why continuous data can't be in the European Data Format (edf), and why two separate pipelines should import data streams in the same frequency order. The other aspect is the computational overhead of the tabular data over edf, but let's not go over that.
- I did not find specifications for providing derivative files, i.e., saccades, in the specification. I may have missed it, but I believe that this information, and the parameters involved in deriving them, are important for analyzing other modalities, such as EEG, that would perform event-locking on the saccades. The dataset that we are working on provides both the raw data and the saccades. I wonder if we should enforce/recommend keeping specific information in the root sidecar, have the events separately in the derivatives folder, or include them as a separate file with the raw data. I can see each solution working here, and I would appreciate knowing your opinion.
Thanks a lot for reading through my questions, and also for answering them 😊.
Thanks @neuromechanist for chiming in.
- I believe that the calibration of the eye tracker is an important step of the trial. As one of the steps prior to the experiment or each trial, it seems that calibration is necessary. I found calibration information defined in the metadata, but I am not sure which ones are required.
Most of it is in the recommended section.
See the rendered spec here: https://bids-specification--1128.org.readthedocs.build/en/1128/modality-specific-files/eye-tracking.html#sidecar-json-document-_eyetrackjson
Thanks a lot, @Remi-Gau, for your prompt reply.
I see 🙈; I saw the insertion point in the markdown but did not realize it is rendered on readthedocs.
So, a follow-up on 1: before each trial, the calibration might be performed or skipped. How do you think I should mark that? The original data has a flag for it.
Should it be indicated by either having or missing CalibrationType? Should it be an additional field?
2. I wonder why continuous data can't be in the European Data Format (edf), and why two separate pipelines should import data streams in the same frequency order.
For this BEP at least, we recently went from no standardized format for the data file to at least having one data format. I don't think we are opposed to also supporting other data formats, provided that we can give dataset curators a converter to convert files into that format.
3. have the events separately in the derivative folder
What I tend to preach is that BIDS teaches you to "modularize" your data, so keep your raw data "clean" and put all the saccades info in the derivatives folder.
I see 🙈; I saw the insertion point in the markdown but did not realize it is rendered on readthedocs.
A lot of people miss it: we should probably add this info to our contributors doc.
So, a follow-up on 1: before each trial, the calibration might be performed or skipped. How do you think I should mark that? The original data has a flag for it.
Should it be indicated by either having or missing CalibrationType? Should it be an additional field?
Good question also because it matters for dataset users.
If I came to a dataset where run 1 and 3 have some calibration info but run 2 and 4 do not, I would assume that I should be using the calibration of run 1 for run 2 and of run 3 for run 4.
But this is implicit and may be the wrong assumption. We should probably add something that explicitly mentions this in the BEP. Does that make sense?
Thanks again @Remi-Gau for your insights.
But this is implicit and may be the wrong assumption. We should probably add something that explicitly mentions this in the BEP. Does that make sense?
Absolutely, I am a proponent of the clearest and most explicit way of presenting the calibration data. An explicit flag or field would go a long way for that clarity.
This can also become a precedent for other modalities, such as motion capture and force platforms, that require calibration. It is always best to pay extra attention to the calibration information. Also, might I suggest requiring the user to provide the bare-minimum calibration information?
For this BEP at least we recently went from no standardized format for the datafile to at least having a dataformat. I don't think we are opposed also supporting other data formats provided that we can give dataset curators a converter to convert files into that format.
Yesss, that is a great step indeed. Other motion-related modalities have similar problems as well. Do you think it is possible to have the European Data Format (.edf) also be supported in this BEP? As of right now, if this BEP goes into effect, I think BIDS validators would not accept .edf for eye tracking and would require tsv.gz.
Conversion tools are great, but when you multiply them by hundreds or thousands of subjects, even the slightest overhead becomes a major one. My current project is on the order of a couple of thousand datasets.
The .edf data readers are optimized for large datasets, while the tabular data readers vary greatly in quality. So, if the BEP can support .edf, I think it would reduce the overall data curation overhead quite significantly.
Also, might I suggest requiring the user to provide the bare-minimum calibration information?
Could you be more specific? As in what is not in the BEP that you think would be required?
As of right now, if this BEP goes into effect, I think BIDS validators would not accept .edf for eye tracking and would require tsv.gz. Conversion tools are great, but when you multiply them by hundreds or thousands of subjects, even the slightest overhead becomes a major one. My current project is on the order of a couple of thousand datasets.
I meant creating converters for BIDSification: they can help data curators convert the data that comes out of eye trackers 'natively' into all the formats that BIDS will support for eye tracking.
FYI: we are talking on this matrix channel about creating example datasets and working on providing converters to help users with bidsification.
https://matrix.to/#/#bids-bep020:matrix.org
Feel free to join in because I think that your input could be very useful (also to minimize a lot of back and forth "noise" in this PR for future readers.) :-)
Could you be more specific? What is not in the BEP that you think would be required?
I wonder, with the calibration being recommended, how many datasets will include them, and if it is not included, how much that dataset can be trusted.
Thanks for your invitation. I will move the discussion there.
Could you be more specific? What is not in the BEP that you think would be required?
I wonder, with the calibration being recommended, how many datasets will include them, and if it is not included, how much that dataset can be trusted.
This is an important point: one factor hampering the interpretability of a lot of published eye-tracking research is the lack of empirically derived data quality, most importantly the accuracy of calibration as determined through a validation procedure. See item 6 in this table, and also the Calibration and Accuracy section in the "A review of empirical eye-tracking studies as the basis for a reporting guideline" section of this paper, for more background.
@dcnieho @neuromechanist @Remi-Gau
Concerning the calibration discussion.
We did not make the calibration metadata required for several reasons. An experimenter can decide (and it is often the case) not to run the calibration on every run or session, or even at all. Making this metadata "required" would make these datasets invalid even though they may well be valid.
For example, while in head-free (or chin-rest) and multi-user setups a calibration is almost "required", single-user setups (animal models with head fixation and always the same eye-tracker position) or head-fixed setups (e.g., a head cast in MRI) do not necessarily need a calibration. Also, for example, when you scan someone and use eye tracking as a secondary measure, you may want to save time and skip calibration on every run.
Some experimenters may also decide not to calibrate because of calibration errors (calibration may often fail on some points and not others) or simply because they choose not to. They may still use the data without the frame of reference; the data quality will suffer, but one may still extract some information.
Then, we could add a "CalibrationRun" boolean flag that would indicate the presence or absence of calibration in the run, but a program could also assume that it is absent if "CalibrationType" and "CalibrationPosition" are missing, don't you think?
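To make that proposal concrete, here is a small Python sketch of the heuristic described above (the has_calibration helper is hypothetical, not part of any validator; the field names are those from the BEP draft):

```python
def has_calibration(sidecar: dict) -> bool:
    """Decide whether a run carries its own calibration info.

    Sketch of the heuristic above: an explicit CalibrationRun flag wins;
    otherwise, fall back to the presence of calibration metadata fields.
    """
    if "CalibrationRun" in sidecar:
        return bool(sidecar["CalibrationRun"])
    return "CalibrationType" in sidecar or "CalibrationPosition" in sidecar
```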
I did not find specifications for providing derivative files, i.e., saccades, in the specification. I may have missed it, but I believe that this information, and the parameters involved in deriving them, are important for analyzing other modalities, such as EEG, that would perform event-locking on the saccades. The dataset that we are working on provides both the raw data and the saccades. I wonder if we should enforce/recommend keeping specific information in the root sidecar, have the events separately in the derivatives folder, or include them as a separate file with the raw data. I can see each solution working here, and I would appreciate knowing your opinion.
The goal of BEP020 (and the first mission of BIDS) is to deal with the "raw" data, not the derivatives. We recommend that people who have preprocessed derivatives in their original datasets (for example, EyeLink .edf files contain saccade data) put them in the /sourcedata folder.
This said, I know the community of users will be unhappy about "losing" (it will be in sourcedata) the preprocessed data provided by the eye tracker, but the next "goal" (once the BEP is accepted) is to set up a community-based BIDS App. Such an app would take a valid BIDS dataset as input and provide eye-tracking data quality metrics and preprocessed data, as other apps do for imaging. We would potentially gain an open-source format, reproducibility and transparency of the methods (eye-tracker companies do not necessarily share their saccade extraction code, for example), and potentially the ability to bring together researchers who have developed great analysis tools on their side but not yet shared them.
https://matrix.to/#/#bids-bep020:matrix.org
Feel free to join in because I think that your input could be very useful (also to minimize a lot of back and forth "noise" in this PR for future readers.) :-)
Concerning the European Data Format rather than the TSV.GZ format, I suggest moving this more complex discussion to Matrix. I will put the conclusion of our discussion here afterward.
There is a mention of using physio files to store eye-tracking data on the physio page that this BEP should probably remove:
https://github.com/bids-standard/bids-specification/blob/cba4ebcaa039cd10d6dcec46b53110c4024f924d/src/modality-specific-files/physiological-and-other-continuous-recordings.md?plain=1#L44
I deleted all mention of eyetracking there.
Sorry, I closed the PR by mistake...
Could you be more specific? What is not in the BEP that you think would be required?
I wonder, with the calibration being recommended, how many datasets will include them, and if it is not included, how much that dataset can be trusted.
This is an important point: one factor hampering the interpretability of a lot of published eye-tracking research is the lack of empirically derived data quality, most importantly the accuracy of calibration as determined through a validation procedure. See item 6 in this table, and also the Calibration and Accuracy section in the "A review of empirical eye-tracking studies as the basis for a reporting guideline" section of this paper, for more background.
Thanks, we missed this reference; I will read it carefully. I see you are listed as an author; it would be nice to get feedback from you or your coauthors on this BEP.
@dcnieho @neuromechanist @Remi-Gau
Concerning the calibration discussion.
We did not make the calibration metadata required for several reasons. An experimenter can decide (and it is often the case) not to run the calibration on every run or session, or even at all. Making this metadata "required" would make these datasets invalid even though they may well be valid.
For example, while in head-free (or chin-rest) and multi-user setups a calibration is almost "required", single-user setups (animal models with head fixation and always the same eye-tracker position) or head-fixed setups (e.g., a head cast in MRI) do not necessarily need a calibration. Also, for example, when you scan someone and use eye tracking as a secondary measure, you may want to save time and skip calibration on every run.
Some experimenters may also decide not to calibrate because of calibration errors (calibration may often fail on some points and not others) or simply because they choose not to. They may still use the data without the frame of reference; the data quality will suffer, but one may still extract some information.
Then, we could add a "CalibrationRun" boolean flag that would indicate the presence or absence of calibration in the run, but a program could also assume that it is absent if "CalibrationType" and "CalibrationPosition" are missing, don't you think?
That is fair enough, and I realize your "required" is different from the sense in which I use it. I don't really know how these formats work, but just to make sure users think about this carefully: could a statement about calibration (performed/not performed) be made required (at least at the session level), and then, if performed, a statement about whether it was validated or not be required, and, if it was validated, the validation values be required as well? I have no idea if such dependent requirements are a good idea or even possible, and trust you to do as you see fit. So no need for a lengthy reply. I just want to make sure users don't accidentally lose these values when converting their files and run into a problem later when they need them for writing the paper, or even worse, when a reviewer asks for them.
Thanks a lot, @mszinte, for your detailed response.
Then, we could add a "CalibrationRun" boolean flag that would indicate the presence or absence of calibration in the run, but a program could also assume that it is absent if "CalibrationType" and "CalibrationPosition" are missing, don't you think?
I can see how that works out. Let's consider a multi-subject study with multiple trials per subject. I think that CalibrationType and CalibrationPosition can be defined at the study level, so they apply downstream based on the inheritance principle. Then, MaximalCalibrationError would help determine the calibration "used" at any stage (e.g., subject or trial).
However, this does not determine "when" the calibration was performed. So, for multiple head-free trials, I believe having calibration before each trial is different from having it before only some of the trials. Still, even for the trials without a calibration, the experimenter will use a MaximalCalibrationError obtained from the previous trial. So, I still think the CalibrationRun flag would make this explicit.
Re: having the calibration required or not, I strongly agree with @dcnieho that at least a statement should be made, or CalibrationType should be made REQUIRED, with n/a given for non-applicable cases.
Could you be more specific? What is not in the BEP that you think would be required?
I wonder, with the calibration being recommended, how many datasets will include them, and if it is not included, how much that dataset can be trusted.
This is an important point: one factor hampering the interpretability of a lot of published eye-tracking research is the lack of empirically derived data quality, most importantly the accuracy of calibration as determined through a validation procedure. See item 6 in this table, and also the Calibration and Accuracy section in the "A review of empirical eye-tracking studies as the basis for a reporting guideline" section of this paper, for more background.
Thanks, we missed this reference; I will read it carefully. I see you are listed as an author; it would be nice to get feedback from you or your coauthors on this BEP.
I would be happy to, only had a quick look so far. Ping me if i don't do so in a few days.
maintenance note: added a link to the rendered BEP in the top message of this PR.
We’re trying to format some eye tracking data following this specification. One thing that isn’t clear is where information about eye-specific events, such as saccades, should be stored. The task events file doesn’t seem right for this, since that seems intended to provide info about external stimuli/events. Should it just go in an additional column in the eyetrack.tsv file? Or should it actually be integrated with the task events? I see there's a little discussion of this above as part of a .edf vs .tsv.gz conversation, but I didn't see any resolution. (Perhaps I just missed it.)
A related issue is that the onset column in our task events file currently refers to timestamps from our EEG data, which have different values from the timestamps in the eye tracking data. Correlating the two would most likely result in some loss of accuracy. It seems like the best approach here is probably to have something like:
```
sub-01/
    ses-pre/
        eeg/
            sub-01_ses-pre_task-foo_acq-eeg_events.tsv
            sub-01_ses-pre_task-foo_acq-eyetracking_events.tsv
```
…but I’d like to hear if anyone has any other suggestions.
Hi @noah10, the "eye-specific" events are not raw data but derivatives, and thus are not part of our proposal. However, you are free to add additional columns for the saccades in the _eyetrack.tsv file, as long as you document them in the corresponding JSON file. Another option is to keep these data in their native format in the /sourcedata folder.
Concerning the discussion about ".tsv" vs ".tsv.gz", it isn't solved yet. I'm short on time these days, but we will come to a decision ASAP. My opinion today is that we might just accept both formats.
About the syncing between recording modalities, I don't have a perfect solution. I believe I would keep only one _events.tsv and put in it a column referring to the corresponding eye-tracking onset timestamp. Maybe @Remi-Gau or others have an opinion?
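One way to implement that suggestion, sketched in Python below: keep a single _events.tsv in EEG time, and add a column giving each event's timestamp on the eye tracker's clock. The column name eyetrack_timestamp and the linear clock mapping (offset plus optional drift) are assumptions for illustration, not part of the BEP:

```python
def eeg_to_eyetrack(onset_eeg: float, offset: float, drift: float = 1.0) -> float:
    """Map an onset in EEG time (s) to the eye-tracker clock (s).

    Assumes the two clocks are related by a linear model; a real dataset
    would estimate offset/drift from shared triggers rather than hard-code them.
    """
    return onset_eeg * drift + offset

# Hypothetical events, with onsets in EEG time.
events = [{"onset": 1.50, "trial_type": "stim"},
          {"onset": 3.25, "trial_type": "stim"}]
for ev in events:
    ev["eyetrack_timestamp"] = eeg_to_eyetrack(ev["onset"], offset=120.0)
```

This keeps one authoritative events file while letting eye-tracking analyses look up onsets in their native clock without lossy resampling.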