bids-specification
Include Persyst lay/dat file format for iEEG and EEG modalities.
Your idea
The available file formats for iEEG (ECoG and sEEG) exclude an important part of the community: clinical, high resolution, long duration EEG. Of the available file formats, only one (edf) is widely supported by the software that clinicians commonly use. Storing data in a format that prevents clinicians from using standard viewing software is a major limitation, and thus edf is the only option. However, edf was not designed for long duration, high resolution recordings (e.g., 24 hours at 4,096 Hz sampling rate and 100+ channels). Several of its internal design aspects lead to inefficiencies, increasing computational cost and processing time. Many standard edf readers are also not designed for high resolution, long duration recordings. For example, MATLAB's edf reader is set up to read an entire file into memory at once (generally unwise for files that can be 50-100 GB). Overall, the edf file format is a poor choice for this type of data. Thus, available file formats do not meet the needs of the clinical research community involving high resolution, long duration iEEG.
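To put the example above in numbers, here is a rough back-of-the-envelope sketch. It assumes uncompressed 16-bit (2-byte) samples and ignores header overhead; the function name and parameters are only illustrative.

```python
def recording_size_bytes(hours, sampling_rate_hz, n_channels, bytes_per_sample=2):
    """Raw payload size of an uncompressed recording, ignoring any header."""
    n_samples_per_channel = int(hours * 3600 * sampling_rate_hz)
    return n_samples_per_channel * n_channels * bytes_per_sample


# 24 hours at 4,096 Hz with 100 channels, 2 bytes per sample (assumed)
size = recording_size_bytes(hours=24, sampling_rate_hz=4096, n_channels=100)
print(f"{size / 1e9:.1f} GB")  # 70.8 GB
```

Under these assumptions a single 24-hour recording lands at roughly 71 GB, which is consistent with the 50-100 GB range mentioned above.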
It should also be noted that the file formats survey (https://bids.berkeley.edu/news/bids-megeegieeg-data-format-survey) did not include a realistic representation of iEEG researchers. For example, in the survey, the file formats for 6 different MEG vendors were listed, spanning all major MEG manufacturers. However, for intracranial EEG, file formats were only listed for 2 vendors, Neuroscan and Micromed. These two comprise a relatively small market share of vendors of clinical, high resolution iEEG acquisition systems. It is quite surprising that Nihon Kohden and Natus file formats were not listed, given their large market shares. The fact that the list of vendor file formats is so limited suggests that the survey respondents do not accurately represent the full community of iEEG researchers. In fact, since many technical issues are different when dealing with the large files in high resolution iEEG, the survey should have distinguished file formats for low and high temporal resolution data. Thus, it appears the decision about file formats was based on incomplete information.
My proposed solution is to include the Persyst lay/dat file format in the list of supported formats. Like the currently supported BrainVision format, it is an open source format maintained by a commercial entity. As it is open source and a relatively simple format, it is not a major effort to include it in current software based on the BIDS standard. I have written a C++ reader for iEEG data stored in Persyst lay/dat files, and thus I speak from experience. Also similar to the BrainVision format, it consists of an interleaved binary file with a text header file. This format works well for high resolution, long duration files. However, unlike the BrainVision format, Persyst is widely used in the clinical EEG and iEEG communities. Persyst partners with many companies that make clinical iEEG data acquisition devices, including Natus, Compumedics/Neuroscan, Nihon Kohden, Micromed, and Cadwell. Persyst thus has a wide user base, and many clinicians use the Persyst viewing software. Adding Persyst lay/dat files as a supported format would provide an option for clinical research with high resolution, long duration iEEG, a type of research currently excluded by the BIDS standard.
Pinging @bids-standard/raw-electrophys-ieeg and @bids-standard/raw-electrophys-eeg for their thoughts.
Out of curiosity, where is the spec for the persyst format? I can find a bunch of third-party implementations, but no specs.
The choice of good candidates for EEG/iEEG formats to be included in the specs was a complicated one. There is no all-purpose and widely used file format like .nii for MRI volumes. I was not part of the decision process, but I understand this choice. Let me try to justify some of it.
However, edf was not designed for long duration, high resolution recordings (e.g., 24 hours at 4,096 Hz sampling rate and 100+ channels).
I agree, EDF+ is not an efficient format. It is limited to 16-bit data, and its event management is very cumbersome and limited. Unfortunately, it is the only one that is widely available as an export option from most commercial software.
However, it has some interesting technical characteristics: it stores data by page (n-second blocks of multiplexed data), which gives a decent balance in execution times when reading long files both by time and by channel (i.e. reading all the samples for one channel vs. reading all the channels for a short time segment).
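The paged layout can be illustrated with a little offset arithmetic. The sketch below relies on the documented EDF layout (a 256-byte fixed header plus 256 bytes per signal, followed by data records in which each signal's 2-byte samples are stored as one contiguous block). The function names and the example channel counts are hypothetical, chosen only for illustration.

```python
def edf_header_bytes(n_signals):
    """EDF header: 256 fixed bytes plus 256 bytes per signal."""
    return 256 + 256 * n_signals


def edf_record_bytes(samples_per_record):
    """Size of one data record: every sample is a 2-byte integer."""
    return 2 * sum(samples_per_record)


def edf_block_offset(samples_per_record, record, channel):
    """Byte offset of the first sample of `channel` inside data record `record`."""
    header = edf_header_bytes(len(samples_per_record))
    preceding_records = record * edf_record_bytes(samples_per_record)
    preceding_channels = 2 * sum(samples_per_record[:channel])
    return header + preceding_records + preceding_channels


# Hypothetical file: 4 channels, 256 samples per channel in each record
spr = [256, 256, 256, 256]
print(edf_header_bytes(len(spr)))                     # 1280
print(edf_record_bytes(spr))                          # 2048
print(edf_block_offset(spr, record=3, channel=2))     # 8448
```

Reading all channels for a short time segment touches one contiguous record (or a few), while reading one channel over a long duration needs one seek per record but only reads that channel's block each time. Neither access pattern degenerates to scanning the whole file, which is the balance described above.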
For example, MATLAB's edf reader is set up to read an entire file into memory at once (generally unwise for files that can be 50-100 GB).
This is a problem of implementation of the processing software, not a design issue in the file format. There are efficient readers that do not require loading the entire file at once. For example, the Brainstorm reader: in_fopen_edf.m / in_fread_edf.m
However, for intracranial EEG, file formats were only listed for 2 vendors, Neuroscan and Micromed. These two comprise a relatively small market share of vendors of clinical, high resolution iEEG acquisition systems. It is quite surprising that Nihon Kohden and Natus file formats were not listed, given their large market shares.
One objective in the EEG specs was to include only open file formats that were already fully supported in all the major open-source software. Neuroscan and Micromed are two companies that actively collaborate with the open-source and open-science movements: they provide detailed specifications of all their file formats, and technical support to open-source developers wanting to interact with their software. In stark contrast, Nihon Kohden and Natus are against any information sharing outside of a non-disclosure agreement. We managed to reverse engineer part of the Nihon Kohden file formats - they didn't seem to care much but they didn't help at all, and their file format changes with every new device they release. I would give the award of the most anti-research company to Natus - which at some point replied that they would send their lawyer if we attempted to read their files. Unfortunately, they bought most of the smaller EEG companies and now lock a large share of the market with this close-minded policy... Everybody who has interacted technically with these companies would understand why they are not part of this survey.
My proposed solution is to include the Persyst lay/dat file format in the list of supported formats. Like the currently supported BrainVision format, it is an open source format maintained by a commercial entity.
My understanding was that the format was proprietary and not publicly documented, but that some readers existed, which does not make it a valid candidate for the BIDS specs. However, I have never tried to work with this format, so this might be incorrect. If I am wrong, please share with us a link to the specification of the file format.
Overall, I think that it is not a good idea to expand the list of supported file formats, as it would make it more complicated to develop software that is 100% compatible with the BIDS specification. Since the universal format does not exist, it will require data conversions after the acquisition: acquisition in a proprietary format, followed by a conversion to an open format.
I agree with the choice of making the BrainVision format an ideal standard: extremely simple, relatively efficient, human readable headers, open and supported by a company that actively engages with the open-source/open-science community. One solution for Persyst users would be to use open-source converters to convert from Persyst to BrainVision.
An even better option would be for the Persyst software to include an export to the BrainVision format, in an effort to become "BIDS-compatible". As a customer of the Persyst company, you could submit this request to their customer support, and maybe ask all your collaborators to do the same.
The following is slightly off-topic and specific to Brainstorm:
For example, Brainstorm can be used to convert from a lot of formats to BrainVision: https://neuroimage.usc.edu/brainstorm/Introduction#Supported_file_formats The Persyst format was never requested, but it would be a nice addition, and probably very easy to implement if, as you say, there are efficient Matlab functions already reading the file (by efficient, I mean: capable of reading in an optimized way only a few time samples and/or a few channels from the binary file). If you are interested in helping implement this, please share an example file and the link to a Matlab function you'd recommend to read it. Thanks!
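As a rough illustration of what "efficient" means here, the following Python sketch reads only a short time window and a subset of channels from an interleaved binary file by seeking to the right byte offset. The layout (headerless file, little-endian 16-bit integers, channels interleaved sample by sample) is an assumption made for the example; it is not taken from the Persyst or BrainVision specifications, and `read_window` is a hypothetical helper, not part of any existing toolbox.

```python
import os
import struct
import tempfile


def read_window(path, n_channels, start, stop, channels=None, header_bytes=0):
    """Return samples [start, stop) for the selected channels.

    Assumes little-endian int16 data interleaved by sample:
    ch0[t0], ch1[t0], ..., chN[t0], ch0[t1], ...
    Only the requested time window is read from disk.
    """
    if channels is None:
        channels = list(range(n_channels))
    frame_bytes = 2 * n_channels  # one value per channel at a single time point
    with open(path, "rb") as f:
        f.seek(header_bytes + start * frame_bytes)
        raw = f.read((stop - start) * frame_bytes)
    n_frames = len(raw) // frame_bytes  # the file may end before `stop`
    values = struct.unpack(f"<{n_frames * n_channels}h", raw[: n_frames * frame_bytes])
    # De-interleave: every n_channels-th value belongs to the same channel
    return [list(values[ch::n_channels]) for ch in channels]


# Small synthetic file: 3 channels, 100 time points, channel c holds 10*c + t
n_channels = 3
with tempfile.NamedTemporaryFile(delete=False, suffix=".dat") as tmp:
    for t in range(100):
        tmp.write(struct.pack("<3h", t, 10 + t, 20 + t))

window = read_window(tmp.name, n_channels, start=40, stop=44, channels=[0, 2])
os.remove(tmp.name)
print(window)  # [[40, 41, 42, 43], [60, 61, 62, 63]]
```

The cost of such a read depends on the length of the requested window rather than on the total file size, so the same approach should scale to recordings in the tens of gigabytes. Selecting a few channels still reads whole frames within the window, which is the usual trade-off of a sample-interleaved layout.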