
reading Binary (WhiteMatter) recording files

Open lihao881230 opened this issue 7 months ago • 17 comments

Hello, does anyone know how to read recordings in .bin files? We use the WhiteMatter acquisition system that records using the OpenEphys GUI.

lihao881230 avatar May 31 '25 19:05 lihao881230

We are working on a WhiteMatter recording extractor. The pilot version is here. cc @pauladkisson: any comments here for WhiteMatter?

I'm not sure whether the Open Ephys GUI transforms the data. We also have an Open Ephys reader, which you can get with from spikeinterface.extractors import read_openephys. If that doesn't work, you can also load a binary file yourself using

from spikeinterface.extractors import read_binary

But in this case you need to supply dtype, num_channels, and sampling_frequency.
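
To show why those three values are needed, here is a minimal standard-library sketch (not SpikeInterface code). It assumes a headerless file with samples interleaved frame by frame; `load_flat_binary` and the demo file are hypothetical names made up for illustration. A raw file carries no shape information, so the channel count and dtype define the frame size, and the sampling frequency turns a frame count into seconds.

```python
import array
import os
import tempfile


def load_flat_binary(path, num_channels, sampling_frequency, typecode="h"):
    """Read a headerless, frame-interleaved binary file (assumed layout).

    typecode "h" is a signed 16-bit integer in the machine's native byte order.
    Returns (frames, duration_s); frames[i] holds one sample per channel.
    """
    samples = array.array(typecode)
    with open(path, "rb") as f:
        samples.frombytes(f.read())
    if len(samples) % num_channels != 0:
        raise ValueError("sample count is not a multiple of num_channels")
    num_frames = len(samples) // num_channels
    frames = [
        samples[i * num_channels:(i + 1) * num_channels].tolist()
        for i in range(num_frames)
    ]
    return frames, num_frames / sampling_frequency


# Tiny synthetic file: 3 frames x 2 channels, written and read on the same machine.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.bin")
    with open(demo_path, "wb") as f:
        array.array("h", [1, -1, 2, -2, 3, -3]).tofile(f)
    demo_frames, demo_duration = load_flat_binary(
        demo_path, num_channels=2, sampling_frequency=25_000.0
    )
```

With the wrong channel count the same bytes would be split into frames of the wrong width, which is why the reader cannot guess these values for you.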

zm711 avatar Jun 01 '25 17:06 zm711

I do believe the white matter recording extractor should work in this case. But let me know if you run into any problems.

pauladkisson avatar Jun 02 '25 13:06 pauladkisson

Thank you for your response—the recording extractor is working without any issues.

Do you happen to have an extractor available for the analog event channels as well (also in .bin format)? Or would it be best for me to load the event data manually using read_binary and compute the relevant event times myself?

Thanks again for your help!


-- Thanks,

Hao

lihao881230 avatar Jun 03 '25 02:06 lihao881230

Hi, are the events also produced by the WhiteMatter? Are the signals multiplexed? Do you have access to the spec for the analog/events format?

Yes, at the moment loading the data yourself would be the simplest solution. I assume you want to load the data and then threshold it to extract event times, right?
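
A rough sketch of that thresholding step, in plain Python so it stands alone. `rising_edge_times` is a hypothetical helper, not part of SpikeInterface, and the trace, threshold, and sampling rate below are made-up values. It reports the time of each sample where the signal goes from below the threshold to at or above it.

```python
def rising_edge_times(trace, threshold, sampling_frequency):
    """Return times in seconds where the trace rises to or above the threshold.

    A pulse that is already high at the first sample is not reported,
    because its onset happened before the trace started.
    """
    times = []
    previous_high = bool(trace) and trace[0] >= threshold
    for index in range(1, len(trace)):
        high = trace[index] >= threshold
        if high and not previous_high:
            times.append(index / sampling_frequency)
        previous_high = high
    return times


# Synthetic TTL-like trace with two pulses, sampled at 1 kHz.
ttl = [0, 0, 5, 5, 0, 0, 5, 0]
onsets = rising_edge_times(ttl, threshold=2.5, sampling_frequency=1000.0)
```

Here the onsets land at samples 2 and 6. On real data you would pick the threshold after looking at the trace, and probably add a minimum pulse width to reject noise.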

h-mayorquin avatar Jun 03 '25 17:06 h-mayorquin

@lihao881230,

Could you provide us with any additional info? We want to fully support WhiteMatter, so a spec and test files/folder would be amazing for us.

zm711 avatar Jun 19 '25 15:06 zm711

Hi Zach and all,

Thanks again for the support. Here is a Google Drive link to a test recording: https://drive.google.com/drive/folders/10wQg39RV2Gles2L2kT08LP7EQsoArgFD?usp=sharing. The files are divided into ~5-minute segments, and there were no events recorded on either the AnalogPanel or DigitalPanel.

I’ve also included a notebook that uses the read_binary function to process the data. It seems to work correctly, but I’d greatly appreciate it if you could give it a quick review to ensure I haven’t missed anything or introduced errors in the loading and sorting pipeline. The notebook includes a section for extracting analog events from the binary files. However, since these test files don't contain any events, I’m unsure if that portion is implemented correctly. If possible, I’d really appreciate your input on the functions for extracting both analog and digital events.

Thank you so much again for your help!


-- Thanks,

Hao

lihao881230 avatar Jun 19 '25 16:06 lihao881230

Hi, the most important thing for us is to know whether you have the spec, which would answer these questions:

Hi, are the events also produced by the WhiteMatter? Are the signals multiplexed? Do you have access to the spec for the analog/events format?

h-mayorquin avatar Jun 19 '25 16:06 h-mayorquin

Hi, sorry I missed your previous questions. The events are produced by external TTL inputs and recorded by the WhiteMatter system via its Analog panel. I believe they are recorded separately. I attached some files generated by the system, which should contain the spec. Please let me know if you are looking for specific ones.


-- Thanks,

Hao

lihao881230 avatar Jun 19 '25 16:06 lihao881230

Hi, thanks. The settings.xml actually does have a lot of info that we did not know was available for this format (such as the gains and the binary layout of the analog and digital panels).

I was not clear: when I asked for the spec, I was asking whether you have a document (PDF or other format) in which the format is described. This is usually provided by the manufacturer of the acquisition system.

Also, I don't see the notebook in the files that you shared.

h-mayorquin avatar Jun 19 '25 17:06 h-mayorquin

Hi, here are the notebooks. Please let me know if anything else is unclear. I really appreciate your help and the time you are putting into getting this standardized for WhiteMatter systems.

Hao

lihao881230 avatar Jun 19 '25 18:06 lihao881230

Hey, I think you are replying by email and probably attaching files there, but attachments sent that way don't reach us. You need to open the issue on GitHub and share the file here if you want us to be able to see it.

h-mayorquin avatar Jun 19 '25 22:06 h-mayorquin

Hi, here it is! Thank you!

si_whitematter.zip

lihao881230 avatar Jun 20 '25 04:06 lihao881230

Thanks a lot. I think we can come back here:

Do you happen to have an extractor available for the analog event channels as well (also in .bin format)? Or would it be best for me to load the event data manually using read_binary and compute the relevant event times myself?

I got the notebook and it looks like you're successfully loading the analog data using the extractor, which should work fine.

Just a couple of things to point out:

One issue is the sampling frequency. From the metadata files, it looks like the sampling rate is 25,000 Hz, but in your notebook you're using 30,000. Why is that?

The other issue is the gain. Right now the extractor assumes a default gain that works for the headstages:

https://github.com/SpikeInterface/spikeinterface/blob/2afae6077c2a0cb9fe6fd6cf8e98d09fcb921389/src/spikeinterface/extractors/whitematterrecordingextractor.py#L33-L34

But this assumption doesn’t apply to the analog panel channels. According to the metadata I saw, their gains differ from the headstage gains, so using the default would scale your signals incorrectly by a large factor. I think you are accounting for it in the threshold, but it is something to keep in mind.
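
To make the scaling problem concrete, here is a small sketch. The two gain values are hypothetical placeholders, not the real WhiteMatter gains, and both helper functions are made up for illustration. It applies the usual linear conversion, value = count * gain + offset, and shows how a threshold in physical units maps back to raw counts.

```python
def counts_to_physical(raw_counts, gain, offset=0.0):
    """Scale raw integer samples to physical units: count * gain + offset."""
    return [count * gain + offset for count in raw_counts]


def threshold_in_counts(threshold_physical, gain, offset=0.0):
    """Convert a threshold given in physical units back to raw counts."""
    return (threshold_physical - offset) / gain


# Hypothetical gains, for illustration only (not the real WhiteMatter values).
HEADSTAGE_GAIN = 0.5
ANALOG_PANEL_GAIN = 100.0

raw = [0, 1000, 2000]
with_wrong_gain = counts_to_physical(raw, HEADSTAGE_GAIN)
with_right_gain = counts_to_physical(raw, ANALOG_PANEL_GAIN)

# Every sample is off by the ratio of the two gains.
scale_error = ANALOG_PANEL_GAIN / HEADSTAGE_GAIN
```

Because the error is a constant factor, a threshold tuned on the mis-scaled trace still finds the same samples, but the reported amplitudes are wrong.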

I think we should do two PRs:

  1. Expose a gain_to_uV argument in the extractor so the user can set the correct value manually.
  2. Add support for parsing the relevant fields from the metadata files (e.g. the JSON or XML) to automatically set the gain, sample rate, and channel count. For that, I’m wondering if we could add a short snippet of your data (e.g. a few seconds) to our test suite.

As for the TTL pulses: we cannot build an extractor for the timestamps because, as you mention, the TTLs were fed into the analog inputs of the box, and from the metadata there does not seem to be a way to determine that automatically.

One question I had: is the segmented file layout a standard feature of the White Matter system? You have:

 Headstages_64_Channels_int16_2025-02-24_19-44-58.bin
 Headstages_64_Channels_int16_2025-02-24_19-49-59.bin
 Headstages_64_Channels_int16_2025-02-24_19-54-59.bin

As far as I can see, there's nothing in the metadata indicating segmentation or chunk timing, so I'm wondering how you know they’re contiguous. It seems to be the case from the timestamps, but I couldn’t find anything that documents it. Is that something that you set on the configuration of the system? I am asking because maybe we can concat automatically if there is a way of detecting this.
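
For reference, this is roughly the check I have in mind, as a standalone sketch. It assumes the timestamp in each file name is the start time of that file, that the data are 64-channel int16 at 25,000 Hz, and that any header has a known size. The helper names and the file sizes in the demo are hypothetical. It compares the time between consecutive file starts with the duration estimated from the file size.

```python
import re
from datetime import datetime

# Matches the trailing "YYYY-MM-DD_HH-MM-SS.bin" part of the file names above.
FILENAME_PATTERN = re.compile(r"_(\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2})\.bin$")


def start_time_from_name(file_name):
    """Parse the start time embedded in the file name (assumed convention)."""
    match = FILENAME_PATTERN.search(file_name)
    if match is None:
        raise ValueError(f"no timestamp found in {file_name!r}")
    return datetime.strptime(match.group(1), "%Y-%m-%d_%H-%M-%S")


def estimated_duration_s(size_bytes, num_channels, sampling_frequency,
                         bytes_per_sample=2, header_bytes=0):
    """Estimate the duration of a flat binary file from its size."""
    num_frames = (size_bytes - header_bytes) // (num_channels * bytes_per_sample)
    return num_frames / sampling_frequency


def gaps_between_files(files, num_channels, sampling_frequency):
    """files: (file_name, size_bytes) pairs sorted by start time.

    Returns, for each consecutive pair, the seconds between the estimated
    end of one file and the start of the next (near 0 if contiguous).
    """
    gaps = []
    for (name, size), (next_name, _) in zip(files, files[1:]):
        elapsed = (start_time_from_name(next_name)
                   - start_time_from_name(name)).total_seconds()
        gaps.append(elapsed - estimated_duration_s(size, num_channels,
                                                   sampling_frequency))
    return gaps


# File names from this thread; the sizes are made up for the demo.
demo_files = [
    ("Headstages_64_Channels_int16_2025-02-24_19-44-58.bin", 963_200_000),
    ("Headstages_64_Channels_int16_2025-02-24_19-49-59.bin", 960_000_000),
    ("Headstages_64_Channels_int16_2025-02-24_19-54-59.bin", 960_000_000),
]
demo_gaps = gaps_between_files(demo_files, num_channels=64,
                               sampling_frequency=25_000.0)
```

The file names only resolve to one second, so a gap within about a second would have to count as contiguous. That is why a documented field in the metadata would be a much better signal.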

h-mayorquin avatar Jun 20 '25 18:06 h-mayorquin

Hi, thank you for reviewing my code and catching these errors! And yes, please feel free to use any of the data I uploaded. If necessary, we can collect additional data for this.

The segmentation duration can be set when we initiate the acquisition software, and it just happened to be set as 5 mins for these recordings. I don't know if it saves a log somewhere other than the file names, but I can check with the company on that. And I believe they are continuous.


lihao881230 avatar Jun 23 '25 16:06 lihao881230

Thanks! Yes, if you can confirm with the company whether there's a way to tell from the metadata or the binary if the data is chunked or segmented over time, that would be great.

Another question came to mind: previously, when we received samples of WhiteMatter, the users didn’t provide any of the XML files you have here. Do you know if those files are always generated, or are there situations where they might not be available?

h-mayorquin avatar Jun 23 '25 20:06 h-mayorquin

I made a PR so you can pass a different gain #4008.

I suspect the start of the offset is a Unix timestamp relative to the recording start time. At least for the data that you shared, it matches. If that is the case, we could use it as a check that the files are indeed contiguous, by comparing each file's estimated duration against that offset.

h-mayorquin avatar Jun 24 '25 04:06 h-mayorquin