Spikeglx: parse tcat input
Describe the bug
I am unable to read in my lf files when using SpikeGLXRawIO. The file structure is currently as below. When I had all the lf files be t0 instead of tcat it worked flawlessly.
ya008_20240525_g0
├── ya008_20240525_g0_ct_offsets.txt
├── ya008_20240525_g0_fyi.txt
├── ya008_20240525_g0_imec0
│ ├── ya008_20240525_g0_t0.imec0.ap.bin
│ ├── ya008_20240525_g0_t0.imec0.ap.meta
│ ├── ya008_20240525_g0_tcat.imec0.lf.bin
│ └── ya008_20240525_g0_tcat.imec0.lf.meta
├── ya008_20240525_g0_imec1
│ ├── ya008_20240525_g0_t0.imec1.ap.bin
│ ├── ya008_20240525_g0_t0.imec1.ap.meta
│ ├── ya008_20240525_g0_tcat.imec1.lf.bin
│ └── ya008_20240525_g0_tcat.imec1.lf.meta
├── ya008_20240525_g0_t0.nidq.bin
└── ya008_20240525_g0_t0.nidq.meta
To Reproduce
(neuroconv_312) ➜ test2 ipython
Python 3.12.3 | packaged by conda-forge | (main, Apr 15 2024, 18:38:13) [GCC 12.3.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.26.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import neo.rawio
In [2]: reader = neo.rawio.SpikeGLXRawIO(dirname='./ya008_20240525_g0')
In [3]: reader.parse_header()
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Cell In[3], line 1
----> 1 reader.parse_header()
File /scratch/ct5868/.conda/envs/neuroconv_312/lib/python3.12/site-packages/neo/rawio/baserawio.py:189, in BaseRawIO.parse_header(self)
176 """
177 Parses the header of the file(s) to allow for faster computations
178 for all other functions
179
180 """
181 # this must create
182 # self.header['nb_block']
183 # self.header['nb_segment']
(...)
186 # self.header['spike_channels']
187 # self.header['event_channels']
--> 189 self._parse_header()
190 self._check_stream_signal_channel_characteristics()
191 self.is_header_parsed = True
File /scratch/ct5868/.conda/envs/neuroconv_312/lib/python3.12/site-packages/neo/rawio/spikeglxrawio.py:147, in SpikeGLXRawIO._parse_header(self)
144 signal_channels = []
145 for stream_name in stream_names:
146 # take first segment
--> 147 info = self.signals_info_dict[0, stream_name]
149 stream_id = stream_name
150 stream_index = stream_names.index(info["stream_name"])
KeyError: (0, 'imec1.ap')
Environment:
- OS: Linux
- Python version: 3.12
- Neo version: 0.13.1
Related to #1501 #1390
I am happy to provide access to the meta files. The SpikeGLX data is rather large so if having access to that would be necessary, I'd have to ask IT how to best distribute that data.
@alejoe91 can you handle this ?
I think this should be fixed by this:
https://github.com/NeuralEnsemble/python-neo/pull/1608
It has been a while but any chance you could try this @tabedzki ?
@tabedzki just following up on this. The solution implemented in #1608 by @h-mayorquin should work here because files are parsed based on the original file name in the meta file. Can you test it on your end?
Thanks for your patience. Unfortunately, this does not address the underlying issue. I got the same error across other folders so it is not confined to this specific directory.
>>> import neo.rawio
>>> reader = neo.rawio.SpikeGLXRawIO(dirname='./jk44_08092024_g0')
>>> reader.parse_header()
Traceback (most recent call last):
File "<python-input-2>", line 1, in <module>
reader.parse_header()
~~~~~~~~~~~~~~~~~~~^^
File "/Users/ct5868/.venv/lib/python3.13/site-packages/neo/rawio/baserawio.py", line 211, in parse_header
self._parse_header()
~~~~~~~~~~~~~~~~~~^^
File "/Users/ct5868/.venv/lib/python3.13/site-packages/neo/rawio/spikeglxrawio.py", line 157, in _parse_header
info = self.signals_info_dict[0, stream_name]
~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
KeyError: (0, 'imec0.ap')
>>> neo.__version__
'0.14.0'
tree jk44_08092024_g0
jk44_08092024_g0
├── jk44_08092024_g0_ct_offsets.txt
├── jk44_08092024_g0_fyi.txt
├── jk44_08092024_g0_imec0
│ ├── jk44_08092024_g0_t0.imec0.ap.bin
│ ├── jk44_08092024_g0_t0.imec0.ap.meta
│ ├── jk44_08092024_g0_tcat.imec0.lf.bin
│ └── jk44_08092024_g0_tcat.imec0.lf.meta
├── jk44_08092024_g0_t0.nidq.bin
└── jk44_08092024_g0_t0.nidq.meta
2 directories, 8 files
Can you print what is inside signals_info_dict?
We definitely need a more informative error there.
Thanks guys for looking into this more. Please find attached the dict requested signals_info_dict.txt
OK, I understand the bug. Can you share the data with us so I can create a stub for this and add it to the tests?
The problem is that you combine a tcat signal with one that is not. The tcat is always the segment 0 but the other signal (ap) is assigned segment one. The the line that is trigger the error is assuming that there is always a segment 0 for every stream.
The scan_files function should be modified somehow to take this into account (order segments per stream).
@samuelgarcia @alejoe91 how common is to have tcat lf but not ap? It is not clear to me how we should handle this.
I'd have to check with the researcher if I can share his data, though the data itself is large (95GB). What would be the recommended way to do that if I am granted permission?
Well, if you don't have acesss to an s3 bucket, dropbox, google drive, mega, etc with that capability then I have a script for stubing spikeglx data here:
https://gist.github.com/h-mayorquin/b5abe6ae8e8484976c2d27e0bb04e2a2
We will have to fine-tune for your case, we can do it in a 15 minutes short meeting.
It is important for us to know why is tcat only applied to the lp band and not to ap. Moreover, why is the gating mechanism (g0) preserved on the naming even after tcat. In short, a summary description of the dataset for provenance on our test data.
I seem to be facing a similar issue
Folder structure:
SNA-137704_mstim_20112024_S1_1_g0 ├── catgt_SNA-137704_mstim_20112024_S1_1_g0 │ ├── SNA-137704_mstim_20112024_S1_1_g0_ct_offsets.txt │ ├── SNA-137704_mstim_20112024_S1_1_g0_fyi.txt │ ├── SNA-137704_mstim_20112024_S1_1_g0_tcat.imec0.ap.bin │ ├── SNA-137704_mstim_20112024_S1_1_g0_tcat.imec0.ap.meta │ ├── SNA-137704_mstim_20112024_S1_1_g0_tcat.imec0.ap.xd_384_6_500.txt │ ├── SNA-137704_mstim_20112024_S1_1_g0_tcat.nidq.meta │ ├── SNA-137704_mstim_20112024_S1_1_g0_tcat.nidq.xd_1_0_500.txt │ └── SNA-137704_mstim_20112024_S1_1_g0_tcat.nidq.xd_1_1_200.txt ├── SNA-137704_mstim_20112024_S1_1_g0_imec0 │ ├── SNA-137704_mstim_20112024_S1_1_g0_t0.imec0.ap.bin │ └── SNA-137704_mstim_20112024_S1_1_g0_t0.imec0.ap.meta ├── SNA-137704_mstim_20112024_S1_1_g0_t0.nidq.bin └── SNA-137704_mstim_20112024_S1_1_g0_t0.nidq.meta
Code to reproduce
import spikeinterface.extractors as se
path = 'D:/nPixelData/SNA-137704/SNA-137704_mstim_20112024_S1_1_g0'
data = se.read_spikeglx(folder_path=path, stream_id='nidq')
# Also fails when running
# se.read_spikeglx(folder_path=path, stream_id='imec0')
# se.read_spikeglx(folder_path=path)
No matter which stream_id I define, the error remains the same:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Cell In[3], [line 6](vscode-notebook-cell:?execution_count=3&line=6)
[4](vscode-notebook-cell:?execution_count=3&line=4) # Doesn't work
[5](vscode-notebook-cell:?execution_count=3&line=5) path = Path('D:/nPixelData/SNA-137704/SNA-137704_mstim_20112024_S1_1_g0')
----> [6](vscode-notebook-cell:?execution_count=3&line=6) data = se.read_spikeglx(folder_path=path, stream_id='imec0')
[8](vscode-notebook-cell:?execution_count=3&line=8) # Works in pre processing
[9](vscode-notebook-cell:?execution_count=3&line=9) # path = 'D:/nPixelData/SNA-137704/SNA-137704_mstim_20112024_S1_1_g0/catgt_SNA-137704_mstim_20112024_S1_1_g0'
[10](vscode-notebook-cell:?execution_count=3&line=10) data = se.read_spikeglx(path)
File c:\Users\miksc\anaconda3\envs\NP\Lib\site-packages\spikeinterface\extractors\neoextractors\spikeglx.py:65, in SpikeGLXRecordingExtractor.__init__(self, folder_path, load_sync_channel, stream_id, stream_name, all_annotations, use_names_as_ids)
[55](file:///C:/Users/miksc/anaconda3/envs/NP/Lib/site-packages/spikeinterface/extractors/neoextractors/spikeglx.py:55) def __init__(
[56](file:///C:/Users/miksc/anaconda3/envs/NP/Lib/site-packages/spikeinterface/extractors/neoextractors/spikeglx.py:56) self,
[57](file:///C:/Users/miksc/anaconda3/envs/NP/Lib/site-packages/spikeinterface/extractors/neoextractors/spikeglx.py:57) folder_path,
(...)
[62](file:///C:/Users/miksc/anaconda3/envs/NP/Lib/site-packages/spikeinterface/extractors/neoextractors/spikeglx.py:62) use_names_as_ids: bool = False,
[63](file:///C:/Users/miksc/anaconda3/envs/NP/Lib/site-packages/spikeinterface/extractors/neoextractors/spikeglx.py:63) ):
[64](file:///C:/Users/miksc/anaconda3/envs/NP/Lib/site-packages/spikeinterface/extractors/neoextractors/spikeglx.py:64) neo_kwargs = self.map_to_neo_kwargs(folder_path, load_sync_channel=load_sync_channel)
---> [65](file:///C:/Users/miksc/anaconda3/envs/NP/Lib/site-packages/spikeinterface/extractors/neoextractors/spikeglx.py:65) NeoBaseRecordingExtractor.__init__(
[66](file:///C:/Users/miksc/anaconda3/envs/NP/Lib/site-packages/spikeinterface/extractors/neoextractors/spikeglx.py:66) self,
[67](file:///C:/Users/miksc/anaconda3/envs/NP/Lib/site-packages/spikeinterface/extractors/neoextractors/spikeglx.py:67) stream_id=stream_id,
[68](file:///C:/Users/miksc/anaconda3/envs/NP/Lib/site-packages/spikeinterface/extractors/neoextractors/spikeglx.py:68) stream_name=stream_name,
[69](file:///C:/Users/miksc/anaconda3/envs/NP/Lib/site-packages/spikeinterface/extractors/neoextractors/spikeglx.py:69) all_annotations=all_annotations,
[70](file:///C:/Users/miksc/anaconda3/envs/NP/Lib/site-packages/spikeinterface/extractors/neoextractors/spikeglx.py:70) use_names_as_ids=use_names_as_ids,
[71](file:///C:/Users/miksc/anaconda3/envs/NP/Lib/site-packages/spikeinterface/extractors/neoextractors/spikeglx.py:71) **neo_kwargs,
...
--> [157](file:///C:/Users/miksc/anaconda3/envs/NP/Lib/site-packages/neo/rawio/spikeglxrawio.py:157) info = self.signals_info_dict[0, stream_name]
[159](file:///C:/Users/miksc/anaconda3/envs/NP/Lib/site-packages/neo/rawio/spikeglxrawio.py:159) buffer_id = stream_name
[160](file:///C:/Users/miksc/anaconda3/envs/NP/Lib/site-packages/neo/rawio/spikeglxrawio.py:160) buffer_name = stream_name
KeyError: (0, 'nidq')
Interestingly, I am able to load the catGT filtered data when I run:
path = 'D:/nPixelData/SNA-137704/SNA-137704_mstim_20112024_S1_1_g0/catgt_SNA-137704_mstim_20112024_S1_1_g0'
data = se.read_spikeglx(path)
Hopefully this is the right location for this issue. As I see g0 mentioned in the comment above I also removed the catgt folder's affix of '_g0' but this did not alter anything.
I am also willing to share any necessary additional information or data.
Hi h-mayorquin, Thanks for being willing to help and for providing a stubbing mechanism. Here is the stubbed data
Hi @tabedzki
The data that you shared with me does not have the binaries for the tcat processed lfp or nidq.
The script that I shared with you is a draft. I was expecting that we could take a look together andc finish it but now that I look at it again I think that you only need to modify the file_pattern: str = "*.ap.bin" argument in the create_data_stub function.
Note also, that you might need to modify the number of channels manually accordingo to your data.
Reach to me in slack if you need help with it.
@tabedzki, @samuelgarcia and I are wondering:
How come you only have the LF concatenated? We believe that typically, tcat is used for concatenation, so it’s confusing for us to see both concatenated and non-concatenated data in the same folder. Could tcat be used for something else? Do you happen to know?
@mik-schutte
Hi, your case is different since tcat is only present in one folder. Your case should work if you point the dirname of spikeglx rawio to the folder that either has tcat or does not. You can set dirname to either catgt_SNA-137704_mstim_20112024_S1_1_g0 or 137704_mstim_20112024_S1_1_g0_imec0, and that should work.
That said, we’d still like to understand why you have both tcat and non-tcat data at the same time. Is tcat used for something other than concatenation?
We need this context to make a good solution for this. We already have some ideas but we need your input.
Thanks for your solution, it has worked for me!
My tcat files were created through running catGT, which I use to extract noisy channels, filter the recording, extracting behavioural timestamps but most importantly perform global demuxing. This program was recommended by a lab member and I have not been able to obtain as good as a filtering with spikeinterface.
"""
created by Moritz for Larkum Lab
May 2024
"""
import os
import sys
import subprocess
import time
import shutil
from pathlib import Path
import numpy as np
def run_CatGT(directory, channels_to_exclude):
''' Configures the correct path and compiles commands that are fed into CatGT
NOTE: IF you want to make changes to the filtering you will have to adjust the strings below.
'''
# TODO this could be detected automatically
catGTPath = r'C:\Users\miksc\CatGT' # Desktop molecular-lab & laptop
# catGTPath = r'C:\Users\Mik\CatGT-win' # Desktop setup ABPF
if sys.platform.startswith('win'):
# build windows command line
catGTexe_fullpath = catGTPath.replace('\\', '/') + "/runit.bat"
elif sys.platform.starstwith('linux'):
catGTexe_fullpath = catGTPath.replace('\\', '/') + "/runit.sh"
else:
print('unknown system, cannot run CatGt')
# Format the channels to exclude
chnexcl = f'-chnexcl={{0;{",".join(map(str, channels_to_exclude))}}}' #NOTE this is now only formatted to remove channes from a single probe (probe 0) and does not work for multi probe
# Because I give the path with _g0 and then it wont run remove _g0
directory = str(directory).split('_g0')[0]
# Make sure that path comprehension can be run
directory = Path(directory)
# Define all commands that you wish to run on your raw data.
# The different type of commands can be found within the README file in your CatGT folder.
cmd_parts = list()
cmd_parts.append(catGTexe_fullpath)
cmd_parts.append('-dir=' + str(directory.parent))
cmd_parts.append('-run=' + str(directory.stem))
cmd_parts.append('-g=0') # this is because only type of g file (not multiple enable disable recording)
cmd_parts.append('-t=0,0') # only one 0 (because not a sequence of triggered file)
cmd_parts.append('-prb=0') # this means 1 probe only
# New part where we split the channels
# cmd_parts.append('-site=0:191')
# Based on lfp, which channels are out of the brain and should be excluded?
# cmd_parts.append('-chnexcl={0;127, 277}') # exclusion of channels of the CAR (channels not in brain)
cmd_parts.append(chnexcl)
cmd_parts.append('-prb_fld') # use folder per probe organization
cmd_parts.append('-ap') # process ap
# cmd_parts.append('-lf') # process lfp streams
cmd_parts.append('-ni')
cmd_parts.append('-apfilter=butter,12,300,9000') # bandpass filtering should always be implemented
# CAR or Globaldemuxing
# cmd_parts.append('-gblcar') #CAR common median referencing using all channels (recommneded one)
cmd_parts.append('-gbldmx')
#cmd_parts.append('-loccar_um=60,300') # either glbcar or loccar or gbldmx, loccar can be better if diff sampling tissue layers
#cmd_parts.append('-loccar_um=60,480')
#cmd_parts.append('-gfix=0.4,0.10,0.02') # make sense when used with gblcar, identify light or chewing artifacts
# cmd_parts.append('-gfix=0.4,0.10,0.02')
# extract pulse signal from digital chan (js,ip,word,bit,millisec) for NI js and ip are 0 Word is a zero-based channel index
cmd_parts.append('-xd=0,0,1,1,200,20') # This extracts the microstimulation trigger that arrives in NIDAQ (Digital channel 1)
# cmd_parts.append('-xd=0,0,0,1,200,20')
#cmd_parts.append('-xa=0')
#cmd_parts.append('-bf=0,0,1,1,6,3')
#cmd_parts.append('-sy=0,-1,6,500')
cmd_parts.append('-dest=' + str(directory)+'_g0') # destination path
catGT_cmd = ' ' # use space as the separator for the command parts
catGT_cmd = catGT_cmd.join(cmd_parts)
# Use all the different commands and run CatGT
print('CatGT command line:' + catGT_cmd)
start = time.time()
subprocess.call(catGT_cmd)
execution_time = time.time() - start
# copy CatGT log file, which will be in the directory with the calling
# python script, to the destination directory
logPath = catGTPath
logName = 'CatGT.log'
catgt_runName = 'catgt_' + directory.stem + '_g0'
# build name for log copy and copy files
catgt_logName = catgt_runName + '_CatGT.log'
catgt_runDir = directory.joinpath(catgt_runName)
shutil.copyfile(os.path.join(logPath,logName), \
os.path.join(directory,"\\",catgt_logName))
return execution_time