python-neo Spikeglx: parse tcat input

Describe the bug I am unable to read in my lf files when using SpikeGLXRawIO. The file structure is currently as below. When I had all the lf files be t0 instead of tcat it worked flawlessly.

ya008_20240525_g0
├── ya008_20240525_g0_ct_offsets.txt
├── ya008_20240525_g0_fyi.txt
├── ya008_20240525_g0_imec0
│  ├── ya008_20240525_g0_t0.imec0.ap.bin
│  ├── ya008_20240525_g0_t0.imec0.ap.meta
│  ├── ya008_20240525_g0_tcat.imec0.lf.bin
│  └── ya008_20240525_g0_tcat.imec0.lf.meta
├── ya008_20240525_g0_imec1
│  ├── ya008_20240525_g0_t0.imec1.ap.bin
│  ├── ya008_20240525_g0_t0.imec1.ap.meta
│  ├── ya008_20240525_g0_tcat.imec1.lf.bin
│  └── ya008_20240525_g0_tcat.imec1.lf.meta
├── ya008_20240525_g0_t0.nidq.bin
└── ya008_20240525_g0_t0.nidq.meta

To Reproduce

(neuroconv_312) ➜  test2 ipython
Python 3.12.3 | packaged by conda-forge | (main, Apr 15 2024, 18:38:13) [GCC 12.3.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.26.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import neo.rawio

In [2]: reader = neo.rawio.SpikeGLXRawIO(dirname='./ya008_20240525_g0')

In [3]: reader.parse_header()
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[3], line 1
----> 1 reader.parse_header()

File /scratch/ct5868/.conda/envs/neuroconv_312/lib/python3.12/site-packages/neo/rawio/baserawio.py:189, in BaseRawIO.parse_header(self)
    176 """
    177 Parses the header of the file(s) to allow for faster computations
    178 for all other functions
    179
    180 """
    181 # this must create
    182 # self.header['nb_block']
    183 # self.header['nb_segment']
   (...)
    186 # self.header['spike_channels']
    187 # self.header['event_channels']
--> 189 self._parse_header()
    190 self._check_stream_signal_channel_characteristics()
    191 self.is_header_parsed = True

File /scratch/ct5868/.conda/envs/neuroconv_312/lib/python3.12/site-packages/neo/rawio/spikeglxrawio.py:147, in SpikeGLXRawIO._parse_header(self)
    144 signal_channels = []
    145 for stream_name in stream_names:
    146     # take first segment
--> 147     info = self.signals_info_dict[0, stream_name]
    149     stream_id = stream_name
    150     stream_index = stream_names.index(info["stream_name"])

KeyError: (0, 'imec1.ap')

Environment:

OS: Linux
Python version: 3.12
Neo version: 0.13.1

Related to #1501 #1390

I am happy to provide access to the meta files. The SpikeGLX data is rather large so if having access to that would be necessary, I'd have to ask IT how to best distribute that data.

Jul 24 '24 17:07 tabedzki

@alejoe91 can you handle this ?

Jul 26 '24 14:07 samuelgarcia

I think this should be fixed by this:

https://github.com/NeuralEnsemble/python-neo/pull/1608

It has been a while but any chance you could try this @tabedzki ?

Dec 10 '24 17:12 h-mayorquin

@tabedzki just following up on this. The solution implemented in #1608 by @h-mayorquin should work here because files are parsed based on the original file name in the meta file. Can you test it on your end?

Jan 08 '25 10:01 alejoe91

Thanks for your patience. Unfortunately, this does not address the underlying issue. I got the same error across other folders so it is not confined to this specific directory.

>>> import neo.rawio
>>> reader = neo.rawio.SpikeGLXRawIO(dirname='./jk44_08092024_g0')
>>> reader.parse_header()
Traceback (most recent call last):
  File "<python-input-2>", line 1, in <module>
    reader.parse_header()
    ~~~~~~~~~~~~~~~~~~~^^
  File "/Users/ct5868/.venv/lib/python3.13/site-packages/neo/rawio/baserawio.py", line 211, in parse_header
    self._parse_header()
    ~~~~~~~~~~~~~~~~~~^^
  File "/Users/ct5868/.venv/lib/python3.13/site-packages/neo/rawio/spikeglxrawio.py", line 157, in _parse_header
    info = self.signals_info_dict[0, stream_name]
           ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
KeyError: (0, 'imec0.ap')
>>> neo.__version__
'0.14.0'

tree jk44_08092024_g0
jk44_08092024_g0
├── jk44_08092024_g0_ct_offsets.txt
├── jk44_08092024_g0_fyi.txt
├── jk44_08092024_g0_imec0
│   ├── jk44_08092024_g0_t0.imec0.ap.bin
│   ├── jk44_08092024_g0_t0.imec0.ap.meta
│   ├── jk44_08092024_g0_tcat.imec0.lf.bin
│   └── jk44_08092024_g0_tcat.imec0.lf.meta
├── jk44_08092024_g0_t0.nidq.bin
└── jk44_08092024_g0_t0.nidq.meta

2 directories, 8 files

Jan 23 '25 01:01 tabedzki

Can you print what is inside signals_info_dict?

We definitely need a more informative error there.

Jan 23 '25 02:01 h-mayorquin

Thanks guys for looking into this more. Please find attached the dict requested signals_info_dict.txt

Jan 23 '25 20:01 tabedzki

OK, I understand the bug. Can you share the data with us so I can create a stub for this and add it to the tests?

The problem is that you combine a tcat signal with one that is not. The tcat is always the segment 0 but the other signal (ap) is assigned segment one. The the line that is trigger the error is assuming that there is always a segment 0 for every stream.

The scan_files function should be modified somehow to take this into account (order segments per stream).

@samuelgarcia @alejoe91 how common is to have tcat lf but not ap? It is not clear to me how we should handle this.

Jan 23 '25 21:01 h-mayorquin

I'd have to check with the researcher if I can share his data, though the data itself is large (95GB). What would be the recommended way to do that if I am granted permission?

Jan 23 '25 22:01 tabedzki

Well, if you don't have acesss to an s3 bucket, dropbox, google drive, mega, etc with that capability then I have a script for stubing spikeglx data here:

https://gist.github.com/h-mayorquin/b5abe6ae8e8484976c2d27e0bb04e2a2

We will have to fine-tune for your case, we can do it in a 15 minutes short meeting.

It is important for us to know why is tcat only applied to the lp band and not to ap. Moreover, why is the gating mechanism (g0) preserved on the naming even after tcat. In short, a summary description of the dataset for provenance on our test data.

Jan 27 '25 13:01 h-mayorquin

I seem to be facing a similar issue

Folder structure:

SNA-137704_mstim_20112024_S1_1_g0 ├── catgt_SNA-137704_mstim_20112024_S1_1_g0 │ ├── SNA-137704_mstim_20112024_S1_1_g0_ct_offsets.txt │ ├── SNA-137704_mstim_20112024_S1_1_g0_fyi.txt │ ├── SNA-137704_mstim_20112024_S1_1_g0_tcat.imec0.ap.bin │ ├── SNA-137704_mstim_20112024_S1_1_g0_tcat.imec0.ap.meta │ ├── SNA-137704_mstim_20112024_S1_1_g0_tcat.imec0.ap.xd_384_6_500.txt │ ├── SNA-137704_mstim_20112024_S1_1_g0_tcat.nidq.meta │ ├── SNA-137704_mstim_20112024_S1_1_g0_tcat.nidq.xd_1_0_500.txt │ └── SNA-137704_mstim_20112024_S1_1_g0_tcat.nidq.xd_1_1_200.txt ├── SNA-137704_mstim_20112024_S1_1_g0_imec0 │ ├── SNA-137704_mstim_20112024_S1_1_g0_t0.imec0.ap.bin │ └── SNA-137704_mstim_20112024_S1_1_g0_t0.imec0.ap.meta ├── SNA-137704_mstim_20112024_S1_1_g0_t0.nidq.bin └── SNA-137704_mstim_20112024_S1_1_g0_t0.nidq.meta

Code to reproduce

import spikeinterface.extractors as se
path = 'D:/nPixelData/SNA-137704/SNA-137704_mstim_20112024_S1_1_g0'
data = se.read_spikeglx(folder_path=path, stream_id='nidq')

# Also fails when running 
# se.read_spikeglx(folder_path=path, stream_id='imec0')
# se.read_spikeglx(folder_path=path)

No matter which stream_id I define, the error remains the same:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[3], [line 6](vscode-notebook-cell:?execution_count=3&line=6)
      [4](vscode-notebook-cell:?execution_count=3&line=4) # Doesn't work
      [5](vscode-notebook-cell:?execution_count=3&line=5) path = Path('D:/nPixelData/SNA-137704/SNA-137704_mstim_20112024_S1_1_g0')
----> [6](vscode-notebook-cell:?execution_count=3&line=6) data = se.read_spikeglx(folder_path=path, stream_id='imec0')
      [8](vscode-notebook-cell:?execution_count=3&line=8) # Works in pre processing
      [9](vscode-notebook-cell:?execution_count=3&line=9) # path = 'D:/nPixelData/SNA-137704/SNA-137704_mstim_20112024_S1_1_g0/catgt_SNA-137704_mstim_20112024_S1_1_g0'
     [10](vscode-notebook-cell:?execution_count=3&line=10) data = se.read_spikeglx(path)

File c:\Users\miksc\anaconda3\envs\NP\Lib\site-packages\spikeinterface\extractors\neoextractors\spikeglx.py:65, in SpikeGLXRecordingExtractor.__init__(self, folder_path, load_sync_channel, stream_id, stream_name, all_annotations, use_names_as_ids)
     [55](file:///C:/Users/miksc/anaconda3/envs/NP/Lib/site-packages/spikeinterface/extractors/neoextractors/spikeglx.py:55) def __init__(
     [56](file:///C:/Users/miksc/anaconda3/envs/NP/Lib/site-packages/spikeinterface/extractors/neoextractors/spikeglx.py:56)     self,
     [57](file:///C:/Users/miksc/anaconda3/envs/NP/Lib/site-packages/spikeinterface/extractors/neoextractors/spikeglx.py:57)     folder_path,
   (...)
     [62](file:///C:/Users/miksc/anaconda3/envs/NP/Lib/site-packages/spikeinterface/extractors/neoextractors/spikeglx.py:62)     use_names_as_ids: bool = False,
     [63](file:///C:/Users/miksc/anaconda3/envs/NP/Lib/site-packages/spikeinterface/extractors/neoextractors/spikeglx.py:63) ):
     [64](file:///C:/Users/miksc/anaconda3/envs/NP/Lib/site-packages/spikeinterface/extractors/neoextractors/spikeglx.py:64)     neo_kwargs = self.map_to_neo_kwargs(folder_path, load_sync_channel=load_sync_channel)
---> [65](file:///C:/Users/miksc/anaconda3/envs/NP/Lib/site-packages/spikeinterface/extractors/neoextractors/spikeglx.py:65)     NeoBaseRecordingExtractor.__init__(
     [66](file:///C:/Users/miksc/anaconda3/envs/NP/Lib/site-packages/spikeinterface/extractors/neoextractors/spikeglx.py:66)         self,
     [67](file:///C:/Users/miksc/anaconda3/envs/NP/Lib/site-packages/spikeinterface/extractors/neoextractors/spikeglx.py:67)         stream_id=stream_id,
     [68](file:///C:/Users/miksc/anaconda3/envs/NP/Lib/site-packages/spikeinterface/extractors/neoextractors/spikeglx.py:68)         stream_name=stream_name,
     [69](file:///C:/Users/miksc/anaconda3/envs/NP/Lib/site-packages/spikeinterface/extractors/neoextractors/spikeglx.py:69)         all_annotations=all_annotations,
     [70](file:///C:/Users/miksc/anaconda3/envs/NP/Lib/site-packages/spikeinterface/extractors/neoextractors/spikeglx.py:70)         use_names_as_ids=use_names_as_ids,
     [71](file:///C:/Users/miksc/anaconda3/envs/NP/Lib/site-packages/spikeinterface/extractors/neoextractors/spikeglx.py:71)         **neo_kwargs,
...
--> [157](file:///C:/Users/miksc/anaconda3/envs/NP/Lib/site-packages/neo/rawio/spikeglxrawio.py:157)     info = self.signals_info_dict[0, stream_name]
    [159](file:///C:/Users/miksc/anaconda3/envs/NP/Lib/site-packages/neo/rawio/spikeglxrawio.py:159)     buffer_id = stream_name
    [160](file:///C:/Users/miksc/anaconda3/envs/NP/Lib/site-packages/neo/rawio/spikeglxrawio.py:160)     buffer_name = stream_name

KeyError: (0, 'nidq')

Interestingly, I am able to load the catGT filtered data when I run:

path = 'D:/nPixelData/SNA-137704/SNA-137704_mstim_20112024_S1_1_g0/catgt_SNA-137704_mstim_20112024_S1_1_g0'
data = se.read_spikeglx(path)

Hopefully this is the right location for this issue. As I see g0 mentioned in the comment above I also removed the catgt folder's affix of '_g0' but this did not alter anything.

I am also willing to share any necessary additional information or data.

Jan 31 '25 00:01 mik-schutte

Hi h-mayorquin, Thanks for being willing to help and for providing a stubbing mechanism. Here is the stubbed data

Jan 31 '25 02:01 tabedzki

Hi @tabedzki

The data that you shared with me does not have the binaries for the tcat processed lfp or nidq. The script that I shared with you is a draft. I was expecting that we could take a look together andc finish it but now that I look at it again I think that you only need to modify the file_pattern: str = "*.ap.bin" argument in the create_data_stub function.

Note also, that you might need to modify the number of channels manually accordingo to your data.

Reach to me in slack if you need help with it.

Feb 05 '25 15:02 h-mayorquin

@tabedzki, @samuelgarcia and I are wondering:

How come you only have the LF concatenated? We believe that typically, tcat is used for concatenation, so it’s confusing for us to see both concatenated and non-concatenated data in the same folder. Could tcat be used for something else? Do you happen to know?

@mik-schutte

Hi, your case is different since tcat is only present in one folder. Your case should work if you point the dirname of spikeglx rawio to the folder that either has tcat or does not. You can set dirname to either catgt_SNA-137704_mstim_20112024_S1_1_g0 or 137704_mstim_20112024_S1_1_g0_imec0, and that should work.

That said, we’d still like to understand why you have both tcat and non-tcat data at the same time. Is tcat used for something other than concatenation?

We need this context to make a good solution for this. We already have some ideas but we need your input.

Feb 05 '25 16:02 h-mayorquin

Thanks for your solution, it has worked for me!

My tcat files were created through running catGT, which I use to extract noisy channels, filter the recording, extracting behavioural timestamps but most importantly perform global demuxing. This program was recommended by a lab member and I have not been able to obtain as good as a filtering with spikeinterface.

"""
created by Moritz for Larkum Lab 
May 2024
"""

import os
import sys
import subprocess
import time
import shutil
from pathlib import Path
import numpy as np

def run_CatGT(directory, channels_to_exclude):
    ''' Configures the correct path and compiles commands that are fed into CatGT
    NOTE: IF you want to make changes to the filtering you will have to adjust the strings below.
    '''
    # TODO this could be detected automatically 
    catGTPath = r'C:\Users\miksc\CatGT' # Desktop molecular-lab & laptop
    # catGTPath = r'C:\Users\Mik\CatGT-win' # Desktop setup ABPF

    if sys.platform.startswith('win'):
        # build windows command line
        catGTexe_fullpath = catGTPath.replace('\\', '/') + "/runit.bat"
    elif sys.platform.starstwith('linux'):
        catGTexe_fullpath = catGTPath.replace('\\', '/') + "/runit.sh"
    else:
        print('unknown system, cannot run CatGt')

    # Format the channels to exclude
    chnexcl = f'-chnexcl={{0;{",".join(map(str, channels_to_exclude))}}}' #NOTE this is now only formatted to remove channes from a single probe (probe 0) and does not work for multi probe

    # Because I give the path with _g0 and then it wont run remove _g0
    directory = str(directory).split('_g0')[0]

    # Make sure that path comprehension can be run 
    directory = Path(directory)
    # Define all commands that you wish to run on your raw data. 
    #  The different type of commands can be found within the README file in your CatGT folder.
    cmd_parts = list()
    cmd_parts.append(catGTexe_fullpath)
    cmd_parts.append('-dir=' + str(directory.parent))
    cmd_parts.append('-run=' + str(directory.stem))
    cmd_parts.append('-g=0') # this is because only type of g file (not multiple enable disable recording)
    cmd_parts.append('-t=0,0') # only one 0 (because not a sequence of triggered file)
    cmd_parts.append('-prb=0') # this means 1 probe only

    # New part where we split the channels
    # cmd_parts.append('-site=0:191')

    # Based on lfp, which channels are out of the brain and should be excluded?
    # cmd_parts.append('-chnexcl={0;127, 277}') # exclusion of channels of the CAR (channels not in brain)
    cmd_parts.append(chnexcl)

    cmd_parts.append('-prb_fld')  # use folder per probe organization
    cmd_parts.append('-ap') # process ap
    # cmd_parts.append('-lf') # process lfp streams 
    cmd_parts.append('-ni') 
    cmd_parts.append('-apfilter=butter,12,300,9000') # bandpass filtering should always be implemented

    # CAR or Globaldemuxing
    # cmd_parts.append('-gblcar') #CAR common median referencing using all channels (recommneded  one)
    cmd_parts.append('-gbldmx')


    #cmd_parts.append('-loccar_um=60,300') # either glbcar or loccar or gbldmx, loccar can be better if diff sampling tissue layers
    #cmd_parts.append('-loccar_um=60,480')
    #cmd_parts.append('-gfix=0.4,0.10,0.02') # make sense when used with gblcar, identify light or chewing artifacts
    # cmd_parts.append('-gfix=0.4,0.10,0.02')

    # extract pulse signal from digital chan (js,ip,word,bit,millisec) for NI js and ip are 0 Word is a zero-based channel index
    cmd_parts.append('-xd=0,0,1,1,200,20') # This extracts the microstimulation trigger that arrives in NIDAQ (Digital channel 1)
    # cmd_parts.append('-xd=0,0,0,1,200,20')   

    #cmd_parts.append('-xa=0')
    #cmd_parts.append('-bf=0,0,1,1,6,3')
    #cmd_parts.append('-sy=0,-1,6,500')
    cmd_parts.append('-dest=' + str(directory)+'_g0') # destination path
    catGT_cmd = ' '        # use space as the separator for the command parts
    catGT_cmd = catGT_cmd.join(cmd_parts)
    
    # Use all the different commands and run CatGT
    print('CatGT command line:' + catGT_cmd)
    start = time.time()
    subprocess.call(catGT_cmd)
    execution_time = time.time() - start

    # copy CatGT log file, which will be in the directory with the calling 
    # python script, to the destination directory
    logPath = catGTPath
    logName = 'CatGT.log'
    catgt_runName = 'catgt_' + directory.stem + '_g0'

    # build name for log copy and copy files
    catgt_logName = catgt_runName + '_CatGT.log'
    catgt_runDir = directory.joinpath(catgt_runName)
    shutil.copyfile(os.path.join(logPath,logName), \
                    os.path.join(directory,"\\",catgt_logName))

    return execution_time

Feb 11 '25 15:02 mik-schutte