Help with phase_shift pre-processing with binary recording
I'm using Open Ephys' ONIX system for my Neuropixels 1.0 recordings. This system spits out a binary file that I load with read_binary. I'm aware that the phase_shift pre-process requires an inter_sample_shift property in the recording that is automatically loaded by the spikeglx and openephys read functions. In the absence of that, is there a way to generate/calculate this shift so that I can run phase_shift on my data before using Kilosort?
Hey @DavidLucha , there's a phase_shift function https://spikeinterface.readthedocs.io/en/latest/api.html#spikeinterface.preprocessing.phase_shift which you can access using something like
import spikeinterface.full as si
phase_shifted_recording = si.phase_shift(recording)
or
import spikeinterface.preprocessing as sipp
phase_shifted_recording = sipp.phase_shift(recording)
Many more details in the Analyze Neuropixels Recordings How To: https://spikeinterface.readthedocs.io/en/latest/how_to/analyze_neuropixels.html
Hey @chrishalcrow, that's what I've tried running based on the tutorial that you linked, but I don't have the inter_sample_shift property built into the binary file I'm loading (because I'm not reading spikeglx or openephys formats). Is there a way to run the phase_shift function without the inter_sample_shift property?
Oops, sorry, I'm bad at reading...
So, we just use the get_neuropixels_sample_shifts function which can be found here: src/spikeinterface/extractors/neuropixels_utils.py
So you might be able to use something like
from spikeinterface.extractors.neuropixels_utils import get_neuropixels_sample_shifts
total_channels = 384
num_channels_per_adc = 12
num_channels_in_adc = 12  # or 13 if you have an AP stream
sample_shifts = get_neuropixels_sample_shifts(total_channels, num_channels_per_adc, num_channels_in_adc)
recording.set_property('inter_sample_shift', sample_shifts)
following the logic of this code here: https://github.com/SpikeInterface/spikeinterface/blob/508de1e306747dcf1af708834f59923ad3ce712f/src/spikeinterface/extractors/neoextractors/openephys.py#L217
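If you'd rather not depend on the extractor internals, the shift vector can also be sketched directly with NumPy. This is a simplified reading of the linked code, not a drop-in replacement: it assumes the NP1.0 layout where even/odd channels go to paired ADCs, so channels 2k and 2k+1 share the same sweep cycle. Check it against the real get_neuropixels_sample_shifts before trusting it.

```python
import numpy as np

num_channels = 384
num_channels_per_adc = 12
num_cycles = 13  # 13 for the AP stream, 12 for LFP (per the comment above)

# Simplified NP1.0 layout assumption: even and odd channels alternate between
# paired ADCs, and each ADC sweeps its 12 channels in order within one
# sampling period, so the k-th channel in a sweep lags by k / num_cycles
# of a sample.
chan = np.arange(num_channels)
cycle = (chan % (2 * num_channels_per_adc)) // 2
inter_sample_shift = cycle / num_cycles

# then, as above:
# recording.set_property('inter_sample_shift', inter_sample_shift)
```

The shifts are expressed as fractions of one sampling period, which is what phase_shift expects from the inter_sample_shift property.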
I'd be a little worried ONIX does something slightly different with the data or probe. As a check, here's a csv with the inter_sample_shifts and the (x,y) coordinates of a NP1.0 probe that I have, maybe you can compare?
@DavidLucha do you mind sharing a sample ONIX dataset with us? I think we need a proper extractor for it, so that all metadata can also be handled properly
@chrishalcrow thanks for that. I'll give this a shot in the coming days and will check back in once I have something to compare with.
@alejoe91 I can definitely share a dataset! I don't have anything short and small to share currently and my university has just been shut down indefinitely as it seems we're about to get hit by a cyclone, but as soon as I'm back on campus early next week I'll share a dataset that is saved from the exemplar workflow found here.
Hi all, glad to hear that ONIX might get a data extractor, that would be nice. I can answer any questions about this. ONIX data is a flat binary file. It's created in Bonsai, so the metadata is going to be the .bonsai file that generated the data.
I have a comment/question about the whole premise of this question, though, and about the utility and correctness of these phase-offset corrections in general. Neuropixels have anti-aliasing filters prior to their ADCs. Given that these filters should, ideally, remove any power beyond Fs/2, why is a step needed to perform temporal corrections at frequencies well beyond Fs, like these inter-sample phase corrections? That information should be lost in the analog domain prior to sampling.
Thanks @jonnew
So even if there is a hardware filter, different channels are actually sampled at different times, since each ADC sweeps through a few channels within each sampling cycle. It's been demonstrated that artifacts appearing across probe depths aren't properly removed unless you correct for these phase shifts.
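To put a rough number on that stagger: at a 30 kHz sampling rate, an ADC sweeping 13 conversion cycles per sampling period (13 for the AP stream, per the earlier comment) samples the last channel in its sweep about 30 µs after the first. A back-of-the-envelope check:

```python
fs = 30_000.0    # NP1.0 AP sampling rate in Hz (assumed for illustration)
num_cycles = 13  # conversion cycles per sampling period (AP stream)

period_us = 1e6 / fs  # one sampling period, ~33.3 microseconds
# last channel in the sweep is sampled (num_cycles - 1) cycles after the first
max_skew_us = (num_cycles - 1) / num_cycles * period_us
print(f"max intra-period skew = {max_skew_us:.1f} us")  # ~30.8 us
```

So a transient that hits all channels simultaneously will appear smeared by up to roughly one sample period across channels sharing an ADC, which is what the correction removes.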
You can check this paper from IBL with some demonstration plots (Figure 3): https://figshare.com/articles/online_resource/Spike_sorting_pipeline_for_the_International_Brain_Laboratory/19705522?file=49783080
Yep, I understand that, and I understand there may be an effect. However -- a proper AA filter will remove this power from the signal before it is sampled, even with non-simultaneous conversion, so long as a full round robin occurs within a single 1/Fs time period! There should be nothing to correct if there is no power above Fs/2, but, clearly, there is power above Fs/2 if these phase shifts do something.
I guess phase shifts could be introduced in the analog domain prior to sampling, due to delays in their sample and hold. This might explain things. But it's not due to the round robin ADC conversion itself or, at least, it really should not be.
The basic critique is that for a system like this, there should be no information in the signal at frequencies above Fs/2.
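For what it's worth, the correction itself doesn't actually require information above Fs/2: a fractional-sample delay of a band-limited signal is exactly expressible as a phase ramp in the frequency domain, which is roughly the kind of resampling a phase-shift correction applies (a sketch of the general technique, not SpikeInterface's actual implementation; the tone frequency and shift value are arbitrary illustration choices):

```python
import numpy as np

N = 64
n = np.arange(N)
x = np.sin(2 * np.pi * 5 * n / N)  # band-limited: one tone well below Nyquist

tau = 0.5  # delay by half a sample
# Time-shift theorem: delaying by tau samples multiplies each FFT bin at
# normalized frequency f by exp(-2j * pi * f * tau)
X = np.fft.fft(x)
x_shifted = np.fft.ifft(X * np.exp(-2j * np.pi * np.fft.fftfreq(N) * tau)).real

# compare against the analytically delayed tone
expected = np.sin(2 * np.pi * 5 * (n - tau) / N)
print(np.max(np.abs(x_shifted - expected)))
```

The reconstruction matches the analytically delayed tone to machine precision, so realigning sub-sample offsets is well defined for any signal that really is band-limited below Fs/2.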
In the link you provided, the signal is known beforehand to be produced by a fast rising edge applied to all channels. Doing phase alignment on this and having the results "look better" at the end requires knowledge of the signal that went into all channels on the probe. You do not have this information when the probe is in the brain. To be concrete, let's ask the following question: using only information provided by one channel at a time, in which sample, red or green, did the rising edge physically occur at the output of the function generator?
The answer is that we cannot know based on the data alone. Therefore our decision to realign the data by one sample, to recreate the rising edge that we have a priori knowledge of, is arbitrary.
I might be missing some information that comes from a signal that is probe-wide, like this one, and is therefore super-sampled in some way because it's occurring on all electrodes; that would be a way out of this as well, I guess.
And, FWIW, I might just be wrong here. I guess if I'm right, which it kinda seems like I'm not, given that there is a positive effect of doing this for drift correction, then this step should not do much of anything anyway.
> @DavidLucha do you mind sharing a sample ONIX dataset with us? I think we need a proper extractor for it, so that all metadata can also be handled properly
Hi @alejoe91
Sorry it took so long, but I've attached a link to a zip with an example dataset from the Onix Neuropixel v1e Headstage as defined in the Onix documentation: https://open-ephys.github.io/bonsai-onix1-docs/articles/hardware/np1e/overview.html
Let me know if you need anything else, if there are any issues with that link, or if there's a better way to share that data!
https://drive.google.com/file/d/1klAe4vUwn6rTAZmz5KQnrPJzR54dXHlU/view?usp=sharing
Maybe this is useful: I have a small function to extract info from the Bonsai file. For NPX1 it's handy for getting the LFP and AP gains without hardcoding them.
import xml.etree.ElementTree as ET

def extract_bonsai_configuration(bonsai_file):
    """Flatten a .bonsai (XML) workflow file into a {tag: text} dict."""
    data = {}
    with open(bonsai_file, "r", encoding="utf-8") as f:
        # parse on "end" events so each element's text is fully read
        for event, elem in ET.iterparse(f, events=("end",)):
            tag_name = elem.tag.split("}")[-1]  # drop the XML namespace prefix
            text_value = elem.text.strip() if elem.text else "None"
            data[tag_name] = text_value
    return data

bonsai_config = extract_bonsai_configuration("workflow.bonsai")  # path to your .bonsai file
ap_gain = int(bonsai_config['SpikeAmplifierGain'].replace("Gain", "").strip())
lfp_gain = int(bonsai_config['LfpAmplifierGain'].replace("Gain", "").strip())
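Once you have the gains, you can turn them into a µV-per-bit scaling. Assumption not stated in this thread: NP1.0 uses 10-bit ADCs spanning a 1.2 V input range, so double-check against the Neuropixels documentation before relying on it. The gain values below are hypothetical examples of what the parsing above might return.

```python
ap_gain = 500   # hypothetical value parsed from 'SpikeAmplifierGain'
lfp_gain = 250  # hypothetical value parsed from 'LfpAmplifierGain'

# assumed: 10-bit ADC (1024 levels) over a 1.2 V input range
uV_per_bit_ap = 1.2e6 / 1024 / ap_gain    # 2.34375 uV per bit
uV_per_bit_lfp = 1.2e6 / 1024 / lfp_gain  # 4.6875 uV per bit
```

These could then be attached to the recording (e.g. via recording.set_channel_gains) so that traces requested in scaled mode come out in µV rather than raw ADC counts.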