spikeinterface
spikeinterface copied to clipboard
Unit locations discrepancy between SI and PHY
Hi,
I am seeing some units in Phy that have locations in the shank not where SI is showing when I run sw.plot_unit_locations(sa,width_cm =2, backend ="ipywidgets"). I am curious if this is a known problem or something rare and in cases when this happens what is the approach as far as which one to trust?
I am happy to share pictures if needed.
Thanks!
Pictures would be nice (between Phy and SI) for me. I would like to see what is actually happening :)
Here you go. examples of one of the many I found.
@taningh86 phy IDs are not SI IDs! The reason is that phy needs integers. You should have a si_unit_id column in the phy cluster view ;)
This has come up multiple times. Let's not close this until we have a document PR linked where we emphasize that they are not the same :)
@taningh86 phy IDs are not SI IDs! The reason is that phy needs integers. You should have a si_unit_id column in the phy cluster view ;)
Hi @alejoe91 I don't see one. I have attached a pic of it here. This is the only window I see in Phy for clusterview. Is this you are talking about?
Do you have .tsv file called like cluster_info.tsv? If you look in there there should be a column. Maybe it is not displaying?
You mean in the Phy folder or the kilosort output folder? I don't see a cluster_info.tsv file in the Phy folder, neither in the sorter_output folder.
But it should be there in the sorter output folder since we need it to assign spiketrain and other info to the right clusters, isn't? Is cluster_group.tsv the same?
You are just doing this from kilosort natively. Ie you never ran export_to_phy?
Ah, that may be the issue! yes, I am doing it directly from kilosort. My bad! Let me try that.
It should still work though. How are you reading the kilosort units back into spikeinterface? se.read_kilosort or se.read_phy?
I was running everything in spikeinterface but recently started using Phy. I also tried the manual curation in spikeinterface. So, I don't think I am reading it back. To read sorted data (which was sorted in docker) I use 'ss.read_sorter_folder(base_folder/"results_KS25").
It seems to work after exporting it to phy. Unit locations seems to match. However, correlograms look different. I can't think of a reason why they would change since they are just exported from SI and it should look the same, isn't?
Also, the KSLabel column that says 'good' or 'mua' is not there anymore. Is there a way we can still have it?
But, still curious why the unit locations or ID's should change, whether I export data from spikeinterface to phy or just run it directly on the kilosort output folder as @zm711 also said.
Maybe @alejoe91 has a more elegant solution, but you could just copy the info from the KSlabel.tsv into a column of the cluster_info.tsv and it should display. It's been a while since I've played with exporting to phy... (even just copying the file into the folder should work, but then it might not update if you're mixing and matching curation). Since the KSlabel is just a kilosort thing, it won't be automatically generated by export_to_phy since people using other sorters might not have that info :)
Correlograms could look slightly different because the correlograms aren't exported. Phy calculates its own correlograms inside the gui itself. So the way it displays that info would likely be slightly different. So you would have to check how phy calculates to compare to how SI calculates to see what the differences are.
As far as unit locations/IDs I think I would need to think more about it. Would you be able to post your analysis script here so we can see if any operations may have been performed that would change numbering?
Sure @zm711 Here is my code. Its pretty similar to the code in the tutorial (oct 2022) until the sorting stage. Then postprocessing codes are mostly from the documentation and slightly modified for my use.
import spikeinterface as si
import spikeinterface.extractors as se
import spikeinterface.preprocessing as spre
import spikeinterface.sorters as ss
import spikeinterface.postprocessing as spost
import spikeinterface.qualitymetrics as sqm
import spikeinterface.comparison as sc
import spikeinterface.exporters as sexp
import spikeinterface.widgets as sw
import matplotlib.pyplot as plt
import numpy as np
from pathlib import Path
print(si.__version__)
import warnings
warnings.simplefilter("ignore")
%matplotlib widget
import os
%pwd
os.chdir("G:/Cat_GT_Out/catgt_NPX2_2_28_24_2Rght_LHA_RSP_EXP_g0")
%pwd
base_folder = "G:/Cat_GT_Out/catgt_NPX2_2_28_24_2Rght_LHA_RSP_EXP_g0"
recording_spikeglx = se.read_spikeglx(folder_path = base_folder, stream_id = 'imec0.ap', all_annotations = False)
recording_spikeglx
recording_spikeglx.annotate(is_filtered=False)
channel_ids = recording_spikeglx.get_channel_ids()
fs = recording_spikeglx.get_sampling_frequency()
num_chan = recording_spikeglx.get_num_channels()
num_segments = recording_spikeglx.get_num_segments()
recording_to_process = recording_spikeglx
recording_cmr = spre.common_reference(recording_to_process, reference='global', operator='median')
w = sw.plot_timeseries({"raw": recording_to_process, "common": recording_cmr},
clim=(-50, 50), time_range=[100, 110], order_channel_by_depth=True,
backend="ipywidgets") #only includes CMR and NO bandpass_filter. So its slightly modified from the code in tutorial
fs = recording_cmr.get_sampling_frequency()
recording_sub = recording_cmr.frame_slice(start_frame=0*fs, end_frame=1821*fs)
recording_sub
job_kwargs = dict(n_jobs=10, chunk_duration="1s", progress_bar=True) #this one I need to understand
base_folder = Path(base_folder)
recording_saved = si.load_extractor(base_folder / "preprocessed")
recording_saved = si.load_extractor(base_folder / "preprocessed")
recording_loaded = si.load_extractor(base_folder / "preprocessed")
w = sw.plot_timeseries({"preprocessed": recording_sub, "saved": recording_saved, "loaded": recording_loaded},
clim=(-50, 50), mode="line",
backend="ipywidgets")
sorter_params = job_kwargs
sorting_KS25 = ss.run_sorter('kilosort2_5', recording_saved,
output_folder=base_folder / 'results_KS25',
verbose=True, docker_image=True,
**sorter_params)
sorting_KS25 = ss.read_sorter_folder(base_folder/"results_KS25")
sorting_KS25
recording_saved = si.load_extractor(base_folder / "preprocessed")
sorting = sorting_KS25
print(sorting)
sorting_analyzer = si.create_sorting_analyzer(
sorting=sorting,
recording=recording_saved,
format='binary_folder',
folder= 'sorting_analyzer'
)
sa = sorting_analyzer
job_kwargs = dict(n_jobs=8, chunk_duration="1s", progress_bar=True)
sorting_analyzer.compute(
"random_spikes",
method="uniform",
max_spikes_per_unit=500,
)
sa.compute("waveforms", ms_before=1.0, ms_after=2.0, return_scaled=True, **job_kwargs)
sa.compute("templates", operators=["average", "median", "std"])
sa.compute('spike_amplitudes', **job_kwargs)
sa.compute('unit_locations', method="monopolar_triangulation")
sa.compute('spike_locations', method="center_of_mass", **job_kwargs)
sa.compute('correlograms')
sa.compute('template_similarity')
print(sorting_analyzer)
After this its mostly widgets, template and QM modules.
Hi @zm711 Have you ha a chance to look at the code above? I am curious why would SI unit locations be any different than Phy since it works on the kilosort data. I am still finding lot of units with different site locations between the two.
Well the sites could be slightly different! I want to emphasize that first. You get to choose your unit_location functionality (you chose to do monopolar_triangulation. so this could shift you slightly. I'm not sure though. let me tag @alejoe91 back into this and see if he has any ideas.
Ok. I agree that sites could be slightly different along the same vicinity with the computation I chose. However, I am seeing locations change drastically, about 4000 micron along the shank between SI and Phy. I am happy to share more pictures for comparisons of multiple sites I found that jumped location quite far along the shank.
Did you apply any postprocessing to the la output, e.g. removing redundant units? I agree that this is weird. Can you check that the waveforms/templates look similare between phy and si?
Yeah, waveforms looks similar between the units and as far as I know I have not done any removal of redundant units. I am away from the analysis computer currently but will get back later today when i am at it. P.S I code I shared above is what I did with this dataset. I have done removal of redundant units before with other datasets but have done those always after sorting (mostly to check).
Hi @alejoe91 Sorry i could not get access to the analysis computer later half of the day yesterday. I ran Phy again on another dataset and made sure no removal of redundant units were run. I am still getting wide variation in spike loction between Phy and spikeinterface. You guys have never seen this? I need to make sure I am not doing anything wrong. So let me know what y need from my end- more codes or even my dataset. Also, my usual protocol is to run raw data through CAT-GT and then use the CatGT output in spikeinterface. Then I only run the CMR preprocessing (even though CatGT runs a similar mean based approach) on that data and no bandpass filter since catGT already does that. I believe any of these preprocessing steps that I run in spikeinterface should not change the locations of the unit because they are done all before kilosorting. Let me know whatever you need from my end to diagnose the issue.
I guess we haven't really confirmed the phy units vs the si units. could you do se.read_phy to make your sorting object and then do the sorting_analyzer reading phy and see what that looks like?
So
sorting = se.read_phy(r'path/to/phy/folder')
recording = se.load_extractor(r'path/to/rec')
analyzer= si.create_sorting_analyzer(sorting, recording, xx)
analyzer.compute(xx)
Hi @zm711 I did what you suggested. below is the code:
sorting = se.read_phy(r'G:\Cat_GT_Out\catgt_NPX2_2_28_24_2Rght_LHA_RSP_EXP_g0\phy_KS25')
recording_saved = si.load_extractor(base_folder / "preprocessed")
analyzer= si.create_sorting_analyzer(
sorting=sorting,
recording=recording_saved,
format='binary_folder',
folder= 'sorting_analyzer_2'
)
analyzer.compute(
"random_spikes",
method="uniform",
max_spikes_per_unit=500,
)
analyzer.compute("waveforms", ms_before=1.0, ms_after=2.0, return_scaled=True, **job_kwargs)
analyzer.compute("templates", operators=["average", "median", "std"])
analyzer.compute('spike_amplitudes', **job_kwargs)
analyzer.compute('unit_locations', method="monopolar_triangulation")
analyzer.compute('spike_locations', method="center_of_mass", **job_kwargs)
analyzer.compute('correlograms')
analyzer.compute('template_similarity')
print(analyzer)
Unfortunately, the unit location mismatch still exists. Even after creating a sorting object from the Phy folder I was expecting the unit locations in spikeinterface to match exactly with unit locations in Phy. Not sure where the error is originating from - Phy or Spikeinterface. But, whatever the source this is a serious problem in my opinion considering how different conclusions can be made in any given dataset and what the user wants to get out of the data. What do you guys think? Should I raise this issue with Phy also?
And sorry for responding late. I am caring for a newborn and its been hard to get back quick even though I am very keen to resolve this asap.
Is it because of the additional 'monopoar tiangulation' method being used in spikeinterface to compute unit locations? Can spikeinterface just copy what Kilosort found? My impression was that's what spikeinterface was doing.
I think it would be worth trying center_of_mass (which I believe is what phy is approximating).
As far as Phy that is in maintenance mode. I try to hop over there sometimes to help people with install, but the developer isn't responding to issues any more. So with phy what you see is what you get. Probably better just to troubleshoot here.
Ok. I will try center_of_mass and update if the mismatch goes away. Should be able to let you know by afternoon.
finally got around it. I ran 'center_of_mass' for spike and unit locations and its still showing the same locations in the probe in spikeinterface as it was with 'monopolar_triangulation'. So spikeinterface holds the same site with both methods and its different from Phy. The thing that is worrying for me is the degree of deviation between the two. Locations are switched more than 4000 microns along the shank axis but not a few microns. And, almost all discrepancy of unit locations between Phy and SI is in this scale. This seems like a huge deviation and a big issue for me. it would be great if the team can check if they are observing this in their dataset as well. If this is found consistently then its a problem. I am not sure why SI cannot just use the spike and unit locations what kilosort found but has to run its own computation on top of it. I would think that Phy is just copying Kilosort. In case if its not doing that then why run through all that computations in kilosort. Please let me know what you all think. You guys certainly know about this more than me but I would think my logic here makes sense and we should not be seeing this difference between the wtwo.