
Reducing the merging propensity in merge_posthoc2?

Open PhantomSpike opened this issue 6 years ago • 5 comments

Hi,

Is it possible to somehow control the merging sensitivity of merge_posthoc2, e.g. by setting a threshold somewhere? Currently I end up with a few clusters that suck in far too many spikes (>100,000) because of over-merging of templates.

Can I change some parameter in the merge_posthoc2 function to reduce the likelihood of merging, to see if that helps?

Also, a more general question for @marius10p

What would be the best way to reduce the number of clusters without sacrificing too many single units? I am mostly recording from primary/secondary auditory cortex, which is quite sparsely active, so the default recommendation for Nfilt ("number of clusters to use (2-4 times more than Nchan, should be a multiple of 32)") just gives me way too many clusters. I think there are 3 sensible options:

  1. Have many clusters (2-4 x NChanTotal) and then use merge_posthoc2

The problem here, as I describe above, is that this function just merges way too much.

  2. Reduce the total number of templates by reducing Nfilt

My worry here is that although I save time by not having to do extensive merging and manual curation, I am losing many potential single units.

  3. Have many clusters (2-4 x NChanTotal), increase ops.mergeT and reduce ops.splitT

I think these two parameters determine how much merging/splitting happens, such that increasing ops.mergeT / reducing ops.splitT will decrease the number of clusters I get at the end. I'm not sure what sensible values are, though.

Out of these 3 approaches, which one would make the most sense?
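(As a side note on the Nfilt convention quoted above, rounding 2-4x the channel count up to the nearest multiple of 32 is a one-liner. This is a minimal sketch in Python; the function name and the default factor of 3 are my own, not anything from Kilosort:)

```python
def choose_nfilt(n_chan, factor=3):
    """Round factor * n_chan up to the nearest multiple of 32,
    per the Nfilt recommendation quoted above (2-4x Nchan, multiple of 32).
    The function name and default factor are illustrative, not Kilosort API."""
    return -(-(factor * n_chan) // 32) * 32  # ceiling division by 32, scaled back


print(choose_nfilt(64))            # 192: 3 * 64 is already a multiple of 32
print(choose_nfilt(30, factor=2))  # 64: 2 * 30 = 60, rounded up to 64
```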

Sorry for the long post and thanks a lot for developing and maintaining this great software. It has been really great using it so far. :)

PhantomSpike avatar Aug 29 '18 10:08 PhantomSpike

Great questions. I'm interested in this too.

brendonw1 avatar Aug 29 '18 10:08 brendonw1

Hi,

@marius10p I am sure you are quite busy but any advice on this would be much appreciated. :)

Is this optimization something purely empirical that has to be done for your own data or are there some priors that one can apply in choosing which of the three approaches I've mentioned to take?

Thanks!

PhantomSpike avatar Oct 03 '18 15:10 PhantomSpike

Sorry for the slow reply. There is no right answer and it depends on the data. Try them and see what works best on one of your recordings. It's likely that will generalize to other datasets from your setup.

Try 2) first, but don't go below 2x NChanTotal. Then try 1), and tweak ops.fracse; the default is 0.1, I think, and smaller values will result in less merging, but it's not very thoroughly tested. Finally, you can try 3), but it's likely you'll introduce other problems by changing those too much. The splitting threshold just checks whether the amplitude histogram is bimodal. The merging threshold checks how correlated two clusters are and how similar their amplitudes are. The posthoc merges should be more refined, in that they consider the entire distribution of points for two candidate merge clusters.
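For intuition, the two criteria described above can be sketched roughly as follows. This is a simplified Python illustration, not Kilosort's actual (MATLAB) implementation; the function names, the valley-based bimodality test, and all threshold values are made up for the example:

```python
import numpy as np

def should_merge(w1, w2, a1, a2, corr_thresh=0.9, amp_ratio_thresh=0.8):
    """Merge criterion in the spirit described above: two clusters are merge
    candidates when their mean waveforms are highly correlated AND their
    mean spike amplitudes are similar. Thresholds are illustrative only."""
    corr = np.corrcoef(w1, w2)[0, 1]       # waveform similarity
    amp_ratio = min(a1, a2) / max(a1, a2)  # amplitude similarity, in (0, 1]
    return bool(corr > corr_thresh and amp_ratio > amp_ratio_thresh)

def amplitude_histogram_is_bimodal(amps, bins=25, valley_frac=0.5):
    """Crude split criterion: the amplitude histogram is called bimodal when
    two occupied bins are separated by a valley deeper than valley_frac
    times the smaller of the two peaks."""
    counts, _ = np.histogram(amps, bins=bins)
    p1 = int(np.argmax(counts))            # main peak
    for p2 in range(len(counts)):
        if p2 == p1 or counts[p2] == 0:
            continue
        lo, hi = sorted((p1, p2))
        if hi - lo < 2:
            continue                       # adjacent bins: no valley between
        valley = counts[lo + 1:hi].min()
        if valley < valley_frac * min(counts[p1], counts[p2]):
            return True
    return False
```

In Kilosort the corresponding knobs are ops.mergeT and ops.splitT; the sketch is only meant to show what kind of quantity each threshold acts on.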

Finally, if neither works, it's likely your data just has a lot of motion, on either slow or fast timescales. Wait a few months for us to release Kilosort2, which deals with that explicitly. It should work much better than any of these three approaches.

marius10p avatar Oct 03 '18 18:10 marius10p

Hi @marius10p

Thank you for the detailed answer. I will focus my efforts on 1) and 2) then.

For 2), is there a fundamental reason why going below 2x NChanTotal is bad? If you have n channels and, say, <1 neuron per channel, why is it still necessary to have so many templates? I think I tried Nfilt = 1.5x NChanTotal and was getting decent results. Is this really not a good idea?

When is Kilosort2 scheduled for release?

Thank you!

PhantomSpike avatar Oct 03 '18 19:10 PhantomSpike

Kilosort2 will be released when we finish testing and ironing out a few bugs.

I should have said it's not a good idea to go below 2x the true number of neurons in the recording. That "true" number is usually similar to the number of channels.

marius10p avatar Oct 04 '18 12:10 marius10p