spikeinterface
Question(s) about the model-based postprocess curation
Hi,
I looked at the existing models mentioned in the docs, and they are all trained on Neuropixels, in vivo mouse recordings. I assume these models are not directly transferable to Maxwell data from human cell cultures, right?
So I want to train our own model. And since manually labeling thousands of templates is very tedious, I don't want to start off on the wrong foot and be forced to redo it.
What kind of labeling is supported? The docs show only "good"/"bad" binary labeling, which I suppose is the clearest. But would it be possible to use more labels, like "good"/"distant"/"multiple"/"truncated"/"noise"... ? There are some "grey area" categories of templates that we might want to include or exclude depending on the kind of questions we are asking.
How many templates per category would be a good minimum? I'm working with Maxwell data, so there can be up to 2k templates in an analyzer and I'd like to include at least three analyzers to cover some replicate subjects and also different recording/sparsity parameters, so potentially up to 6k templates that would need to be labelled. Subsetting them could be really helpful for time and mental sanity.
What other considerations are there that I may not have thought about?
Thank you for your time!
Hi @fruce-ki
You can try out the models and see what they give you, but I would guess that they're not very good on human cell cultures, no.
You can use any labelling you'd like. When you're manually curating, I would use lots of labels; you can then combine some of these in later analysis. For instance, I used a lot of labels that described why I was discarding units (like "correlogram", "wide waveform", etc.), then filtered these into smaller groups just before training. Although you can train on as many labels as you'd like, too.
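To make the collapse-before-training step concrete, here is a minimal sketch in plain Python. The label names and unit ids are purely illustrative (not from any real dataset); the only assumption is that your curated labels can be expressed as a dict from unit id to label string:

```python
# Fine-grained labels recorded during manual curation (illustrative names only).
curated = {
    1: "good",
    2: "correlogram",
    3: "wide waveform",
    4: "good",
    5: "noise",
}

# Collapse all the "reasons for discarding" into a single "bad" class,
# keeping any label not listed here (e.g. "good") unchanged.
coarse_map = {"correlogram": "bad", "wide waveform": "bad", "noise": "bad"}
training_labels = {uid: coarse_map.get(lab, lab) for uid, lab in curated.items()}

print(training_labels)
# {1: 'good', 2: 'bad', 3: 'bad', 4: 'good', 5: 'bad'}
```

Keeping the fine-grained labels on disk and applying a mapping like this at training time lets you train several models (binary, multi-class, conservative vs. liberal) from one round of manual curation.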
In Figure 3 of the preprint (https://www.biorxiv.org/content/10.1101/2025.03.30.645770v1.full.pdf) there's a plot of model accuracy vs amount of labelled data. Performance starts flattening out after 500 units or so. So I'd try labelling 500 units, train a model and see what it spits out on some test data. You don't need to label every single unit of an analyzer, though you might need to subset the units you've labelled (using analyzer.select_units) before training the model.
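A small sketch of that subsetting step, assuming your labels live in a dict from unit id to label (the unit ids and labels here are made up; `analyzer.select_units` is the SortingAnalyzer method mentioned above, shown commented out so the snippet stands alone):

```python
# Hypothetical curation results: only some units in the analyzer were labelled.
labels = {0: "good", 3: "noise", 7: "good", 12: "multiple"}

# Keep only the units that actually received a manual label.
labelled_unit_ids = sorted(labels)
print(labelled_unit_ids)  # [0, 3, 7, 12]

# Restrict the analyzer to those units before training; select_units
# returns a new analyzer rather than modifying in place:
# labelled_analyzer = analyzer.select_units(unit_ids=labelled_unit_ids)
```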
It might be a good idea to get a workflow up and running to help check your model training. For this just label 20 units, train a little model and try to visualise the model output (maybe make plots of the units the models labels in different ways).
An important thing is to check which metrics you're computing. We think that the template_metrics are pretty important for this, but by default the "multi_channel_metrics" are not computed. You can include them in your pipeline by doing something like
analyzer.compute("template_metrics", include_multi_channel_metrics=True)
or
compute_template_metrics(analyzer, include_multi_channel_metrics=True)
Please ask any more questions/clarifications - it's a new feature so feedback would be great!
@chrishalcrow Just out of curiosity, have you heard of people using 64-channel probes for training a curation model? I would love to give it a try, but I am not sure the data from my recording paradigm (64-channel recordings in walking insects) is consistent enough for model building. Nevertheless, if I manage to train a model for all the possible noise sources in my dataset (motion artifacts, monitors, IR LEDs), that will already save my life.
@chrishalcrow Thanks for the tips! Detailed labelling and then training different simpler models is a good compromise, but I will for sure try to throw everything at once at the model too.
I am already always computing all the computable metrics (template + multichannel, unit + PC).
@chiyu1203 I would give it a try running the pre-trained models. These are trained based on quality and template metrics, which should be relatively agnostic to probe geometry. If the results are not satisfactory, then you can try to perform labeling and train your own models!
@chiyu1203 Hello, not heard of anyone trying that yet - also, the pretrained models all use mice so that's another reason it might not work super well, but definitely give them a go!
@fruce-ki that's good to hear. We also like this approach because it lets us generate a conservative model and a liberal model from the same labelled data.
Let us know how it goes!
I will close this as we have addressed the question, but if anything else comes up feel free to re-open or add new issues in the repo!