Standardising parts of the API.
Following a conversion in #2299, a number of instances where the API is inconsistent were highlighted. This issue is opened as a place to discuss and begin fixing these:
ZeroChannelPaddedRecordingsaves it's parent recording to aparent_recordingkwarg not arecordingkwarg that seems to be used elsewhere.ZeroSilencePeriodsstores the parent recording as a dict not a recording.CurationSortinguses aparent_sortingrather thansortingfoldervs.output_folderargument.
More genreally, I think SI is becomming a very mature and widely used package and freezing the API / defaults as much as possible between versions would really enhance user experience / avoid bugs. I wonder if at some stage a full review of the API could be performed (this would entail multiple people going through every SI function / class and making notes on function names and default arguments inconsistencies or room for improvement). Then these could be discussed, frozen, and only changed in future with a very good reason. I wonder what people think about this.
Thanks for this write up, @JoeZiminski (and @zm711 ).
I agree that we should unify the API wherever we can. For folder VS output_folder, let's keep it in mind when starting with #2282 (I guess that is mainly for the WaveformExtractor, right?)
run_sorter and run_sorter_by_property use different folder arguments.
So what should be for preprocessing? parent_recording or justrecording?
Most function/classes in the spikeinterface have a positional argument labeled as recording (so I switched the docs to use keyword to encourage people to use the keyword instead), so the path of least resistance (and most consistent for people) would be to use recording and sorting for all functions which take those values in. Rather than having some rare uses of parent_recording/sorting.
I actually do find some value on the fact that parent_recording differentiates from the (current)_recording which is usually self. On the other hand we also use the same term for the Segments so it is kind of overloaded:
https://github.com/catalystneuro/spikeinterface/blob/6fabd004ddbe6a03df5897de4cb2a8860487de36/src/spikeinterface/core/base.py#L1172-L1182
I would rather use input_recording in preprocessing as a general thing but maybe only using recording is fine. Just wanted to point out some positive aspect for disambiguation.
I would be fine if every preprocessor took in parent_recording (or input_recording), but since so much code could be using the keyword recording at this point it seems more disruptive to use a different keyword. But from a mental schema I think using something other than recording could actually be beneficial.
I agree parent_recording or input_recording is a better name as recording on a Recording that is not self is confusing. It is worth collecting examples of such potential renaming improvements, and these can be changed together at an opportune moment sometime later. I think it's okay to make a change in a suboptimal direction for consistency now with an eye to making a more disruptive change to a better name in future.
duration vs. durations in the test data generation suite
Hey @alejoe91 @zm711 @h-mayorquin I thought I'd resurrect this thread considering (as I understand it) the current work on #2282 is being wrapped up and docs being written. I wonder if it is even worth spending an afternoon (or possible hackathon for the meet-up) going through the entire API and categorising function names / arguments into:
- happy with
- unhappy with but will live with and won't change in future (to avoid changing too much)
- unhappy with and will change now
However this is a lot of work so appreciate it might not be a priority at the moment or might be better later as a standalone endeavour-if SortingAnalyzer changes are already complicated adding a second load of API renaming on top might be too much!
I like that @JoeZiminski ! I can think a little API polish might be nice :)
Hackathon project?