spikeinterface
spikeinterface copied to clipboard
SortingAnalyzer.get_extension() should fail when the extension does not exist
This will be more consistent with the principle that thing should fail early. I think that returning None the way that we are doing currently:
https://github.com/SpikeInterface/spikeinterface/blob/024c24afaec2d0a0476017c77a75f7c8a1dcb4b3/src/spikeinterface/core/sortinganalyzer.py#L1084-L1101
Can introduce hard to debug situation. Better to fail when you don't get the right name.
But maybe there is a good reason for this that I am not aware. I decided to open an issue just in case someone (like me) gets confused by this (I just got bug) and googles it.
I think the problem is that it is also used internally as a check for computing things. I guess you could split this into a public and a private function or refactor the code not to just check for None.
https://github.com/SpikeInterface/spikeinterface/blob/024c24afaec2d0a0476017c77a75f7c8a1dcb4b3/src/spikeinterface/core/sortinganalyzer.py#L940-L942
Yeah, that internal code should be refactored I think. Public API should not be constrained to internal convenience.
Not sure to be aligned with you on this. get_extension) returning None is really really usefull at many places.
We also have the has_extension function. Probably better to use it instead of if analyzer.get_extension("...") is not None
My only strong preference is that the one that is available to users behaves like a normal dictionary without magic. Implicit behavior is always confusing. Right now we have three things:
- analyzer.get_extension() that has default dict magic
- analyzer.has_extension() that has undocumented behavior for memory mode
- analyzer.extensions that I think behaves as a normal dict.
Can we divide the ones that are for internal use, make them private, and make one the privileged public API that behaves both:
- consistently (the same for every format)
- familarly (the way that users expect dicts to work).
This will reduce the mental burden for users instead of having to remember three similar looking functions with subtle differences.
What I had in mind was: the analyzer can be remote (using zarr) and maybe we do not want to load all extensions because it has a cost on the bandwith.
So load extensions are loaded on demand only.
For me analyzer.get_extension() was clearly the semantic :
- give an extension if you have it
- load it if not already loaded but can be loaded
- otherwise return None.
maybe the semantic is bad but I think really need this 3 steps in once because this is very convinient for external tools (like sigui) and also internally at many places.
The extension dict is what is already loaded in memory.
analyzer.has_extension() is what can be loaded to the dict and this make non sens for memory of course.
Thanks for the explanation Sam.
Our of curiosity, the return None property, is just for syntactic suggar, right? The convenience that you point above is the one that @zm711 pointed above:
ok = any(self.get_extension(name) is not None for name in dependency_name.split("|"))
else:
ok = self.get_extension(dependency_name) is not None
Are we good with closing this one then? Or do we want to have more debate?
Mmm not sure how to move forward. I don't like @samuelgarcia magic and I think that the current state of affairs is very confusing for end users at the cost of making things easier for the programmer.
I propose the following approach to move forward in order of easiness to agree over:
- Document the current behavior.
- Find a better semantic, I think that
get_extensionis bad for such an overloaded method. - Make the method private.
Fair enough! I was just checking issues that had gone cold. Since this is still active we just keep it open :)