spikeinterface icon indicating copy to clipboard operation
spikeinterface copied to clipboard

Only load `template_metrics` extension on compute if keeping some metrics

Open chrishalcrow opened this issue 1 year ago • 3 comments

Might fix #3471 (Could you try this out please @jonpedros)

When template_metrics is computed and delete_existing_metrics = False, any old metrics which aren't being recomputed are kept. To do this, the _run method loads the old template metric extension (if it exists). However, if this is the first time it has run, it has already created the extension folder. In this case _run sees the newly-created folder and tries to load it. Bu there's not much in it, so a no run_info warning appears.

To avoid this, this PR only loads the extension folder if there are metrics to be kept.

I think the previous implementation of load_run_info was sending two warnings if a run_info file didn't exist. I think I've made the logic a bit simpler. Could one of the run_info experts take a look please (@jonahpearl @alejoe91 ) - thanks!

chrishalcrow avatar Oct 15 '24 14:10 chrishalcrow

The run_info change looks fine. Incidentally, this is exactly the reason that I think it would be useful to save run_info before the run has occurred, and perhaps as soon as the extension is created (@alejoe91 and see #3451) — because then it's very simple to check if the extension has already been run. If it hasn't been, you can skip all the delete_existing_metrics logic, avoid loading an empty folder, etc.

As written here, I think it's this condition if set(self.params["metrics_to_compute"]) != set(self.params["metric_names"]) that's being used to check if the newly created folder is empty? I suspect it would be clearer to just check if not run_info["run_completed"] (if we can make run_info get saved before the run happens) but lmk if I'm misunderstanding.

jonahpearl avatar Oct 15 '24 15:10 jonahpearl

Making the run_info.json file asap sounds reasonable to me.

As written here, I think it's this condition if set(self.params["metrics_to_compute"]) != set(self.params["metric_names"]) that's being used to check if the newly created folder is empty? I suspect it would be clearer to just check if not run_info["run_completed"] (if we can make run_info get saved before the run happens) but lmk if I'm misunderstanding.

To be honest, it's quite a good check to do anyway: it will now only try to load the template_metrics extension it needs propagate some already-existing metrics from an old run. So in this case I'm happy with the solution.

chrishalcrow avatar Oct 17 '24 10:10 chrishalcrow

Making the run_info.json file asap sounds reasonable to me.

As written here, I think it's this condition if set(self.params["metrics_to_compute"]) != set(self.params["metric_names"]) that's being used to check if the newly created folder is empty? I suspect it would be clearer to just check if not run_info["run_completed"] (if we can make run_info get saved before the run happens) but lmk if I'm misunderstanding.

To be honest, it's quite a good check to do anyway: it will now only try to load the template_metrics extension it needs propagate some already-existing metrics from an old run. So in this case I'm happy with the solution.

I think I removed it because it didn't play nice with loading old waveform extractor folders as an analyzer. This was part as a larger fix to improve back-compatibility, so I'm happy to test if this PR works on old data on my end!

alejoe91 avatar Oct 17 '24 10:10 alejoe91

Tested it and now all the extensions I need to compute are outputted as expected. I do still get the warnings, though, but I guess that was not changed?

For template_metrics:

[c:\Users\systemses\Anaconda3\envs\si_env\Lib\site-packages\spikeinterface\core\sortinganalyzer.py:2043](file:///C:/Users/systemses/Anaconda3/envs/si_env/Lib/site-packages/spikeinterface/core/sortinganalyzer.py:2043): UserWarning: Found no run_info file for template_metrics, extension should be re-computed.
  warnings.warn(f"Found no run_info file for {self.extension_name}, extension should be re-computed.")
[c:\Users\systemses\Anaconda3\envs\si_env\Lib\site-packages\spikeinterface\core\sortinganalyzer.py:2050](file:///C:/Users/systemses/Anaconda3/envs/si_env/Lib/site-packages/spikeinterface/core/sortinganalyzer.py:2050): UserWarning: Found no run_info file for template_metrics, extension should be re-computed.
  warnings.warn(f"Found no run_info file for {self.extension_name}, extension should be re-computed.")
[c:\Users\systemses\Anaconda3\envs\si_env\Lib\site-packages\spikeinterface\core\sortinganalyzer.py:2125](file:///C:/Users/systemses/Anaconda3/envs/si_env/Lib/site-packages/spikeinterface/core/sortinganalyzer.py:2125): UserWarning: Found no data for template_metrics, extension should be re-computed.
  warnings.warn(f"Found no data for {self.extension_name}, extension should be re-computed.")

For spike_amplitudes:

[c:\Users\systemses\Anaconda3\envs\si_env\Lib\site-packages\numpy\core\_methods.py:206](file:///C:/Users/systemses/Anaconda3/envs/si_env/Lib/site-packages/numpy/core/_methods.py:206): RuntimeWarning: Degrees of freedom <= 0 for slice
  ret = _var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
[c:\Users\systemses\Anaconda3\envs\si_env\Lib\site-packages\numpy\core\_methods.py:163](file:///C:/Users/systemses/Anaconda3/envs/si_env/Lib/site-packages/numpy/core/_methods.py:163): RuntimeWarning: invalid value encountered in divide
  arrmean = um.true_divide(arrmean, div, out=arrmean,
[c:\Users\systemses\Anaconda3\envs\si_env\Lib\site-packages\numpy\core\_methods.py:198](file:///C:/Users/systemses/Anaconda3/envs/si_env/Lib/site-packages/numpy/core/_methods.py:198): RuntimeWarning: invalid value encountered in divide
  ret = ret.dtype.type(ret / rcount)

jonpedros avatar Oct 25 '24 08:10 jonpedros

Tested it and now all the extensions I need to compute are outputted as expected. I do still get the warnings, though, but I guess that was not changed?

Hey ~@jonahpearl~ @jonpedros- sorry, I completely forgot about this PR. The warnings were updated. I think that, based on the fact that your warnings about template_metrics happen at lines 2043 and 2050, you're using the "main" branch rather than this PRs branch. If you have the github CLI, you can check out this branch by running gh pr checkout 3478 in your spikeinterface folder. If you had time to do that, re-run and confirm that you don't get two identical "Found no run_info file for template_metrics, extension should be re-computed" warning; that'd be great.

chrishalcrow avatar Nov 13 '24 15:11 chrishalcrow

Hey @jonahpearl

@jonpedros I think you meant :)

jonahpearl avatar Nov 13 '24 15:11 jonahpearl

If you had time to do that, re-run and confirm that you don't get two identical "Found no run_info file for template_metrics, extension should be re-computed" warning; that'd be great.

Tested and it's gone! Only the one from spike_amplitudes remains. Thanks!

jonpedros avatar Nov 21 '24 09:11 jonpedros