
Avoid random and computation in preprocessing init class

Open samuelgarcia opened this issue 3 years ago • 1 comments

Some preprocessing classes, such as `WhitenRecording`, `NormalizeByQuantileRecording`, and `ZScoreRecording`, internally use `get_random_data_chunks()`.

This makes the end-user experience easier, but it is rather bad for:

  • reproducibility
  • parallel processing

For parallel processing in particular it is really, really bad because:

  • every worker draws different random chunks, so the noise or covariance matrix is not the same across workers. As a result, n_jobs>1 gives different results on every run.
  • the startup of every worker can be very slow when n_jobs is very high, because all the workers compete for CPU resources while each one inverts a covariance matrix, for instance.
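The first point can be illustrated with a minimal, standard-library-only sketch. It is not SpikeInterface code: `estimate_noise` is a hypothetical helper that stands in for any statistic computed from randomly placed chunks, and the "workers" are simulated by repeated calls rather than real processes.

```python
import random
import statistics


def estimate_noise(signal, n_chunks=20, chunk_size=50, seed=None):
    """Estimate the standard deviation from randomly placed chunks.

    Hypothetical helper for illustration only; with seed=None the
    generator is seeded from the OS, as an unseeded worker would be.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_chunks):
        start = rng.randrange(0, len(signal) - chunk_size)
        samples.extend(signal[start:start + chunk_size])
    return statistics.pstdev(samples)


# Synthetic "recording": Gaussian noise, generated with a fixed seed
# so that the data itself is identical on every run.
data_rng = random.Random(0)
signal = [data_rng.gauss(0.0, 1.0) for _ in range(10_000)]

# Each simulated worker draws its own chunks: the estimates differ.
unseeded = [estimate_noise(signal) for _ in range(4)]

# With a shared seed, every worker picks the same chunks and
# therefore computes the same estimate.
seeded = [estimate_noise(signal, seed=42) for _ in range(4)]

print("unseeded:", unseeded)
print("seeded:  ", seeded)
```

Sharing a seed removes the disagreement between workers, but each worker still repeats the computation, so it does not address the startup cost described in the second point.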

We have a way to store _kwargs, but we should have another mechanism (a separate dict) that would make it possible to restore the class very quickly in the same state, without any random computation in between.

samuelgarcia avatar Nov 29 '22 14:11 samuelgarcia

How close are we to this? The issue is two years old and I doubt we really look at it. It is an ongoing effort, but I think we've made huge progress, no?

zm711 avatar Sep 20 '24 11:09 zm711

@alejoe91 I'll bring you into this :) What do you think? Close this as super stale for now? I don't know if keeping this as a reminder is serving a purpose at this point.

zm711 avatar Nov 22 '24 19:11 zm711

This is fixed :) the random stuff is computed in the main process and added as kwargs
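As a rough sketch of that pattern (an illustration under stated assumptions, not the actual SpikeInterface implementation): the class below, `ZScoreSketch`, and the `rebuild` helper are hypothetical names. The random statistics are computed once when the object is first built, stored in `_kwargs`, and then passed back in when a worker reconstructs the object, so no random draw happens on the worker side.

```python
import random
import statistics


class ZScoreSketch:
    """Hypothetical stand-in for a preprocessing class (illustration only)."""

    def __init__(self, signal, mean=None, std=None, seed=None):
        self.signal = signal
        if mean is None or std is None:
            # Random and potentially expensive step: it runs only when
            # the statistics were not supplied by the caller.
            rng = random.Random(seed)
            sample = rng.sample(signal, k=min(500, len(signal)))
            mean = statistics.fmean(sample)
            std = statistics.pstdev(sample)
        self.mean = mean
        self.std = std
        # Keep the computed values so that a copy can be rebuilt
        # in exactly the same state.
        self._kwargs = dict(mean=mean, std=std)

    def get_traces(self, start, end):
        return [(x - self.mean) / self.std for x in self.signal[start:end]]


def rebuild(signal, kwargs):
    """What a worker would do: reconstruct from stored kwargs, no random draw."""
    return ZScoreSketch(signal, **kwargs)


data_rng = random.Random(0)
signal = [data_rng.gauss(5.0, 2.0) for _ in range(5_000)]

main = ZScoreSketch(signal)  # the random sampling happens once, here
workers = [rebuild(signal, main._kwargs) for _ in range(4)]

# Every rebuilt copy carries the same statistics as the original.
print(all(w.mean == main.mean and w.std == main.std for w in workers))
```

Because the workers receive the already-computed values, they all agree with the main process and skip the startup computation entirely.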

alejoe91 avatar Nov 22 '24 19:11 alejoe91