spikeinterface
Avoid randomness and computation in preprocessing class `__init__`
Some preprocessing classes like `WhitenRecording`, `NormalizeByQuantileRecording`, and `ZScoreRecording` internally use `get_random_data_chunks()`.
This makes the end-user experience easier, but it is rather bad for:
- reproducibility
- parallel processing
For parallel processing in particular it is really bad because:
- every worker draws different random chunks, so the noise or covariance matrix is not the same across workers; with `n_jobs > 1` every run gives different results.
- worker startup can be very long when `n_jobs` is high, because all workers fight for CPU resources, for instance when each one inverts a covariance matrix (see the sketch after this list).
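Here is a minimal sketch of the problematic pattern. The class and helper names are illustrative stand-ins, not the actual spikeinterface code, and the recording is simplified to a samples × channels array:

```python
import numpy as np

def get_random_data_chunks_like(traces, num_chunks=20, chunk_size=10_000, seed=None):
    # Illustrative stand-in for get_random_data_chunks(): random chunk starts,
    # so two calls (e.g. in two workers) see different data.
    rng = np.random.default_rng(seed)
    starts = rng.integers(0, traces.shape[0] - chunk_size, size=num_chunks)
    return np.concatenate([traces[s:s + chunk_size] for s in starts])

class WhitenLikePreprocessor:
    def __init__(self, traces):
        # Problem: randomness + heavy computation happen in __init__.
        data = get_random_data_chunks_like(traces)                 # different in every worker
        cov = data.T @ data / data.shape[0]
        self._whitening = np.linalg.inv(np.linalg.cholesky(cov))   # recomputed by every worker
        # Only the inputs are stored, so a worker rebuilding the object
        # from _kwargs redoes the random draw and the matrix inversion.
        self._kwargs = dict(traces=traces)
```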
We have a way to store `_kwargs`, but we should have another mechanism (another dict) that would allow restoring the class very quickly in the same state, without any randomness in between.
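A sketch of the kind of mechanism proposed here: the random-dependent state is computed once in the main process and passed through the kwargs, so a worker can restore exactly the same state deterministically. Again, the names are illustrative, not the real API:

```python
class WhitenLikePreprocessorFixed:
    def __init__(self, traces, whitening_matrix=None, seed=None):
        if whitening_matrix is None:
            # Done once, in the main process only.
            data = get_random_data_chunks_like(traces, seed=seed)
            cov = data.T @ data / data.shape[0]
            whitening_matrix = np.linalg.inv(np.linalg.cholesky(cov))
        self._whitening = np.asarray(whitening_matrix)
        # The computed matrix itself goes into the kwargs (made serializable
        # with .tolist()), so rebuilding the object from _kwargs involves
        # no randomness and no second covariance inversion.
        self._kwargs = dict(traces=traces,
                            whitening_matrix=self._whitening.tolist(),
                            seed=seed)

# A worker restoring the object reproduces the exact same state:
# worker_copy = WhitenLikePreprocessorFixed(**main_copy._kwargs)
```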
How close are we to this? The issue is two years old and I doubt we have really looked at it. It is an ongoing effort, but I think we've made huge progress, no?
@alejoe91 I'll bring you into this :) What do you think? Close this as super stale for now? I don't know if keeping this as a reminder is serving a purpose at this point.
This is fixed :) The random computation is done in the main process and its result is added to the kwargs.