spikeinterface icon indicating copy to clipboard operation
spikeinterface copied to clipboard

Remove separate default job_kwarg `n_jobs` for sorters

Open zm711 opened this issue 1 year ago • 13 comments

Fixes #2696

Right now we aren't respecting the global n_jobs that user sets so the only way to override use all cores is to specifically pass n_jobs to the sorter. This is unintuitive if we allow the user to set this globally except in the case of sorters where it has to be set locally.

The current global default appears to be n_jobs=1 so we discussed whether that should be changed for people who don't know to set n_jobs so let me know if we want to do that in this PR too.

zm711 avatar Apr 12 '24 12:04 zm711

The current global default appears to be n_jobs=1 so we discussed whether that should be changed for people who don't know to set n_jobs so let me know if we want to do that in this PR too.

I vote for n_jobs=0.8 :)

@samuelgarcia @h-mayorquin @chrishalcrow what do you guys think?

alejoe91 avatar Apr 12 '24 13:04 alejoe91

I guess the one concern with changing to a default of multiprocessing is that we may we see more issues whereas with opt-in not everyone is using those. But if we decide to do a default then I think 0.7 or 0.8 seems safe enough!

zm711 avatar Apr 12 '24 16:04 zm711

I think that 0.8 cores sounds good as a default. My intuition is that we should leave one physical core free as most people multitask x D.

h-mayorquin avatar Apr 12 '24 16:04 h-mayorquin

@samuelgarcia now that you're back opinions on n_jobs default? I think we hard code a default for now as we wait for a jobs_kwargs optimizer in the future.

zm711 avatar Apr 29 '24 12:04 zm711

If this is for the sorter 0.8 sounds OK (because it is only to write this binary).

But we need to have in mind that on some computing nodes you can have many many cores that would lead to an enormous ram demand.

So for the main n_jobs I prefer to keep 1 so it force the user to think about it to improve speed.

samuelgarcia avatar Apr 29 '24 14:04 samuelgarcia

I don't like changing the n_jobs without the user knowing. So if the desired global n_jobs would be n_jobs=1 then I would prefer just to feed that in to sorters as well and the end user can decide to switch to multiprocessing for sorters. But having n_jobs >1 seems like it might take more debate eh?

zm711 avatar Apr 29 '24 15:04 zm711

Based on #2826 and #2820. I think even for sorters we should default to n_jobs=1 because some computers aren't working with multiprocessing at all. Currently not all sorters accept n_jobs as a parameter so maybe we add that to base sorting instead?

zm711 avatar May 10 '24 11:05 zm711

I agree. Maybe a warning at the fix_job_kwargs level that "warns" that using n_jobs=1 is not optimal?

alejoe91 avatar May 10 '24 11:05 alejoe91

I am slightly against the warning but I can't think on a better way to let them know : /

Two suggestions if we go with the warning:

  • It should not be in a very general place as some sorters don't accept that and then it becomes and irrelevant warning.
  • There should be a way to deactivate it that does not require you to do multiprocessing. As in, yes, I am aware but no, I don't want to use multiprocesing option.

h-mayorquin avatar May 10 '24 14:05 h-mayorquin

@samuelgarcia suggested a global flag global_kwargs_set that by session is set to True if the user calls the set_global_job_kwargs. I think it's a good idea :)

alejoe91 avatar May 10 '24 14:05 alejoe91

@zm711 I made the suggested changes.

Now the fix_job_kwargs warns if n_jobs=1 and it hasn't been set (either globally, or locally). Also extended a bit the job_kwargs documentation.

alejoe91 avatar May 14 '24 11:05 alejoe91

Thanks Alessio!

zm711 avatar May 14 '24 12:05 zm711

I am OK with the logic. I made a proposal.

samuelgarcia avatar May 14 '24 19:05 samuelgarcia