resample
Add option `threads`?
One of the first questions I got after the presentation on resample at PyHEP was about parallelization.
In principle, resampling methods are perfectly parallelizable, assuming that `fn` is pure (has no side effects). That is generally a reasonable assumption. In Python, there are many ways to parallelize: you may want to parallelize on your own cores, on a cluster of computers, or in the cloud. Therefore, offering direct access to `resample` is good, because it allows the user to choose their own parallelization scheme.
For the simple common cases, however, we may want to offer a `threads` option to our methods, which computes `fn` on the replicas using `threads` threads on the current computer, to better utilize common multi-core processors. This would be an option for the functions `bootstrap` and `jackknife` and those that build on them, e.g. `bias` and `variance`. @dsaxton What do you think?
I think it makes sense, although I wouldn't know how to implement it. Are there options built into numpy and scipy that can be used?
Parallel execution is easy to implement with `concurrent.futures.ThreadPoolExecutor`; I can do that. It is mainly a question of whether we want to add this. I think it would be useful and convenient, but you were worried a while ago about adding too many keywords, which is why I bring it up before coding something.
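To make the idea concrete, here is a minimal standard-library sketch of what a `threads` keyword could look like. The helper name `bootstrap_threaded` and its signature are hypothetical and are not resample's actual API. The replicas are drawn sequentially from a seeded generator, so the result does not depend on the number of threads, and only the evaluation of `fn` is spread over the pool.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def bootstrap_threaded(fn, sample, size=100, threads=None, seed=None):
    """Hypothetical helper (not resample's API): apply fn to bootstrap replicas."""
    rng = random.Random(seed)
    n = len(sample)
    # Draw all replicas up front in the calling thread, so the output is
    # reproducible for a given seed regardless of how many threads are used.
    replicas = [rng.choices(sample, k=n) for _ in range(size)]
    if threads is None or threads <= 1:
        # Serial fallback: same results, no thread pool.
        return [fn(replica) for replica in replicas]
    # Executor.map yields results in the order of the inputs.
    with ThreadPoolExecutor(max_workers=threads) as executor:
        return list(executor.map(fn, replicas))


data = [1.0, 2.0, 3.0, 4.0, 5.0]
serial = bootstrap_threaded(mean, data, size=50, seed=1)
threaded = bootstrap_threaded(mean, data, size=50, threads=4, seed=1)
```

Because of the GIL in CPython, threads give a real speed-up mainly when `fn` spends its time in code that releases the GIL, as many NumPy routines do. A pure-Python `fn` like the one above would need a process pool to run in parallel.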
I'd be in favor of adding it. I think it could go after 1.0.0 since it should be fully backwards compatible?
True!