resample
Add option `threads`?
One of the first questions I got after the presentation on resample at PyHEP was about parallelization.
In principle, resampling methods are perfectly parallelizable, assuming that fn is pure (has no side effects). That is generally a reasonable assumption. In Python, there are many ways to parallelize: you may want to parallelize on your own cores, on a cluster of computers, or in the cloud. Therefore, offering direct access to resample is good, because it allows the user to choose their own parallelization scheme.
For the simple common cases, however, we may want to offer a `threads` option for our methods, which would compute fn on the replicas using `threads` threads on the current computer, to better utilize common multi-core processors. This would be an option for the functions bootstrap and jackknife and those that build on them, e.g. bias and variance. @dsaxton What do you think?
I think it makes sense, although I wouldn't know how to implement it. Are there options built into numpy and scipy that can be used?
Parallel execution is easy to implement with concurrent.futures.ThreadPoolExecutor; I can do that. It is mainly a question of whether we want to add this. I think it would be useful and convenient, but you were worried a while ago about adding too many keywords, which is why I am bringing it up before coding something.
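To make the proposal concrete, here is a rough sketch of how a `threads` keyword could work with concurrent.futures.ThreadPoolExecutor. It is not the library's code: `bootstrap_replicas` and this `bootstrap` signature (including `size` and `seed`) are hypothetical stand-ins written with the standard library only, and fn is assumed to be pure.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor


def bootstrap_replicas(sample, size, seed=None):
    """Hypothetical helper: yield `size` replicas drawn with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    for _ in range(size):
        yield [sample[rng.randrange(n)] for _ in range(n)]


def bootstrap(fn, sample, size=100, threads=1, seed=None):
    """Hypothetical signature: apply `fn` to every bootstrap replica.

    Replicas are generated serially in the calling thread, so the result
    for a given seed does not depend on `threads`.
    """
    replicas = bootstrap_replicas(sample, size, seed)
    if threads <= 1:
        return [fn(r) for r in replicas]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Executor.map returns results in the order of the inputs.
        return list(pool.map(fn, replicas))


data = [1.2, 0.7, 3.4, 2.2, 1.9, 0.3, 2.8]
serial = bootstrap(statistics.mean, data, size=50, threads=1, seed=1)
threaded = bootstrap(statistics.mean, data, size=50, threads=4, seed=1)
```

Because the default would be `threads=1`, existing calls would behave as before. Threads only give a speed-up when fn releases the GIL for most of its run time, as many numpy operations do; a pure-Python fn would run correctly but not faster.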
I'd be in favor of adding it. I think it could go after 1.0.0 since it should be fully backwards compatible?
True!