resample
Support for computations from pre-calculated replicates
As mentioned in #34, we need a way to allow computation from pre-calculated replicates.
My original idea was to reuse the existing interface to do this, but the documentation and argument types then become a bit ugly, which was also @dsaxton's concern.
We need to resolve this before publishing release 1.0, in case it has implications for our interface overhaul (not necessarily, but it could).
To address the points raised, I drafted a solution here where the interface of the bias function in the jackknife module is left as is. In addition, there is a bias_from_precalculated (name to be refined), which accepts the pre-calculated replicates. Internally, bias calls bias_from_precalculated, of course.
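A minimal sketch of that split, assuming the argument names `fn`, `sample`, `theta`, and `resampled_thetas` used elsewhere in this thread. The helper `jackknife_replicates` is hypothetical and only there to make the example self-contained; it is not meant to mirror the library's actual code.

```python
import numpy as np


def jackknife_replicates(fn, sample):
    # Hypothetical helper: evaluate fn on every leave-one-out subsample.
    sample = np.asarray(sample)
    return np.array([fn(np.delete(sample, i, axis=0)) for i in range(len(sample))])


def bias_from_precalculated(theta, resampled_thetas):
    # Jackknife bias estimate from an existing estimate and its replicates.
    resampled_thetas = np.asarray(resampled_thetas)
    n = len(resampled_thetas)  # the jackknife yields one replicate per observation
    return (n - 1) * (np.mean(resampled_thetas, axis=0) - theta)


def bias(fn, sample):
    # Unchanged public interface: compute the replicates, then delegate.
    return bias_from_precalculated(fn(sample), jackknife_replicates(fn, sample))
```

With this layout, users who already hold `theta` and the replicates can call `bias_from_precalculated` directly, while everyone else keeps calling `bias(fn, sample)`.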
@dsaxton Would that be a way to go for all functions? We would then need to introduce X_from_precalculated for
- bias
- bias_corrected
- variance
- confidence_interval
in both the jackknife and bootstrap modules.
This would be an acceptable solution for me, but whenever I see a common prefix/suffix, I think of namespaces. I think it would be more organized to put these in a separate module, so that one can do
from resample.jackknife.precalculated import bias # the version in which you pass `theta` and `resampled_thetas`
or
from resample.jackknife import bias # the version in which you pass `fn` and `sample`
I think to make this work, we need to make jackknife and bootstrap into sub-packages, which then can have a sub-module precalculated. The directory structure would look like this.
resample
    __init__.py
    jackknife
        __init__.py
        precalculated.py
    bootstrap
        __init__.py
        precalculated.py
Still not sure how I feel about complicating the library to accommodate this, it feels a bit like over-optimization.
I wouldn't worry a whole lot if my computer has to do some extra work when calling resample.bootstrap.variance followed by resample.bootstrap.bias. If a user is especially concerned about reusing the replicates, it's not that hard to write custom code using what we already have that persists the replicates and then does what he / she wants with them.
> Still not sure how I feel about complicating the library to accommodate this, it feels a bit like over-optimization.
And for me it is difficult to understand why you don't see how essential this is.
Imagine that evaluating your estimator fn costs you 1 min. If fn is a complex maximum-likelihood fit, then this is not unrealistic. It could be more than 1 min; it could be 10 or 100 min. If computing fn is costly like that, but you want to know its bias and variance, you certainly don't want to compute 100 replicates twice.
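To make the cost argument concrete, here is a small sketch that counts how often the estimator is evaluated when the bootstrap replicates are computed once and then reused for both bias and variance. `expensive_fn` and `bootstrap_replicates` are hypothetical stand-ins written for this example, not library functions.

```python
import numpy as np

calls = 0


def expensive_fn(sample):
    # Stand-in for a costly estimator, e.g. a maximum-likelihood fit.
    global calls
    calls += 1
    return np.mean(sample)


def bootstrap_replicates(fn, sample, size, rng):
    # Hypothetical helper: evaluate fn on `size` resamples drawn with replacement.
    n = len(sample)
    return np.array([fn(sample[rng.integers(0, n, size=n)]) for _ in range(size)])


rng = np.random.default_rng(1)
sample = rng.normal(size=50)

theta = expensive_fn(sample)
thetas = bootstrap_replicates(expensive_fn, sample, 100, rng)

# Both quantities reuse the same 100 replicates.
bias = np.mean(thetas) - theta
variance = np.var(thetas, ddof=1)
```

Here `expensive_fn` runs 101 times in total (once for `theta`, once per replicate). Calling two independent functions that each take `fn` and `sample` would generate the 100 replicates twice.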
If you think that using the formulas directly is trivial, then why write a Python library in the first place? The formulas, while simple, are not trivial. The bias and variance are computed differently for the jackknife and bootstrap. We write libraries like this so that people don't have to worry about details like that.
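For illustration, the standard textbook formulas side by side, written against pre-calculated replicates. The function names are made up for this sketch, and the bootstrap variance uses the sample variance with `ddof=1`, which is one common convention and an assumption here.

```python
import numpy as np


def jackknife_bias(theta, resampled_thetas):
    # (n - 1) * (mean of leave-one-out estimates - original estimate)
    n = len(resampled_thetas)
    return (n - 1) * (np.mean(resampled_thetas) - theta)


def jackknife_variance(resampled_thetas):
    # (n - 1) / n * sum of squared deviations of the leave-one-out estimates
    n = len(resampled_thetas)
    deviations = resampled_thetas - np.mean(resampled_thetas)
    return (n - 1) / n * np.sum(deviations**2)


def bootstrap_bias(theta, resampled_thetas):
    # mean of bootstrap estimates - original estimate, with no (n - 1) factor
    return np.mean(resampled_thetas) - theta


def bootstrap_variance(resampled_thetas):
    # plain sample variance of the bootstrap estimates
    return np.var(resampled_thetas, ddof=1)
```

The scaling factors are exactly the kind of detail that is easy to mix up: the jackknife replicates vary much less than bootstrap replicates, so their spread has to be inflated by factors involving `n - 1`.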
I would appreciate it if we start to discuss the "how" and not the "whether at all" of this, since I am 100 % convinced of the latter and I won't change my mind.
Add functions if you want; submodules seem like overkill.
Sorry for the delayed answer. Making further submodules is easy, that shouldn't stop us. It would move this functionality out of sight of those who don't care about this. Isn't this in your interest? I do agree with you that this functionality is an advanced feature for power users, so it is fair to put it a bit deeper into the package.
precalculated could also be named formulas or something.
> precalculated could also be named formulas or something.
cached could also make for an interesting name (since we are using "cached" replicates). I suppose submodules aren't too bad if they're easy to add.
@HDembinski Were you still interested in working on this? I think it would be nice to release a new version in the not too distant future (your plan was to include this, I believe).
@dsaxton I am still working on this, but if we agree on the plan laid out here, it can be done after the 1.0 release. Feel free to go ahead with the release, but please leave this open for me to work on, ok?