Support for computations from pre-calculated replicates

Open HDembinski opened this issue 5 years ago • 10 comments

As mentioned in #34, we need a way to allow computation from pre-calculated replicates.

My original idea was to reuse the existing interface to do this, but the documentation and argument types then become a bit ugly, which was also @dsaxton's concern.

We need to resolve this before publishing release 1.0, in case it has implications for our interface overhaul (it may not, but it could).

To address the points raised, I drafted a solution here where the interface of the bias function in the jackknife module is left as is. In addition, there is a bias_from_precalculated (name to be refined), which accepts the pre-calculated replicates. Internally, bias calls bias_from_precalculated, of course.
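A minimal sketch of that split, assuming the standard jackknife bias formula, (n - 1) times the difference between the mean of the leave-one-out estimates and the full-sample estimate. The helper `jackknife_replicates` is hypothetical and only there to make the example self-contained; the linked draft may differ in its details.

```python
import numpy as np


def jackknife_replicates(fn, sample):
    """Evaluate fn on every leave-one-out subsample (hypothetical helper)."""
    sample = np.asarray(sample)
    return np.array([fn(np.delete(sample, i, axis=0)) for i in range(len(sample))])


def bias_from_precalculated(theta, resampled_thetas):
    """Jackknife bias from the full-sample estimate and existing replicates."""
    resampled_thetas = np.asarray(resampled_thetas)
    n = len(resampled_thetas)
    return (n - 1) * (np.mean(resampled_thetas, axis=0) - theta)


def bias(fn, sample):
    """Unchanged user-facing interface: takes fn and sample, delegates the formula."""
    sample = np.asarray(sample)
    return bias_from_precalculated(fn(sample), jackknife_replicates(fn, sample))
```

With this layout the formula lives in exactly one place, and users who already hold replicates can skip the resampling step entirely.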

@dsaxton Would that be a way to go for all functions? We would then need to introduce X_from_precalculated for

  • bias
  • bias_corrected
  • variance
  • confidence_interval

in both the jackknife and bootstrap modules.

This would be an acceptable solution for me, but whenever I see a common prefix/suffix, I think of namespaces. It would be more organized to put these in a separate module, so that one can do

from resample.jackknife.precalculated import bias # the version in which you pass `theta` and `resampled_thetas`

or

from resample.jackknife import bias # the version in which you pass `fn` and `sample`

I think to make this work, we need to make jackknife and bootstrap into sub-packages, which then can have a sub-module precalculated. The directory structure would look like this.

resample
  __init__.py
  jackknife
     __init__.py
     precalculated.py
  bootstrap
     __init__.py
     precalculated.py

HDembinski • Jul 18 '20 14:07

Still not sure how I feel about complicating the library to accommodate this, it feels a bit like over-optimization.

I wouldn't worry much if my computer has to do some extra work when calling resample.bootstrap.variance followed by resample.bootstrap.bias. If a user is especially concerned about reusing the replicates, it's not that hard to write custom code, using what we already have, that persists the replicates and then does whatever they want with them.

dsaxton • Jul 18 '20 16:07

> Still not sure how I feel about complicating the library to accommodate this, it feels a bit like over-optimization.

And for me it is difficult to understand why you don't see how essential this is.

Imagine that evaluating your estimator fn costs you 1 min. If fn is a complex maximum-likelihood fit, then this is not unrealistic. It could be more than 1 min; it could be 10 or 100 min. If computing fn is costly like that, but you want to know both its bias and its variance, you certainly don't want to compute 100 replicates twice.
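To make the cost argument concrete, here is a rough sketch with a stand-in estimator that counts its own calls. `expensive_fn` and the hand-rolled bootstrap loop are assumptions for illustration only and do not use the library's interface; the bias and variance formulas are the usual bootstrap ones.

```python
import numpy as np

calls = 0


def expensive_fn(x):
    """Stand-in for a costly estimator, e.g. a maximum-likelihood fit."""
    global calls
    calls += 1
    return np.mean(x)


rng = np.random.default_rng(1)
sample = rng.normal(size=20)

# Compute the bootstrap replicates once (resampling with replacement) ...
n_replicates = 100
replicates = np.array(
    [expensive_fn(rng.choice(sample, size=len(sample))) for _ in range(n_replicates)]
)
theta = expensive_fn(sample)

# ... then derive several quantities from the same replicates at no extra cost.
bias_estimate = np.mean(replicates) - theta
variance_estimate = np.var(replicates, ddof=1)

print(calls)  # one call per replicate plus one for the full sample
```

Calling separate `bias(fn, sample)` and `variance(fn, sample)` functions that each resample internally would roughly double the number of evaluations of `fn`.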

HDembinski • Jul 18 '20 17:07

If you think that using the formulas directly is trivial, then why write a Python library in the first place? The formulas, while simple, are not trivial. The bias and variance are computed differently for the jackknife and bootstrap. We write libraries like this so that people don't have to worry about details like that.
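As an illustration of how the details differ, here is a sketch of the two variance formulas under the textbook definitions. The function names are hypothetical, and the `ddof=1` normalization for the bootstrap is an assumption, not a statement about what the library uses.

```python
import numpy as np


def jackknife_variance(resampled_thetas):
    """Jackknife variance: (n - 1) / n times the sum of squared deviations."""
    thetas = np.asarray(resampled_thetas)
    n = len(thetas)
    return (n - 1) / n * np.sum((thetas - np.mean(thetas, axis=0)) ** 2, axis=0)


def bootstrap_variance(resampled_thetas):
    """Bootstrap variance: plain sample variance of the replicates."""
    return np.var(np.asarray(resampled_thetas), ddof=1, axis=0)
```

Fed the same array of n replicates, the two functions differ by a factor of (n - 1)^2 / n, so applying the wrong one to precalculated replicates gives a badly scaled result.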

HDembinski • Jul 18 '20 17:07

I would appreciate it if we started discussing the "how" and not the "whether at all" of this, since I am 100% convinced of the latter and I won't change my mind.

HDembinski • Jul 18 '20 17:07

Add functions if you want; submodules seem like overkill.

dsaxton • Jul 20 '20 14:07

Sorry for the delayed answer. Making further submodules is easy, so that shouldn't stop us. It would move this functionality out of sight of those who don't care about it. Isn't this in your interest? I do agree with you that this functionality is an advanced feature for power users, so it is fair to put it a bit deeper into the package.

HDembinski • Jul 30 '20 15:07

precalculated could also be named formulas or something.

HDembinski • Jul 30 '20 15:07

> precalculated could also be named formulas or something.

cached could also make for an interesting name (since we are using "cached" replicates). I suppose submodules aren't too bad if they're easy to add.

dsaxton • Jul 30 '20 17:07

@HDembinski Are you still interested in working on this? I think it would be nice to release a new version in the not too distant future (your plan was to include this, I believe).

dsaxton • Aug 08 '20 14:08

@dsaxton I am still working on this, but if we agree on the plan laid out here, it can be done after the 1.0 release. Feel free to go ahead with the release, but please leave this open for me to work on, ok?

HDembinski • Aug 10 '20 13:08