ashr
ashr copied to clipboard
general ash
@mengyin can you take a look at the code I've got in my general branch? I think it incorporates much, but not yet all, of what you had in your branch, but without all the repetition. It seems to run OK. There are still some things to do, but thought it would be useful for you to look and see if, for example, the results for logF errors look similar to yours. (note that for now I think df1 and df2 have to be the same for all observations...; changing this should not be too hard, but not done yet)
To see how to run it, see my test in tests/testthat/test_logF.R set.seed(10) e = rnorm(100)+log(rf(100,df1=10,df2=10)) e.ash = ash(e,1,lik=logF_lik(df1=10,df2=10)) s = sd(log(rf(10000,10,10))) # get approximate standard deviation by simulation e.ash.norm = ash(e,s) expect_equal(e.ash.norm$PosteriorMean,e.ash$PosteriorMean,tol=0.05)
and also the examples under ?ash.workhorse
Yes the general branch's results look very similar to my old results! Anyway in my current simulations df1 and df2 are always the same for all observations, so it is definitely fine...
that's good. A few things to resolve:
- you wrote a replacement to my "gradient" function that seems to be a fix to a divide by zero problem. I guess you got some errors that caused this problem, and we could just replace my function with yours?
- still need to deal with the autoselect issue, and work out how to let the user specify a way to select a grid that depends on the likelihood. Maybe the autoselect method should be specified when specifying likelihood (or a default supplied)
- may want to allow the user to just specify pdf of likelihood, and compute cdf by integrate? This might be a future feature.
- I have an implementation that allows each observation to have its own likelihood about to be pushed to the general-multilik branch, but I'm not 100% happy with the way I did it... still thinking about it.
- Yes I got errors sometimes with the old "gradient" function, replacing it by my new gradient function fixed the problem divide by zero and didn't affect results in general.
- For the autoselect function, my first thought was choosing the grid based on the shape of error distribution...but computing the distribution's moments itself can be non-trivial (numerical integration is unstable). Anyway I guess it's no big deal as long as the chosen grid's range is wide enough.
- Yes it'd be great if the user just need specify pdf (or just cdf and compute pdf by numerical differentiation? might be unstable though)!
for autoselect, maybe we could use the cdf instead of the moments? I only used moments in the normal case as a shorthand for making sure the grid spanned "plausible" values.
I see..say, use the values where the cdf is almost 1 (or 0) to decide the "plausible" range?