ashr
ashr copied to clipboard
ash_pois log link problems
On simulated NB data (https://users.rcc.uchicago.edu/~aksarkar/pois-mode-est.Rds), fitting with log link gets a much worse log likelihood than with identity link.
> dat <- readRDS("/scratch/midway2/aksarkar/modes/test-data/pois-mode-est.Rds")
> with(dat, summary(MASS::glm.nb(x ~ offset(log(s))))$twologlik / 2)
[1] -2008.504
> fit0 <- ashr::ash_pois(dat$x, dat$s, link="identity", mixcompdist="halfuniform")
> fit0$loglik
[1] -2008.271
> lam <- dat$x / dat$s
> eps = 1 / mean(dat$s)
> log_lam <- log(lam + eps)
> se_log_lam <- sqrt(var(lam) / (lam + eps)^2)
> mixsd <- seq(.1 * min(se_log_lam), max(2 * sqrt(log_lam^2 - se_log_lam^2)), by=.5 * log(2))
> fit1 <- ashr::ash_pois(dat$x, dat$s, link="log", mixcompdist="halfuniform", mixsd=mixsd)
> fit1$loglik
[1] -3199.636
autoselect.mixsd needs to be modified to handle this case. For se_log_lam,
I am using Taylor expansion of V[ln(\lambda + \epsilon)]. This calculation also
needs to be verified.
On real data (https://users.rcc.uchicago.edu/~aksarkar/b-cell-data.Rds), the marginal PMF calculation assuming log link returns probabilities > 1, which should not happen.
> dat <- readRDS("/scratch/midway2/aksarkar/modes/test-data/b-cell-data.Rds")
> query <- dat[dat["gene"] == "ENSG00000019582",]
> with(query, summary(MASS::glm.nb(count ~ offset(log(size))))$twologlik / 2)
[1] -31414.6
> fit0 <- ashr::ash_pois(query$count, query$size, link="identity", mixcompdist="halfuniform")
> fit0$loglik
[1] -31395.05
> lam <- query$count / query$size
> eps = 1 / mean(query$size)
> log_lam <- log(lam + eps)
> se_log_lam <- sqrt(var(lam) / (lam + eps)^2)
> mixsd <- seq(.1 * min(se_log_lam), max(2 * sqrt(log_lam^2 - se_log_lam^2)), by=.5 * log(2))
> fit1 <- ashr::ash_pois(query$count, query$size, link="log", mixcompdist="halfuniform", mixsd=mixsd)
> fit1$loglik
[1] 10408.96
> any(fit1$fitted_g$pi %*% ashr:::comp_dens_conv(fit1$fitted_g, fit1$data) > 1)
[1] TRUE
any(ashr:::comp_dens_conv(fit1$fitted_g, fit1$data) > 1)
[1] TRUE
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: Scientific Linux 7.4 (Nitrogen)
Matrix products: default
BLAS/LAPACK: /scratch/midway2/aksarkar/miniconda3/envs/scmodes/lib/R/lib/libRblas.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] MASS_7.3-50 compiler_3.5.1 Matrix_1.2-14 parallel_3.5.1
[5] mixsqp_0.2-2 Rcpp_0.12.18 codetools_0.2-15 SQUAREM_2017.10-1
[9] pscl_1.5.2 doParallel_1.0.11 grid_3.5.1 iterators_1.0.10
[13] foreach_1.4.4 truncnorm_1.0-8 ashr_2.2-39 lattice_0.20-35