# survey package support
- [x] calculate pooled s.d. with attention to weights (see the sketch after this list)
- [x] `match_on` method for svyglm's, or at least tests of the glm method with svyglm objects
- [x] Address `scores()` warning on svyglm input (from `tryCatch()` block creating `mf`)
- [x] Avoid having `scores()` re-fit the `svyglm` model passing a weights argument when none was originally passed (bug!)
- [x] boxplot method for svyglm's
- [x] ~~minimal adjustment to DESCRIPTION (add to `Enhances:`?)~~ confirm tests pass
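A minimal sketch of what "pooled s.d. with attention to weights" might look like; the function and variable names here are hypothetical and this is not optmatch's actual implementation:

```r
# Hypothetical sketch: weighted pooled SD across treatment (z == 1) and
# control (z == 0) groups, with w the case weights. Not optmatch's code.
weighted_var <- function(x, w) {
  m <- weighted.mean(x, w)
  sum(w * (x - m)^2) / sum(w)  # weighted variance, no small-sample correction
}
pooled_sd <- function(x, z, w) {
  v1 <- weighted_var(x[z == 1], w[z == 1])
  v0 <- weighted_var(x[z == 0], w[z == 0])
  sqrt((v1 + v0) / 2)          # equal-weight average of the two group variances
}
```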
I have an example where I passed `scores()` a svyglm with no missings that had been created as follows:

```r
> object$call
svyglm(formula = form0, design = surv.d1, family = quasibinomial())
```

Notice there is no `weights` argument. (With svyglm, that's conveyed through the `design` argument.) However, inside of `scores()` the following block recreates the model using a `weights=` argument:
```r
if (is.null(wts)) {
  newobject <- update(object, formula. = formula(olddata),
                      data = olddata, subset = NULL)
} else {
  newobject <- update(object, formula. = formula(olddata),
                      data = olddata, subset = NULL, weights = wts,
                      evaluate = FALSE)
  newobject$weights <- wts
  newobject <- eval(newobject)
}
```
As a result of this block we wind up with the wrong thing:

```r
> all.equal(newobject$linear.predictors, object$linear.predictors)
[1] "Mean relative difference: 0.8599866"
```
In ebcf484 I changed the condition on the if/else alternation above to one that should work better in the svyglm scenario. My sense is that in other scenarios where the new condition gives a different result than `is.null(wts)` as above, the new condition is the better bet; but perhaps @josherrickson will think of some case that didn't occur to me where `is.null(wts)` actually is more appropriate than the new `all('weights'!=names(call(object)))`?
Should we condition on both? What would happen if someone explicitly passed `weights = NULL`?
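(To make the edge case concrete, a small illustration not taken from the thread: with an explicit `weights = NULL`, the argument is recorded in the matched call even though its value is NULL, so a purely name-based check would send us down the re-fitting branch.)

```r
# Illustration: 'weights' appears in the matched call even when passed as NULL
fit <- glm(mpg ~ wt, data = mtcars, weights = NULL)
"weights" %in% names(fit$call)  # TRUE: the argument is named in the call
is.null(fit$call$weights)       # TRUE: yet its value is NULL
```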
Good point, I didn't think to consider the `weights = NULL` possibility. Perhaps the current condition with something like the following tacked on: `|| is.null(object$call$weights)`.
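A sketch of how the combined condition might read (hypothetical; the actual change in ebcf484 may differ in details such as how the call is retrieved):

```r
# Hypothetical combined check: skip the weights= re-fit both when the
# original call named no weights argument and when it passed weights = NULL.
if (all("weights" != names(object$call)) || is.null(object$call$weights)) {
  newobject <- update(object, formula. = formula(olddata),
                      data = olddata, subset = NULL)
} else {
  # ... re-fit with weights = wts as before ...
}
```

Note that `object$call$weights` is also NULL when the call names no weights argument at all, so the second disjunct alone may cover both cases.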
Wickham's *R Packages* recommends against use of Enhances, so I'm skipping that. An issue with my machine prevented me from testing it [issue194-survey 2fc6f52], which is all that remains to do on this issue.
The `issue94-survey` branch is ready for merging into the master branch, I believe. @josherrickson would you like to give it a once-over and either kick it back or merge it in and close out the issue?
@benthestatistician Because we're using survey functions in the code, the survey package needs to appear in DESCRIPTION or else we end up with check warnings. I don't buy Wickham's argument (or complete lack thereof) against Enhances, but putting it in Depends seems the most natural.
After fixing that, I've brought in the branch.
I hadn't thought about the check warnings problem. How about putting survey into Enhances after all? Calling it a dependency seems less accurate; plus I'm not persuaded of the Wickham argument on this point either.
(I went ahead and moved the survey entry to Enhances, in master 6f526d3.)
I see you reverted; I did a bit of reading on Enhances and I think the issue is using survey functions. For CBPS, we implemented `scores.CBPS` without using any CBPS functions; but since the code here uses survey functions, we need to depend on survey.
My sense of the problem agrees with yours, Josh.
Perhaps it would be better to list survey, biglm, brglm, and arm under Enhances, even if some of them also have to be listed under Depends, for the purpose of communicating to users what the nature of our relationship to these packages is. Thoughts?
Having a package in more than one location produces a Note. It's not stopping the package from building, but it interferes with the goal of having the check pass without any flags.
Ah, thanks for pointing this out. The improved communication to users that we'd get from listing those packages a second time under Enhances doesn't warrant the cost of a Note.
I found that the `model.frame.svyglm()` I wrote while fixing this issue does less than I had hoped, not reliably creating a model frame to which `model.response()` can be applied. See the message accompanying 9d2238e, which seems to fix the trouble but calls for a sharable test. I suspect that the following would illustrate the problem with the version in optmatch v0.9-14:
```r
library(survey)
data(api)
dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                    data = apistrat, fpc = ~fpc)
sglm <- svyglm(sch.wide ~ ell + meals + mobility, design = dstrat,
               family = quasibinomial())
mf <- optmatch:::model.frame.svyglm(sglm)
not_sch.wide <- model.response(mf)
all.equal(not_sch.wide, sglm$variables$sch.wide)  # expected to fail
```
survey is now in Suggests; if a user attempts to use something that requires it and they don't have it installed, they'll get an error:
https://github.com/markmfredrickson/optmatch/blob/392b2718ce732b3a17e6c47ea4d743ee2470c5ec/R/match_on.R#L891-L893
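The guard at that link presumably follows the usual pattern for Suggests-level dependencies; a minimal sketch (the message wording here is illustrative, not the actual optmatch code):

```r
# Standard guard for a Suggests dependency: error informatively if the user
# invokes svyglm-related functionality without the survey package installed.
if (!requireNamespace("survey", quietly = TRUE)) {
  stop("The 'survey' package is required for this functionality; ",
       "please install it, e.g. with install.packages('survey').")
}
```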