optmatch icon indicating copy to clipboard operation
optmatch copied to clipboard

survey package support

Open benthestatistician opened this issue 4 years ago • 14 comments

  • [x] calculate pooled s.d. with attn to weights
  • [x] match_on method for svyglm's, or at least tests of the glm method with svyglm objects
  • [x] Address scores() warning on svyglm input (from tryCatch() block creating mf)
  • [x] Avoid having scores() re-fit thesvyglm model passing a weights argument when none was originally passed (bug!)
  • [x] boxplot method for svyglm's
  • [x] ~~minimal adjustment to DESCRIPTION (add to enhances:?)~~ confirm tests pass

benthestatistician avatar Jan 04 '21 18:01 benthestatistician

I have an example where I passed scores() a svyglm w/ no missings and that had been created as follows:

object$call
svyglm(formula = form0, design = surv.d1, family = quasibinomial())

Notice no weights argument. (With svyglm, that's conveyed through the design argument.) However, inside of scores() the following line recreates the model using a weights= argument.

 if (is.null(wts)) {
            newobject <- update(object, formula. = formula(olddata), 
                data = olddata, subset = NULL)
        }
        else {
            newobject <- update(object, formula. = formula(olddata), 
                data = olddata, subset = NULL, weights = wts, 
                evaluate = FALSE)
            newobject$weights <- wts
            newobject <- eval(newobject)
        }

So as a result of this block we wind up with the wrong thing.

 all.equal(newobject$linear.predictors, object$linear.predictors)
[1] "Mean relative difference: 0.8599866"

In ebcf484 I change the condition on the if/else alternation above to one that should work better in the svyglm scenario. My sense is that in other scenarios where the new condition gives a different result than is.null(wts) as above, the new condition is the better bet; but perhaps @josherrickson will think of some case that didn't occur to me where is.null(wts) actually is more appropriate (than the new all('weights'!=names(call(object))))?

benthestatistician avatar Jan 06 '21 17:01 benthestatistician

Should we condition on both? What would happen if someone explicitly passed weights = NULL?

josherrickson avatar Jan 06 '21 19:01 josherrickson

Good point, I didn't think to consider the weights = NULL possibility. Perhaps current condition w/ something like the following tacked on: || is.null(object$call$weights).

benthestatistician avatar Jan 06 '21 21:01 benthestatistician

Wickham's R Packages recommends against use of Enhances, so I'm skipping that. An issue with my machine prevented me from testing it [issue194-survey 2fc6f52], which is all that remains to do on this issue.

benthestatistician avatar May 16 '21 02:05 benthestatistician

The issue94-survey branch is ready for merging into the master branch, I believe. @josherrickson would you like to give it a once-over and either kick it back or merge it in and close out the issue?

benthestatistician avatar May 17 '21 15:05 benthestatistician

@benthestatistician Because we're using survey functions in the code, the survey package needs to appear in DESCRIPTION or else we end up with check warnings. I don't buy Wickham's argument (or complete lack-thereof) against Enhances, but putting it in Depends seems the most natural.

After fixing that, I've brought in the branch.

josherrickson avatar May 17 '21 17:05 josherrickson

I hadn't thought about the check warnings problem. How about putting survey into Enhances after all? Calling it a dependency seems less accurate; plus I'm not persuaded of the Wickham argument on this point either.

benthestatistician avatar May 17 '21 17:05 benthestatistician

(I went ahead and moved the survey entry to Enhances, in master 6f526d3.)

benthestatistician avatar May 17 '21 19:05 benthestatistician

I see you reverted; I did a bit of reading on Enhances and I think the issue is using survey functions. For CBPS, we implemented scores.CBPS without using any CBPS functions; but since the code here uses survey functions we need to depend on survey.

josherrickson avatar May 17 '21 23:05 josherrickson

My sense of the problem agrees with yours, Josh.

Perhaps it would be better to list survey, biglm, brglm and arm under Enhances, even if some of them also have to be listed under Depends. For the purpose of communicating to users what the nature of our relationship to these packages is. Thoughts?

benthestatistician avatar May 18 '21 12:05 benthestatistician

Having a package in more than one location produces a Note. It's not stopping the package from building, but it interferes with the goal of having the check pass without any flags.

josherrickson avatar May 18 '21 18:05 josherrickson

Ah, thanks for pointing this out. The improved communication to users that we'd get from listing those packages a second time under Enhances doesn't warrant the cost of a Note.

benthestatistician avatar May 18 '21 19:05 benthestatistician

I found that the model.frame.svyglm() I wrote while fixing this issue does less than I had hoped, not reliably creating a model frame to which model.response() can be applied. See message accompanying 9d2238e, which seems to fix the trouble but calls for a sharable test. I suspect that the following would illustrate the problem with the version in optmatch v 0.9-14:

library(survey)
data(api)
dstrat<-svydesign(id=~1,strata=~stype, weights=~pw, data=apistrat, fpc=~fpc)
sglm <- svyglm(sch.wide~ell+meals+mobility, design=dstrat,
             family=quasibinomial())
mf <- optmatch:::model.frame.svyglm(sglm)
not_sch.wide <- model.response(mf)
all.equal(not_sch.wide, sglm$variables$sch.wide) # Expected to fail

benthestatistician avatar Jul 01 '21 15:07 benthestatistician

survey is now in suggests; if a user attempts to use something that requires it and they don't have it installed, they'll get an error:

https://github.com/markmfredrickson/optmatch/blob/392b2718ce732b3a17e6c47ea4d743ee2470c5ec/R/match_on.R#L891-L893

josherrickson avatar Mar 23 '22 16:03 josherrickson