optmatch icon indicating copy to clipboard operation
optmatch copied to clipboard

scoping problem(s) in `scores()`

Open benthestatistician opened this issue 4 years ago • 16 comments

In at least a couple of instances I've seen scores() fail because of objects that ought to have been on the search path not being found.

  • strata() not resolving to survival::strata() despite the survival package being a dependency (cf #105, in particular this comment).
  • Some tests written for #194 work fine in interactive use but fail from within test_that(). See partially disabled test of "scores with svyglm (survey package) objects", here, and following.
  • BTW it would be nicer if we could import from survival without depending on it.

None of these is a show-stopper, at least not in itself. So the to-do's are:

  • [ ] study the problem and potential solutions
  • [ ] set specific goals about what to fix
  • [ ] fix.

benthestatistician avatar Jan 18 '21 11:01 benthestatistician

Here's an edge case in which survival::strata() behaves differently than we might like. (Excuse for rolling our own instead?)

a  <- factor(c('b', 'b, a'))
b  <- factor(c("a, b", "b"))
data.frame(a, b, survival::strata(a,b))
##      a    b survival..strata.a..b.
## 1    b a, b                b, a, b
## 2 b, a    b                b, a, b
nlevels(survival::strata(a,b))
## [1] 1

For our purposes there ought really to be two levels here, not one, with the function figuring out a way to distinguish them. If that's asking too much, then an error or a warning.

benthestatistician avatar May 20 '21 20:05 benthestatistician

I looked into survival::strata for the IES project; the actual code required to using our own strata (or, perhaps, a different name to avoid overloading? After deprecating strata for a while of course.) is trivial; only two lines. There's a bunch of other stuff required for special cases, most of which I suspect are survival specific. I don't think it'd be too heavy a lift to bring a variation of it over, and I'd fully support removing the dependence on the survival package.

josherrickson avatar May 20 '21 23:05 josherrickson

Would we have to export it (the optmatch version of strata()) in order to serve our purposes, or could we get away with having it only within the optmatch namespace?

benthestatistician avatar May 21 '21 01:05 benthestatistician

I don't know for sure, but my guess is that we'd have to export it. I'll try and play around with it later today

josherrickson avatar May 21 '21 10:05 josherrickson

It needs to be exported.

josherrickson avatar May 21 '21 15:05 josherrickson

An optmatch-specific strata() function could help with the program sketched in this comment to #161, which would in turn set us up to better enable "tidy data" workflows.

benthestatistician avatar May 21 '21 19:05 benthestatistician

Regarding names, I'm not quite ready to abandon strata(), but blocking() would have the advantages of not conflicting with that existing function plus aligning with forcing() as proposed over in flexida.

benthestatistician avatar May 21 '21 19:05 benthestatistician

I would argue that we either need to

  1. name it something besides strata, or
  2. if we use strata, copy over survival's version completely (and keep it up to date if survival's changes).

Survival is a well known package that many people may have loaded and if we overload strata with anything custom, we'd run the risk of causing issues (or at a minimum, causing a lot of extra work to ourselves to ensure that our customizations don't break survival.)

What about within() or by() or even more directly, exactly()?

josherrickson avatar May 21 '21 23:05 josherrickson

within() and by() clash with base functions, but exactly() doesn't. It's cute, too.

I continue to like block() and/or blocking() as well, but it needn't be either-or...

benthestatistician avatar May 22 '21 22:05 benthestatistician

We could also go more verbose. matchWithin or some such.

I think if we use block, we’d run into the same issue with flexida eventually, though that is far more under our control and not a big deal. I don’t have the clearest picture of the overlap between usage of the packages, though a lot of users load packages and use R history to continue the same session so even if the same project isn’t using both packages, they may try to load both.

(Good call on within and by … that was silly of me)

On May 22, 2021, at 6:08 PM, Ben @.***> wrote:

 within() and by() clash with base functions, but exactly() doesn't. It's cute, too.

I continue to like block() and/or blocking() as well, but it needn't be either-or...

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

josherrickson avatar May 22 '21 22:05 josherrickson

I just pushed up a branch, remove_survival_strata, that removes our dependence on survival::strata in favor of our own optmatch::strata. It passes all tests involving the use of strata currently.

The actual optmatch::strata function is currently very simplistic; it just converts the variables in strata into groups and ensures NA propgates. We can tweak it or add sanity checks as desired.

josherrickson avatar Nov 18 '21 17:11 josherrickson

Nice! Left a couple minor comments on fa1b7c4 inline.

benthestatistician avatar Nov 18 '21 17:11 benthestatistician

Above you have an example that fails in survival's strata.

> a  <- factor(c("b", "b, a"))
> b  <- factor(c("a, b", "b"))
> survival::strata(a, b)
[1] b, a, b b, a, b
Levels: b, a, b
> length(unique(survival::strata(a, b)))
[1] 1
> optmatch::strata(a, b)
[1] "b a, b" "b, a b"
> length(unique(optmatch::strata(a, b)))
[1] 2

Edit: I might argue that survival::strata's handling is a bug...

josherrickson avatar Nov 18 '21 19:11 josherrickson

Would ditching the survival dependency as in the remove_survival_strata branch address the scoping problem that was noted in this comment to #105 (and cited in the problem statement at the top of this thread)? If/when that's tested there I'd appreciate a pointer to the test, and I'll see if it fixes the problem in the original example.

benthestatistician avatar Jan 20 '22 15:01 benthestatistician

Moving away from survival would probably fix that issue, as it (and this one) are both solely due our the dependence in strata.

We could also address this by intercepting the formula prior to model.frame calls, and replacing strata with our own function which would not conflict with survival's version, as opposed to the branch where I replaced survival::strata with optmatch::strata (which risks being masked by survival::strata if survival gets loaded after optmatch).

josherrickson avatar Feb 07 '22 14:02 josherrickson

I've folded the introduction of optmatch::strata into the master branch.

josherrickson avatar Mar 23 '22 16:03 josherrickson

I think this issue can be closed. If you agree @josherrickson, please close it.

(Above I had cross-referenced a comment to #161. The connection is that that comment was proposing a data structure for keeping track of subgroups of the data that are defined by combinations of levels of other variables, for use in the MCFSolutions class. As or after that's developed, it would be neat to do something similar with what optmatch::strata() produces, as well as optmatch::exactMatch() results, and to communicate the information down from specification of the matching problem down and out with the matching result. But that project is far enough from the original scope of this issue that it isn't a reason to keep the issue open.

benthestatistician avatar May 23 '24 22:05 benthestatistician

Unless you have specific tests that fail scoping, I'm good to close.

josherrickson avatar May 24 '24 18:05 josherrickson