optmatch
optmatch copied to clipboard
scoping problem(s) in `scores()`
In at least a couple of instances I've seen scores()
fail because of objects that ought to have been on the search path not being found.
-
strata()
not resolving tosurvival::strata()
despite the survival package being a dependency (cf #105, in particular this comment). - Some tests written for #194 work fine in interactive use but fail from within
test_that()
. See partially disabled test of "scores with svyglm (survey package) objects", here, and following. - BTW it would be nicer if we could import from survival without depending on it.
None of these is a show-stopper, at least not in itself. So the to-do's are:
- [ ] study the problem and potential solutions
- [ ] set specific goals about what to fix
- [ ] fix.
Here's an edge case in which survival::strata()
behaves differently than we might like. (Excuse for rolling our own instead?)
a <- factor(c('b', 'b, a'))
b <- factor(c("a, b", "b"))
data.frame(a, b, survival::strata(a,b))
## a b survival..strata.a..b.
## 1 b a, b b, a, b
## 2 b, a b b, a, b
nlevels(survival::strata(a,b))
## [1] 1
For our purposes there ought really to be two levels here, not one, with the function figuring out a way to distinguish them. If that's asking too much, then an error or a warning.
I looked into survival::strata
for the IES project; the actual code required to using our own strata
(or, perhaps, a different name to avoid overloading? After deprecating strata
for a while of course.) is trivial; only two lines. There's a bunch of other stuff required for special cases, most of which I suspect are survival specific. I don't think it'd be too heavy a lift to bring a variation of it over, and I'd fully support removing the dependence on the survival package.
Would we have to export it (the optmatch version of strata()
) in order to
serve our purposes, or could we get away with having it only within the
optmatch namespace?
I don't know for sure, but my guess is that we'd have to export it. I'll try and play around with it later today
It needs to be exported.
An optmatch-specific strata()
function could help with the program sketched in this comment to #161, which would in turn set us up to better enable "tidy data" workflows.
Regarding names, I'm not quite ready to abandon strata()
, but blocking()
would have the advantages of not conflicting with that existing function plus aligning with forcing()
as proposed over in flexida
.
I would argue that we either need to
- name it something besides
strata
, or - if we use
strata
, copy over survival's version completely (and keep it up to date if survival's changes).
Survival is a well known package that many people may have loaded and if we overload strata
with anything custom, we'd run the risk of causing issues (or at a minimum, causing a lot of extra work to ourselves to ensure that our customizations don't break survival.)
What about within()
or by()
or even more directly, exactly()
?
within()
and by()
clash with base functions, but exactly()
doesn't.
It's cute, too.
I continue to like block()
and/or blocking()
as well, but it needn't be
either-or...
We could also go more verbose. matchWithin
or some such.
I think if we use block
, we’d run into the same issue with flexida eventually, though that is far more under our control and not a big deal. I don’t have the clearest picture of the overlap between usage of the packages, though a lot of users load packages and use R history to continue the same session so even if the same project isn’t using both packages, they may try to load both.
(Good call on within and by … that was silly of me)
On May 22, 2021, at 6:08 PM, Ben @.***> wrote:
within()
andby()
clash with base functions, butexactly()
doesn't. It's cute, too.I continue to like
block()
and/orblocking()
as well, but it needn't be either-or...— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.
I just pushed up a branch, remove_survival_strata, that removes our dependence on survival::strata
in favor of our own optmatch::strata
. It passes all tests involving the use of strata
currently.
The actual optmatch::strata
function is currently very simplistic; it just converts the variables in strata
into groups and ensures NA propgates. We can tweak it or add sanity checks as desired.
Nice! Left a couple minor comments on fa1b7c4 inline.
Above you have an example that fails in survival's strata.
> a <- factor(c("b", "b, a"))
> b <- factor(c("a, b", "b"))
> survival::strata(a, b)
[1] b, a, b b, a, b
Levels: b, a, b
> length(unique(survival::strata(a, b)))
[1] 1
> optmatch::strata(a, b)
[1] "b a, b" "b, a b"
> length(unique(optmatch::strata(a, b)))
[1] 2
Edit: I might argue that survival::strata
's handling is a bug...
Would ditching the survival dependency as in the remove_survival_strata
branch address the scoping problem that was noted in this comment to #105 (and cited in the problem statement at the top of this thread)? If/when that's tested there I'd appreciate a pointer to the test, and I'll see if it fixes the problem in the original example.
Moving away from survival would probably fix that issue, as it (and this one) are both solely due our the dependence in strata.
We could also address this by intercepting the formula prior to model.frame
calls, and replacing strata
with our own function which would not conflict with survival's version, as opposed to the branch where I replaced survival::strata
with optmatch::strata
(which risks being masked by survival::strata
if survival gets loaded after optmatch).
I've folded the introduction of optmatch::strata
into the master branch.
I think this issue can be closed. If you agree @josherrickson, please close it.
(Above I had cross-referenced a comment to #161. The connection is that that comment was proposing a data structure for keeping track of subgroups of the data that are defined by combinations of levels of other variables, for use in the MCFSolutions class. As or after that's developed, it would be neat to do something similar with what optmatch::strata()
produces, as well as optmatch::exactMatch()
results, and to communicate the information down from specification of the matching problem down and out with the matching result. But that project is far enough from the original scope of this issue that it isn't a reason to keep the issue open.
Unless you have specific tests that fail scoping, I'm good to close.