DeclareDesign
many-to-many warning triggered incorrectly(?)
library(DeclareDesign)

design <-
  declare_model(
    N = 100,
    U = rnorm(N)
  ) +
  declare_inquiry(ATE = 1,
                  ATE_1 = 2,
                  ATE_2 = 3) +
  declare_estimator(U ~ 1, term = "(Intercept)", inquiry = "ATE", label = "est1") +
  declare_estimator(U ~ 1, term = "(Intercept)", inquiry = "ATE", label = "est2")

run_design(design)
Warning message:
In simulate_single_design(design, sims = 1, low_simulations_warning = FALSE) :
Estimators lack inquiry/term labels for matching, a many-to-many merge was performed.
When the two estimators are linked to two different inquiries, no warning is thrown. In both cases the results appear correct.
I ran this in the debugger; here are some notes:
The warning is triggered here: https://github.com/DeclareDesign/DeclareDesign/blob/679bd32fc9bb0e5e6e1ff318d227c95c09ef39af/R/simulate_design.R#L204
Note that the merge is an outer join, whereas the check on the cardinality of the result is only appropriate for a left join. You could therefore consider this a false positive, caused by the combination of one many-to-one linked inquiry and two unlinked inquiries.
Browse[2]> estimates_df
  sim_ID estimator        term   estimate  std.error statistic   p.value   conf.low  conf.high df outcome inquiry
1      1      est1 (Intercept) -0.1029885 0.09492354 -1.084962 0.2805734 -0.2913374 0.08536045 99       U     ATE
2      1      est2 (Intercept) -0.1029885 0.09492354 -1.084962 0.2805734 -0.2913374 0.08536045 99       U     ATE
Browse[2]> inquiries_df
  sim_ID inquiry estimand
1      1     ATE        1
2      1   ATE_1        2
3      1   ATE_2        3
Browse[2]> simulations_df
  sim_ID inquiry estimand estimator        term   estimate  std.error statistic   p.value   conf.low  conf.high df outcome
1      1     ATE        1      est1 (Intercept) -0.1029885 0.09492354 -1.084962 0.2805734 -0.2913374 0.08536045 99       U
2      1     ATE        1      est2 (Intercept) -0.1029885 0.09492354 -1.084962 0.2805734 -0.2913374 0.08536045 99       U
3      1   ATE_1        2      <NA>        <NA>         NA         NA        NA        NA         NA         NA NA    <NA>
4      1   ATE_2        3      <NA>        <NA>         NA         NA        NA        NA         NA         NA NA    <NA>
Browse[2]> nrow(simulations_df) > max(nrow(inquiries_df), nrow(estimates_df))
[1] TRUE
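The same arithmetic can be reproduced outside the package. This is a minimal base-R sketch with toy stand-ins for the two tables. The join keys (`sim_ID`, `inquiry`) and the use of `merge()` are my assumptions for illustration, not a copy of what `simulate_design.R` does internally.

```r
# Toy stand-ins for the tables in the debugger output (columns abbreviated).
estimates_df <- data.frame(
  sim_ID = 1,
  estimator = c("est1", "est2"),
  inquiry = "ATE",
  estimate = -0.1029885
)
inquiries_df <- data.frame(
  sim_ID = 1,
  inquiry = c("ATE", "ATE_1", "ATE_2"),
  estimand = c(1, 2, 3)
)

by_cols <- c("sim_ID", "inquiry")  # assumed join keys

# Full outer join: 2 matched rows for ATE plus 2 unmatched inquiries.
outer_df <- merge(inquiries_df, estimates_df, by = by_cols, all = TRUE)
# Left join from the estimates side: one row per estimate.
left_df <- merge(estimates_df, inquiries_df, by = by_cols, all.x = TRUE)

threshold <- max(nrow(inquiries_df), nrow(estimates_df))

nrow(outer_df) > threshold  # TRUE: the row-count check fires, but no many-to-many occurred
nrow(left_df) > threshold   # FALSE: with a left join the same check stays quiet
```

So the row count exceeds the threshold purely because the outer join keeps the unmatched inquiries, not because any row was duplicated on both sides.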
A more rigorous check could be constructed by appending a synthetic ID to each input table, and then checking those IDs in the output for the actual presence of a many-to-many match.
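Something along these lines is what I have in mind. It is only a sketch: `has_many_to_many()` and the `.x_row` / `.y_row` columns are hypothetical names, and I am assuming a plain `merge()` on shared key columns rather than the package's actual merge call.

```r
# Hypothetical helper: tag each input row with a synthetic ID, merge, then
# look for output rows whose x-row AND y-row both appear more than once.
has_many_to_many <- function(x, y, by) {
  x$.x_row <- seq_len(nrow(x))
  y$.y_row <- seq_len(nrow(y))
  merged <- merge(x, y, by = by, all = TRUE)

  # TRUE where a (non-missing) ID occurs more than once in the merged output.
  repeated <- function(id) {
    !is.na(id) & (duplicated(id) | duplicated(id, fromLast = TRUE))
  }

  any(repeated(merged$.x_row) & repeated(merged$.y_row))
}

# The case from this issue: one inquiry matched by two estimators, two unmatched.
inq <- data.frame(sim_ID = 1, inquiry = c("ATE", "ATE_1", "ATE_2"), estimand = 1:3)
est <- data.frame(sim_ID = 1, inquiry = "ATE", estimator = c("est1", "est2"))
issue_case <- has_many_to_many(inq, est, by = c("sim_ID", "inquiry"))

# A genuine many-to-many: two inquiry rows and two estimator rows share one key.
inq_dup <- data.frame(sim_ID = 1, inquiry = "ATE", estimand = c(1, 2))
true_case <- has_many_to_many(inq_dup, est, by = c("sim_ID", "inquiry"))

issue_case  # FALSE
true_case   # TRUE
```

Unmatched rows carry `NA` in the other table's ID column, so they are excluded explicitly and cannot trigger the warning on their own.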