
many-to-many warning triggered incorrectly(?)

Open • graemeblair opened this issue 3 years ago • 1 comment

library(DeclareDesign)

design <- 
  declare_model(
    N = 100,
    U = rnorm(N)
  ) + 
  declare_inquiry(ATE = 1,
                  ATE_1 = 2,
                  ATE_2 = 3) + 
  declare_estimator(U ~ 1, term = "(Intercept)", inquiry = "ATE", label = "est1") + 
  declare_estimator(U ~ 1, term = "(Intercept)", inquiry = "ATE", label = "est2")

run_design(design)
Warning message:
In simulate_single_design(design, sims = 1, low_simulations_warning = FALSE) :
  Estimators lack inquiry/term labels for matching, a many-to-many merge was performed.

When the two estimators are linked to two different inquiries instead of the same one, no warning is thrown. In both cases the results seem correct.

graemeblair, Oct 19 '21 12:10

I ran this in the debugger; here are some notes:

The warning is triggered here: https://github.com/DeclareDesign/DeclareDesign/blob/679bd32fc9bb0e5e6e1ff318d227c95c09ef39af/R/simulate_design.R#L204

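Note that the merge is an outer join, whereas the check on the cardinality of the result is only appropriate for a left join. You could therefore consider this a false positive: one inquiry (ATE) is matched by two estimators, and the two unlinked inquiries (ATE_1, ATE_2) each add an unmatched row, so the merged table has 4 rows, which exceeds max(3, 2).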

Browse[2]> estimates_df
  sim_ID estimator        term   estimate  std.error statistic   p.value   conf.low  conf.high df outcome inquiry
1      1      est1 (Intercept) -0.1029885 0.09492354 -1.084962 0.2805734 -0.2913374 0.08536045 99       U     ATE
2      1      est2 (Intercept) -0.1029885 0.09492354 -1.084962 0.2805734 -0.2913374 0.08536045 99       U     ATE
Browse[2]> inquiries_df
  sim_ID inquiry estimand
1      1     ATE        1
2      1   ATE_1        2
3      1   ATE_2        3
Browse[2]> simulations_df
  sim_ID inquiry estimand estimator        term   estimate  std.error statistic   p.value   conf.low  conf.high df outcome
1      1     ATE        1      est1 (Intercept) -0.1029885 0.09492354 -1.084962 0.2805734 -0.2913374 0.08536045 99       U
2      1     ATE        1      est2 (Intercept) -0.1029885 0.09492354 -1.084962 0.2805734 -0.2913374 0.08536045 99       U
3      1   ATE_1        2      <NA>        <NA>         NA         NA        NA        NA         NA         NA NA    <NA>
4      1   ATE_2        3      <NA>        <NA>         NA         NA        NA        NA         NA         NA NA    <NA>
Browse[2]> nrow(simulations_df) > max(nrow(inquiries_df), nrow(estimates_df))
[1] TRUE
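
The same row counts can be reproduced outside the package with base R's `merge()`. This is only a sketch. The toy tables are abbreviated copies of the debugger output above, the helper `too_many_rows()` is a hypothetical name for the check shown at the `Browse[2]>` prompt, and the exact `merge()` call used inside DeclareDesign is assumed rather than copied from the source.

```r
# Toy versions of the two tables from the debugger session (columns abbreviated).
inquiries_df <- data.frame(
  sim_ID   = 1,
  inquiry  = c("ATE", "ATE_1", "ATE_2"),
  estimand = c(1, 2, 3)
)
estimates_df <- data.frame(
  sim_ID    = 1,
  estimator = c("est1", "est2"),
  inquiry   = "ATE",
  estimate  = -0.103
)

# Hypothetical wrapper around the row-count heuristic quoted above.
too_many_rows <- function(merged) {
  nrow(merged) > max(nrow(inquiries_df), nrow(estimates_df))
}

by_cols <- c("sim_ID", "inquiry")

# Outer join: keeps unmatched inquiries as NA-filled rows.
outer_df <- merge(inquiries_df, estimates_df, by = by_cols, all = TRUE)

# Left join from the estimates: one output row per estimate here.
left_df <- merge(estimates_df, inquiries_df, by = by_cols, all.x = TRUE)

too_many_rows(outer_df)  # 2 matched rows + 2 unmatched inquiries = 4 rows
too_many_rows(left_df)   # 2 rows
```

Under these assumptions the heuristic fires for the outer join but not for the left join, even though neither merge is many-to-many.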

A more rigorous check could be constructed by appending a synthetic row ID to each input table before the merge, and then inspecting those IDs in the output to see whether a many-to-many match actually occurred.
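
A minimal sketch of that idea in base R follows. It is not the package's implementation: the function name `is_many_to_many()` and the helper columns `.x_id` and `.y_id` are made up for illustration, and the sketch assumes the two tables share no column names other than the `by` columns.

```r
# Returns TRUE only if some key matches several rows in x AND several rows in y.
is_many_to_many <- function(x, y, by) {
  # Synthetic row IDs (hypothetical column names).
  x$.x_id <- seq_len(nrow(x))
  y$.y_id <- seq_len(nrow(y))

  merged <- merge(x, y, by = by, all = TRUE)

  # Keep only rows that come from a real match; unmatched rows have an NA ID.
  matched <- merged[!is.na(merged$.x_id) & !is.na(merged$.y_id), , drop = FALSE]

  # Number of matched output rows each input row contributed to.
  x_hits <- tabulate(matched$.x_id, nbins = nrow(x))
  y_hits <- tabulate(matched$.y_id, nbins = nrow(y))

  # A matched row whose x row and y row are both repeated implies many-to-many.
  any(x_hits[matched$.x_id] > 1 & y_hits[matched$.y_id] > 1)
}

# The situation from this issue: one inquiry matched by two estimators.
inquiries_df <- data.frame(sim_ID = 1, inquiry = c("ATE", "ATE_1", "ATE_2"),
                           estimand = c(1, 2, 3))
estimates_df <- data.frame(sim_ID = 1, estimator = c("est1", "est2"),
                           inquiry = "ATE")
one_to_many <- is_many_to_many(inquiries_df, estimates_df,
                               by = c("sim_ID", "inquiry"))

# A genuinely ambiguous case: the inquiry label is duplicated on both sides.
dup_inquiries_df <- data.frame(sim_ID = 1, inquiry = c("ATE", "ATE"),
                               estimand = c(1, 2))
both_repeated <- is_many_to_many(dup_inquiries_df, estimates_df,
                                 by = c("sim_ID", "inquiry"))
```

With this check, the design from the issue would not warn, while duplicated inquiry labels on both sides still would. Unmatched rows from the outer join no longer affect the result, because only matched rows are counted.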

nfultz, Oct 20 '21 18:10