cmdstanr icon indicating copy to clipboard operation
cmdstanr copied to clipboard

Error: Can't find the following variable(s) in the output: lp_approx__

Open stephensrmmartin opened this issue 1 year ago • 4 comments

Hello all,

Please see attached issue from the brms tracker. https://github.com/paul-buerkner/brms/issues/1473

The TLDR:

  1. When using brms's variational method, brms gathers the available variables from the metadata field of read_cmdstan_csv
  2. Among these, is lp_approx__
  3. Brms then calls csfit <- cmdstanr::read_cmdstan_csv(files = files, variables = variables, sampler_diagnostics = sampler_diagnostics, format = NULL) within its function: read_csv_as_stanfit
  4. An error occurs: Error: Can't find the following variable(s) in the output: lp_approx__

I have not dug into the underlying cause, but I suspect the actual csv contains something like lp_p__ and lp_q__, and not lp_approx__, and an error is throw when lp_approx__ does not yet exist (it has not been renamed from lp_q__) ?

Versions

brms_2.19.0 cmdstanr_0.5.3 Ubuntu 20.04 R 4.2.2

stephensrmmartin avatar Mar 29 '23 23:03 stephensrmmartin

Ah ha, yes this seems to be the issue.

Browse[2]> 
debug: res <- matching_variables(variables, repair_variable_names(metadata$variables))
Browse[2]> 
debug: if (length(res$not_found)) {
    stop("Can't find the following variable(s) in the output: ", 
        paste(res$not_found, collapse = ", "), call. = FALSE)
}

In this case, variables contains:

Browse[2]> variables
 [1] "lp__"               "lp_approx__"        "b_logalpha"        
 [4] "b_personEffect"     "sd_1"               "sd_2"              
 [7] "sd_3"               "r_1_personEffect_1" "r_2_personEffect_1"
[10] "r_2_personEffect_2" "r_3_difficulty_1"   "lprior"            
[13] "cor_2"             

Within metadata$variables:

Browse[2]> head(metadata$variables)
[1] "lp__"             "log_p__"          "log_g__"          "b_logalpha.1"    
[5] "b_personEffect.1" "sd_1.1"          

Which causes the not_found to trip:

Browse[2]> res$not_found
[1] "lp_approx__"

stephensrmmartin avatar Mar 29 '23 23:03 stephensrmmartin

Ok, confirmed this I believe.

On line 211 of https://github.com/stan-dev/cmdstanr/blob/master/R/csv.R#L211

it checks for the existence of the provided variables within the metadata$variables of the read csv file. At this stage, lp_approx__ does not exist, it is lp_g__. Therefore it fails.

Later in the file, lp_g__ is renamed to lp_approx__. https://github.com/stan-dev/cmdstanr/blob/master/R/csv.R#L306

This leads to an inconsistency: If you use read_cmdstan_csv, and save the variables into a vector, it will include lp_approx__. If you feed that same vector into read_cmdstan_csv, it will fail, because it does not see lp_approx__ in the metadata prior to the renaming.

I would suggest either adding an exception into the metadata check for possible lp variables, or moving the metadata renaming up prior to the variable check.

stephensrmmartin avatar Mar 30 '23 00:03 stephensrmmartin

I just created a quick fix on a fork: https://github.com/stephensrmmartin/cmdstanr/commit/9e0cda554df3a7fc6d0433df2e2571ae1c42bb5c

I imagine there's a better way of handling that, but I didn't want to overengineer a quick fix for my personal usecase. I imagine a better approach could be:

  1. Adding an optional "ignore_variables" or "pass_variables" argument to the utils::matching_variables function, which would be a more generic way of ignoring mismatches for anything in that list.

  2. Moving the renaming logic before the check.

Edit: Note something else is broken; this fix is incomplete. I'll hack on it and get back once I have a functioning patch.

Edit2: I tried another approach (https://github.com/stephensrmmartin/cmdstanr/blob/hotfix/lp_approx__not_found_alt1/R/csv.R#L200), which avoids the same error.

However, either approach to fixing the lp_approx__ not found leads to downstream errors in brms. I suspect the downstream bug is due to how I am resolving it, but I don't know a better way to resolve it. The downstream bug winds up being because there are (P) variables in the samples, but (P+2) variables in the names of variables; consequently, when brms re-orders the samples data.frame, there's an indexing error (undefined columns selected). The two additional variables are lp__ and lp_approx__.

To me, it seems like there is a desync somewhere between what is a variable and what is a column name. Alternatively put, there's an inconsistency in listed variables (being non-diagnostic, but tracked columns in samples) and in model variables (things the user explicitly list as a variable to be tracked), such that one source says there are P variables and another says there are P+2 variables, and that perhaps this has changed at some point in the last year.

I have no idea where or how that would occur, or whether this is a brms or a cmdstanr bug, but it seems to break VI in brms.

stephensrmmartin avatar Mar 30 '23 00:03 stephensrmmartin

Update, this appears to be a bug introduced in brms at commit: https://github.com/paul-buerkner/brms/commit/34203cc9bd34f7171a20671ad963440f91765cd4

VI in Brms 2.17 works with the latest cmdstanr and cmdstan; I bisected it to brms 2.19, and that commit is where VI breaks.

However, I suspect this could be considered a bug of either cmdstanr or brms, since brms is simply trying to grab the variable names from cmdstanr.

stephensrmmartin avatar Mar 31 '23 23:03 stephensrmmartin