read csv with variable names that contain parentheses
Describe the bug
I'm trying to piggyback on cmdstanr to read samples generated elsewhere. as_cmdstan_fit complains about variable names such as "OMEGA(1,1)", while data.table::fread ingests them fine. I think it's caused by the ~~internal fread_cmd's grepping~~ unpacking of variable names.
To Reproduce
For a csv with content (dummy metadata)
# model = norm_model
# start_datetime = 2025-11-19 22:23:43 UTC
# method = sample (Default)
# sample
# ....
THETA4,"SIGMA(1,1)"
2.00000E+00,2.00000E+00
2.00000E+00,2.00000E+00
as_cmdstan_fit("foo.csv", check_diagnostics=FALSE) errors
Error in grep(pattern, var_names) :
invalid regular expression '^"SIGMA(1\[', reason 'Missing ')''
In addition: Warning message:
In grep(pattern, var_names) : TRE pattern compilation error 'Missing ')''
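The failure can be reproduced outside cmdstanr with base R alone. The sketch below is not cmdstanr's internal code. It only assumes, based on the error message, that a pattern built from the raw column name is handed to grep() as a regular expression. The names var_names and pattern are illustrative.

```r
# Column names as they might look if the header quotes are kept literally
# (an assumption made to mirror the pattern shown in the error message).
var_names <- c("THETA4", "\"SIGMA(1,1)\"")

# Pattern copied from the error message: the "(" opens a group that is never closed.
pattern <- "^\"SIGMA(1\\["

# Interpreted as a regular expression, the pattern fails to compile.
res <- tryCatch(
  suppressWarnings(grep(pattern, var_names)),
  error = function(e) e
)
inherits(res, "error")

# Escaping the parenthesis makes the pattern valid. It matches nothing here
# because the names contain "(1," rather than "(1[".
grep("^\"SIGMA\\(1\\[", var_names)

# Literal prefix matching avoids regex metacharacters altogether.
startsWith(var_names, "\"SIGMA(")
```

Whether escaping or literal matching is the right fix inside cmdstanr depends on how the names are used downstream, which this sketch does not cover.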
Expected behavior
No error, as by fread:
fread(cmd="grep -v '^#' --color=never foo.csv")
THETA4 SIGMA(1,1)
<num> <num>
1: 2 2
2: 2 2
Operating system
MacOS 15.7 (24G222) 11881.140.96 Ubuntu 24.04
CmdStanR version number
cmdstanr_0.9.0 & master 7c7b5d2b
@yizhang-yiz Thanks for reporting this. Does #1120 solve this for you?
The issue is a bit more complicated than I thought. We can get fread working using #1120, but unfortunately there are other problems after that which I don't have time to dive into (I was hoping #1120 would be sufficient). I think for now the best solution is to convert the variable names before calling as_cmdstan_fit(). Where you have a variable named Sigma(1,1), CmdStan would name it Sigma.1.1. If you replace the parentheses with periods, using regular expressions or whatever method you prefer, it should work with as_cmdstan_fit(). In the resulting fitted model object the name would then become Sigma[1,1].
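A minimal sketch of that renaming step, in base R. The helpers to_cmdstan_name() and rename_csv_header() are hypothetical names made up for this example and are not part of cmdstanr. The sketch assumes the header is the first line that is neither blank nor a # comment, and I have not verified it against every CSV that as_cmdstan_fit() accepts.

```r
# Hypothetical helper: turn "SIGMA(1,1)" into "SIGMA.1.1".
to_cmdstan_name <- function(x) {
  x <- gsub("[(,]", ".", x)        # "(" and "," become "."
  gsub(")", "", x, fixed = TRUE)   # drop the closing ")"
}

to_cmdstan_name(c("THETA4", "SIGMA(1,1)"))

# Hypothetical helper: rewrite only the header line of a CSV,
# leaving the comment lines and the draws untouched.
rename_csv_header <- function(infile, outfile) {
  lines <- readLines(infile)
  # Assumption: the first non-comment, non-empty line is the header.
  header_idx <- which(!grepl("^#", lines) & nzchar(lines))[1]
  # scan() honours the quotes around names that contain commas.
  vars <- scan(text = lines[header_idx], what = character(),
               sep = ",", quiet = TRUE)
  lines[header_idx] <- paste(to_cmdstan_name(vars), collapse = ",")
  writeLines(lines, outfile)
  invisible(outfile)
}

# Usage, with the file from the report (left commented out so the block
# runs without cmdstanr installed):
# rename_csv_header("foo.csv", "foo_renamed.csv")
# fit <- cmdstanr::as_cmdstan_fit("foo_renamed.csv", check_diagnostics = FALSE)
```

Writing to a new file rather than editing in place keeps the original samples intact in case the renamed names need to be mapped back.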
EDIT: maybe we can make this work after all. See discussion over at #1120