cmdstanr: make save_output_files() create files with all comment lines at the start

Preprocessing with grep -v '^#' is one way to work around this issue, and a fix inside cmdstanr may not be simple if it depends on how these files are written during sampling. But just in case it would be simple inside cmdstanr...

The files created by save_output_files() have some comment lines at the start, some more between the column headers (parameter names) and the values, and some more at the end. This is too much for poor data.table::fread() to handle: pending its long-awaited comment.char argument, it can only reliably skip lines that come together at the start of the file. Since data.table::fread() is the go-to for huge CSV files, it would be nice if all the comment lines were put together at the start of the file, so that these files can be read as-is by fread().

Example

library(data.table)
library(cmdstanr)

code <- "
data {
  int N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real m;
  real c;
  real sigma;
}
model {
  y ~ normal(m * x + c, sigma);
}
"
file <- write_stan_file(code)
model <- cmdstan_model(file)
samples <- model$sample(data = list(N = 1, x = 1, y = 1), iter_sampling = 10, iter_warmup = 10)
samples$save_output_files("~/", basename = "foo", timestamp = FALSE, random = FALSE)

df_ <- fread("~/foo-1.csv")

gives

Warning messages:
1: In fread("~/foo-1.csv") :
  Detected 3 column names but the data has 10 columns (i.e. invalid file). Added 7 extra default column names at the end.
2: In fread("~/foo-1.csv") :
  Stopped early on line 63. Expected 10 fields but found 1. Consider fill=TRUE and comment.char=. First discarded non-empty line: <<# >>

and df_ is

         # 1        1       1    V4    V5    V6      V7        V8         V9       V10
       <num>    <num>   <num> <int> <int> <int>   <num>     <num>      <num>     <num>
 1: -4.76425 0.999885 2.74896     7   127     0 6.27136   184.340 -244.86300   95.1088
 2: -4.66059 0.999970 2.74896     6    63     0 5.03261   213.799 -171.11100   96.2315
 3: -5.09712 0.981220 2.74896     6    66     1 6.39704   166.980 -125.97500  158.4170
 4: -6.11659 0.999955 2.74896     7   127     0 6.23478   281.608   -7.43179  252.5850
 5: -6.35803 0.999807 2.74896     8   255     0 9.37554   242.192 -442.52700  538.0940
 6: -7.64953 0.999985 2.74896    10  1023     0 8.23180   444.349 -868.81500 2055.1400
 7: -7.33027 1.000000 2.74896    10  1023     0 8.37176  -251.299  922.23900 1348.7000
 8: -6.84893 1.000000 2.74896    10  1023     0 7.82562 -1589.190 1803.51000  917.7360
 9: -6.91315 0.999973 2.74896     8   511     0 7.26564 -1565.040 1840.16000  965.7080
10: -6.92680 0.999974 2.74896     9   831     0 7.81856 -2004.660 2241.34000  990.8020

Sep 04 '25 16:09 ChrisHIV

Sorry I'm just seeing this issue now, not sure why I missed it before. Unfortunately this is the way CmdStan itself writes the CSV files during sampling, not something that the R package decides or modifies (we also use fread inside the R package and have to get around this issue too). I don't think we want to mess with the CSV files that CmdStan creates since there's already a lot of code that assumes that they are the way they are, but I agree it's suboptimal. I guess we could consider adding an argument to save_output_files() that can be turned on to strip comments from the CSV files?
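Until something like that exists, the idea can be approximated outside the package. The following is only a minimal sketch using base R: strip_csv_comments and its suffix argument are hypothetical names, not part of cmdstanr, and it assumes each file fits in memory as text and that no data line starts with #.

```r
# Hypothetical helper, NOT a cmdstanr function: write comment-free copies of
# CmdStan CSV files next to the originals and return the new paths.
strip_csv_comments <- function(files, suffix = "-nocomments") {
  vapply(files, function(f) {
    lines <- readLines(f)
    # Name the copy e.g. foo-1-nocomments.csv so the original stays untouched
    out <- sub("\\.csv$", paste0(suffix, ".csv"), f)
    # Keep only the header row and the draws; drop every '#' line
    writeLines(lines[!startsWith(lines, "#")], out)
    out
  }, character(1), USE.NAMES = FALSE)
}
```

For example, strip_csv_comments(samples$output_files()) would write one stripped copy per chain. The comment lines carry sampler metadata, so the copies are meant for fread() rather than for tools that expect CmdStan's original layout.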

Dec 11 '25 22:12 jgabry

Or just rely on fread(cmd = paste("grep -v '^#'", my_file)); that's OK too. I was just wondering if there was a simple fix.
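For systems where grep is not available, the same filtering can be sketched in plain R. This is a rough sketch under assumptions: read_stan_csv_no_comments is a hypothetical name (not part of cmdstanr or data.table), the whole file is read into memory as text first, and no data line is assumed to start with #.

```r
library(data.table)

# Hypothetical helper: read a CmdStan CSV by dropping every line that starts
# with '#' (and any empty line) before handing the rest to fread().
read_stan_csv_no_comments <- function(path) {
  lines <- readLines(path.expand(path))
  keep <- !startsWith(lines, "#") & nzchar(lines)
  # fread(text = ) accepts a character vector of lines, as from readLines()
  fread(text = lines[keep])
}
```

With the example above this would be called as df_ <- read_stan_csv_no_comments("~/foo-1.csv"). For very large files the grep approach is likely to be cheaper, since it avoids holding the raw text in R.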

Dec 12 '25 10:12 ChrisHIV