Forward vs. back slashes killing path assertions during $sample calls when using WSL w/ Ubuntu
Describe the bug
When attempting to sample from the bernoulli example model with cmdstan installed via WSL (and importantly, running R from Windows), the $sample call fails with an assertion from validate_cmdstan_args(self). The full error message is below:
Error in validate_cmdstan_args(self) :
Assertion on 'self$data_file' failed: File does not exist: 'wsl$/Ubuntu/home/<username>/stan_models\RtmpAnJeRs\standata-81c673e5579.json'.
To Reproduce
See the steps below as well as the attached image in the additional context section.
- Set your TMP, TEMP, and TMPDIR environment variables in your .Renviron file to point to a location in the WSL filesystem.
TEMP=//wsl$/Ubuntu/<your-folder-here>
TMP=//wsl$/Ubuntu/<your-folder-here>
TMPDIR=//wsl$/Ubuntu/<your-folder-here>
- Start with R 4.5 inside the vanilla RGui.
- Clear the workspace for a clean-slate.
- Install cmdstanr using
install.packages("cmdstanr", repos = c('https://stan-dev.r-universe.dev', getOption("repos")))
- install_cmdstan(cores=2, wsl=T)
- Follow these steps from the vignette to sample from the model.
- Witness the error occur.
Expected behavior
Model should successfully sample and display warm-up and sampling output.
Operating system
Windows: Windows 10 Enterprise Build 19045
WSL:
WSL version: 2.6.1.0
Kernel version: 6.6.87.2-1
WSLg version: 1.0.66
MSRDC version: 1.2.6353
Direct3D version: 1.611.1-81528511
DXCore version: 10.0.26100.1-240331-1435.ge-release
Windows version: 10.0.19045.6332
Ubuntu on WSL:
Distributor ID: Ubuntu
Description: Ubuntu 24.04.1 LTS
Release: 24.04
Codename: noble
CmdStanR version number: 0.9.0
Additional context
The error appears to be related to how R is interfacing with WSL and handling (or not) forward vs. back-slashes. But there could be something else going on here. Compilation works a-okay. After the call to the sampler some intermediate temporary files with garbage names are created and deleted. The exception is only thrown for standata-*.json files (setting the output_basename parameter didn't change anything).
I did some digging into the exception and traced back (very manually, could be incorrect) the root issue to the .wsl_check_exists function. See the attached image below.
cmdstanr:::validate_cmdstan_args -- calls --> cmdstanr:::assert_file_exists
cmdstanr:::assert_file_exists -- calls --> cmdstanr:::check_file_exists
cmdstanr:::check_file_exists -- calls --> cmdstanr:::.wsl_check_exists
When supplying a path with mixtures of forward, backward, escaped-backward slashes, etc., various errors are thrown. However, when I correctly specify the path I see the function evaluate to TRUE.
Some observations: in the assertion, the leading // in the WSL path is chopped (path <- gsub("^./", "", path)). This appears to influence the output of the .wsl_check_exists function. Similarly, when R creates the temporary directory in the TMPDIR, it appends it using escaped back-slashes (because Windows 😑). Likewise, somewhere along the way the standata-*.json file is appended to its parent using backslashes. I'm sure this is another potential failure mode.
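To make the first observation concrete, here is a minimal sketch in plain base R (no cmdstanr involved). The path is made up and only shaped like the one in the error message; the variable names are mine. It shows what the quoted pattern does to a path that starts with //, and that the backslashes are still there afterwards.

```r
# Hypothetical path, shaped like the one in the error message.
path <- "//wsl$/Ubuntu/home/user/stan_models\\Rtmp123\\standata-abc.json"

# "^./" matches any single character followed by "/" at the start of the
# string, so for a path beginning with "//" both leading slashes are removed.
stripped <- gsub("^./", "", path)
print(stripped)

# The stripped path still contains backslashes.
has_backslash <- grepl("\\", stripped, fixed = TRUE)
print(has_backslash)
```

If this matches what happens inside the package, it would explain why the path in the assertion message begins with wsl$ rather than //wsl$.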
For some additional background on my motivations: I'm attempting to set up STAN on my work laptop to perform some sensitivity analyses. Like other folks have noted in the past, STAN compilation and sampling is GOD AWFUL thanks to Windows + enterprise security. Compilation through WSL drastically improved compile times, but when writing back to Windows, sampling time tanked (worse than just doing everything in Windows). So I am experimenting with having STAN live in the WSL world so I can have my cake and eat it too. Along the way I ran face first into this issue, and here we are.
To be transparent, it is very possible that I'm an idiot and completely missed something obvious. If that's the case, apologies for wasting everyone's time in advance 😓
Update: did a bit of late night delirium perusing through the source.
I think the culprit for this particular instance is wsl_safe_path. But it's hard to tell. Looks like there's a lot of path modification going on.
Update 2: I pulled the source and played around with some of the path stuff and I think I've found a solution, but I need to gather my thoughts first. At risk of being a little handwavy here, this is what I've found:
- The offender (at least in the Rproject) was .wsl_check_exists. More precisely, the call to processx::run. Looks like it doesn't appreciate the \\ that R is appending to the path. If I add path <- gsub("[\\]", "/", path) to the function body that appears to solve the problem.
- Once (1) is fixed, it appears something similar occurs when validating self$output_dir. But I'm super lazy, so I hardcoded my wsl paths so I could prove to myself that I could have cmdstanr and its output living entirely in WSL while enjoying increased compilation and sampling speed. TL;DR: worked pretty okay.
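The backslash replacement from the first item can be sketched as a standalone helper. This is an illustration only: to_forward_slashes is a hypothetical name, not a cmdstanr function, and it uses a fixed-string match instead of the regex quoted above. The paths are made up.

```r
# Hypothetical helper, not part of cmdstanr: replace every backslash in a
# path with a forward slash.
to_forward_slashes <- function(path) {
  # fixed = TRUE makes "\\" a literal backslash instead of a regex.
  gsub("\\", "/", path, fixed = TRUE)
}

mixed <- c(
  "wsl$/Ubuntu/home/user/stan_models\\Rtmp123\\standata-abc.json",
  "/tmp/already/forward"
)
cleaned <- to_forward_slashes(mixed)
print(cleaned)
```

Paths that already use forward slashes pass through unchanged, so in principle the same helper could be applied to self$data_file and self$output_dir alike. Whether that is safe for every path cmdstanr handles is something I have not checked.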
I've included the sampling output below as a point of interest. Note that mean chain execution time is effectively instant. To be fair, there is an inflated wall time but it's far more tolerable than what I was dealing with before. And I imagine that the bulk of that time is just IO (digression: I'm no CS person nor expert programmer, but assuming this is the case, would dropping output into [a] sqlite table[s] vs. csvs improve performance?).
Running MCMC with 4 parallel chains...
Chain 1 Iteration: 1 / 2000 [ 0%] (Warmup)
Chain 1 Iteration: 500 / 2000 [ 25%] (Warmup)
Chain 1 Iteration: 1000 / 2000 [ 50%] (Warmup)
Chain 1 Iteration: 1001 / 2000 [ 50%] (Sampling)
Chain 1 Iteration: 1500 / 2000 [ 75%] (Sampling)
Chain 1 Iteration: 2000 / 2000 [100%] (Sampling)
Chain 2 Iteration: 1 / 2000 [ 0%] (Warmup)
Chain 2 Iteration: 500 / 2000 [ 25%] (Warmup)
Chain 2 Iteration: 1000 / 2000 [ 50%] (Warmup)
Chain 2 Iteration: 1001 / 2000 [ 50%] (Sampling)
Chain 2 Iteration: 1500 / 2000 [ 75%] (Sampling)
Chain 2 Iteration: 2000 / 2000 [100%] (Sampling)
Chain 3 Iteration: 1 / 2000 [ 0%] (Warmup)
Chain 3 Iteration: 500 / 2000 [ 25%] (Warmup)
Chain 3 Iteration: 1000 / 2000 [ 50%] (Warmup)
Chain 3 Iteration: 1001 / 2000 [ 50%] (Sampling)
Chain 3 Iteration: 1500 / 2000 [ 75%] (Sampling)
Chain 3 Iteration: 2000 / 2000 [100%] (Sampling)
Chain 1 finished in 0.0 seconds.
Chain 4 Iteration: 1 / 2000 [ 0%] (Warmup)
Chain 4 Iteration: 500 / 2000 [ 25%] (Warmup)
Chain 4 Iteration: 1000 / 2000 [ 50%] (Warmup)
Chain 4 Iteration: 1001 / 2000 [ 50%] (Sampling)
Chain 4 Iteration: 1500 / 2000 [ 75%] (Sampling)
Chain 4 Iteration: 2000 / 2000 [100%] (Sampling)
Chain 2 finished in 0.0 seconds.
Chain 3 finished in 0.0 seconds.
Chain 4 finished in 0.0 seconds.
All 4 chains finished successfully.
Mean chain execution time: 0.0 seconds.
Total execution time: 44.6 seconds.
Anyway, that's what I got. I don't know how to proceed from here. I suppose if anyone is able to replicate the initial "error", then we can leave this open along with a potential fix. Otherwise if all this was just a result of my own incompetence we can close this with extreme prejudice.
Thanks for reporting this and investigating it! I'm not a Windows or WSL user at all, so I'm going to need some help if we're going to fix this. @andrjohns @SteveBronder @katrinabrock do any of you have a moment to take a look at this? It seems like there should be a relatively simple fix for this, but I'm not entirely sure.
It seems like this
- Looks like it doesn't appreciate the \\ that R is appending to the path. If I add path <- gsub("[\\]", "/", path) to the function body that appears to solve the problem.
fixed the issue for @jr-free, but could this have any unintended consequences?
I don't have a windows dev setup currently. I plan to set one up at some point (probably months from now). If this isn't solved by then, I can take a look at that point.
Update: I just attempted this from my personal laptop that runs Windows 11 with WSL2 running a Fedora distro -- replicated exactly.
So this may be a general issue when attempting to have cmdstanr write output to WSL.
One observation: looks like the TMP / TEMP / TMPDIR variables don't necessarily need the //wsl$ prefix -- giving the Linux native path worked just fine. From what I can tell the path manipulation in the code provides a correction for WSL paths.
It seems like this
- Looks like it doesn't appreciate the \\ that R is appending to the path. If I add path <- gsub("[\\]", "/", path) to the function body that appears to solve the problem.
fixed the issue for @jr-free, but could this have any unintended consequences?
I'm taking the liberty to dig into this a bit deeper. It looks like that doesn't entirely fix the issue, as there is some additional path mangling that occurs further downstream.
I think there is... something else going on here, though. I added some super-elite-expert-programmer print statements in the source to monitor the inputs and outputs of the wsl_safe_path and .wsl_check_exists functions during sampling. It appears that cmdstanr has logic to work from a WSL-native temp dir (à la /tmp/<stuff>, see wsl_tempdir), but later tramples over itself with calls to tempdir(), which pulls in either a generic Windows temp dir or the one defined in .Renviron.
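For anyone trying to replicate this, a quick way to see what base R's tempdir() is working from is sketched below. It uses only base R and says nothing about what cmdstanr does internally. The standata- prefix is just borrowed from the file name in the error message.

```r
# Temp-related environment variables as seen by this R session
# (NA means the variable is not set).
temp_vars <- Sys.getenv(c("TMPDIR", "TMP", "TEMP"), unset = NA)
print(temp_vars)

# The per-session temporary directory, chosen when R starts.
session_tmp <- tempdir()
print(session_tmp)

# A temp file path built by base R. On Windows the file name is typically
# joined to the directory with a backslash, as in the error message.
tmp_json <- tempfile(pattern = "standata-", fileext = ".json")
print(tmp_json)
```

Comparing this output with and without the .Renviron entries should show whether the mixed separators are already present before cmdstanr touches the path.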
I tried clearing my env variables entirely and the sampler orbital nuked itself, so....yeah. Definitely something weird going on, but not ruling out a skill issue on my part.