callr
Providing an option to override the temp location would be a great enhancement
Our server cluster is set up so that each server has a small /tmp partition. When I need to save large temporary files, I manually specify another partition when I create them. I don't see any facility in callr for specifying a location other than the one returned by tempdir(). If I understand correctly, I would have to set a different location before R starts, either in .Renviron or with an environment variable. Our R setup works fine for regular use, but when I started testing callr via future.callr, I quickly brought a server to its knees and incurred the wrath of our sysadmins. With some convincing I can get the cluster setup changed, but it would be nice to control the temp location callr uses from inside R, instead of having to set it before R starts. I'm not sure that's even possible, though, given how each callr session starts.
That seems reasonable to me. OTOH the advantage of an environment variable is that it is inherited in subprocesses by default, whereas an option will not be set in a subprocess. If you set an option in .Rprofile, then you might as well set TMPDIR (or a callr specific env var) in .Renviron, no?
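To illustrate the inheritance point, here is a minimal sketch comparing an R option with an environment variable inside a `callr::r()` subprocess. The names `my_tmp_option` and `MY_TMP_VAR` are made up for the example; the sketch assumes the callr package is installed and that no profile file sets that option.

```r
library(callr)

# Hypothetical names, used only for this illustration.
options(my_tmp_option = "/scratch/tmp")
Sys.setenv(MY_TMP_VAR = "/scratch/tmp")

# The subprocess is a fresh R session: it inherits the parent's
# environment variables, but not options set in the parent session.
child <- callr::r(function() {
  list(
    option = getOption("my_tmp_option"),
    envvar = Sys.getenv("MY_TMP_VAR")
  )
})

str(child)
# Expected: option is NULL, envvar is "/scratch/tmp"
```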
Here is a workaround to change the temporary directory from within the session. It works from R 3.5.0 onwards:
> tempdir()
[1] "/var/folders/59/0gkmw1yj2w7bf2dfc3jznv5w0000gn/T//RtmpUI1O6O"
> Sys.setenv(TMPDIR = "/Users/gaborcsardi")
> unlink(tempdir(), recursive = TRUE)
> tempdir(check = TRUE)
[1] "/Users/gaborcsardi/RtmpXaziuR"
> tempdir()
[1] "/Users/gaborcsardi/RtmpXaziuR"
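The same steps could be wrapped in a small helper. This is only a sketch, not part of callr: `set_session_tempdir()` is a hypothetical name, it assumes R >= 3.5.0 (where `tempdir()` gained its `check` argument), and it deletes the current session temp dir together with everything in it.

```r
# Hypothetical helper wrapping the TMPDIR workaround; not a callr function.
set_session_tempdir <- function(path) {
  if (getRversion() < "3.5.0") {
    stop("tempdir(check = TRUE) requires R >= 3.5.0")
  }
  # The new parent directory must exist before R can create an Rtmp* dir in it.
  dir.create(path, recursive = TRUE, showWarnings = FALSE)
  Sys.setenv(TMPDIR = normalizePath(path, mustWork = TRUE))
  # Remove the current session temp dir (and its contents) so that
  # tempdir(check = TRUE) has to create a fresh one, re-reading TMPDIR.
  unlink(tempdir(), recursive = TRUE)
  tempdir(check = TRUE)
}
```

Usage would be e.g. `set_session_tempdir("~/tmp/rcall")`. Because the reports further down this thread suggest callr may keep using a temp path recorded earlier, it is probably safest to call this before callr is loaded; that is an assumption, not something verified here.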
@scottporter So, how about setting TMPDIR? Is there anything wrong with that?
Just noticed the response. I'll give the workaround a try and give feedback.
At least on Linux, changing the temp location the way you've laid out doesn't work (I tried R 3.4.1 and R 3.5.3). Here is the log from R 3.5.3:
> tempdir()
[1] "/tmp/Rtmpi9Icnx"
> Sys.setenv(TMPDIR = tools::file_path_as_absolute("~/tmp/rcall"))
> unlink(tempdir(), recursive = TRUE)
> tempdir(check=TRUE)
[1] "/tmp/Rtmpi9Icnx"
> tempdir()
[1] "/tmp/Rtmpi9Icnx"
>
Setting the environment variable in the session is too late for tempdir(); the location has already been set. I also ran my process to see whether the callr sessions would pick up the environment variable, even though the change doesn't show up here, but no such luck.
So the only workaround I have found is adding the environment variable to my ~/.Renviron file.
You have to remove the old temp dir first, like here: https://github.com/r-lib/callr/issues/172#issuecomment-710700559
But setting it in .Renviron is completely fine as well.
EDIT: I see you have edited your comment to add the unlink() line. With that, I am pretty sure it works, assuming the new TMPDIR directory exists. This is Linux and R 3.5.3:
> tempdir()
[1] "/tmp/RtmpEpCIDc"
> newtmp <- "~/tmp/rcall"
> dir.create(newtmp, recursive = TRUE)
> Sys.setenv(TMPDIR = tools::file_path_as_absolute(newtmp))
> unlink(tempdir(), recursive = TRUE)
> tempdir(check=TRUE)
[1] "/root/tmp/rcall/RtmpmGFIia"
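To check the other half of the question, whether callr subprocesses pick up the new location, one could compare the parent's `TMPDIR` with what a subprocess sees. A minimal sketch, assuming callr is installed and the workaround above has already been run in the session:

```r
library(callr)

# Ask a fresh subprocess which TMPDIR it inherited and where its own
# per-session temp dir was created.
child <- callr::r(function() {
  list(TMPDIR = Sys.getenv("TMPDIR"), tempdir = tempdir())
})

# The subprocess should see the same TMPDIR as the parent session ...
identical(child$TMPDIR, Sys.getenv("TMPDIR"))
# ... and should have created its Rtmp* directory under it.
startsWith(child$tempdir, child$TMPDIR)
```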
My edit was because I tried it again, adding the unlink, and got the same result. However, I tried your code above, and it worked. I'm not sure what I did wrong last time.
Thanks.
R version 3.5.3 (2019-03-11) -- "Great Truth"
Copyright (C) 2019 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
Type 'q()' to quit R.
Memory limits on this session set with `unix::rlimit_as`
Soft limit: 1e+10
Completed loading customized R settings from Rprofile.site
##------ [/sasdata/uat/rs/sporter/repos/method/crest_u21] Wed Dec 23 14:41:17 2020 ------##
> tempdir()
[1] "/tmp/RtmpHPRloQ"
> newtmp <- "~/tmp/rcall"
> dir.create(newtmp, recursive = TRUE)
Warning message:
In dir.create(newtmp, recursive = TRUE) :
'/users/sporter/tmp/rcall' already exists
> Sys.setenv(TMPDIR = tools::file_path_as_absolute(newtmp))
> unlink(tempdir(), recursive = TRUE)
> tempdir(check=TRUE)
[1] "/users/sporter/tmp/rcall/RtmpB0piE4"
I'm guessing that last time I accidentally ran it on R 3.4.1. I have both that and R 3.5.3 installed on my server cluster. I don't think this workaround works for R 3.4.1, which is why I ended up getting so confused.
There is no tempdir(check = TRUE) on R 3.4.x AFAICT, so it cannot work there.
Thanks again.
Since there is an adequate workaround for this, I am going to close it.
Apologies for posting on an old-ish issue, but I thought it was most relevant here.
I've tried this workaround for a problem I'm having where the R tempdir() (apparently?) gets removed by a child process, and as a result callr fails. The workaround here fails because it still depends on tempdir. I've reported this here: https://github.com/rexyai/RestRserve/issues/174#issuecomment-1373439240 (see also https://stat.ethz.ch/pipermail/r-devel/2017-February/073748.html) and I've managed to work around it by manually setting working folders for packages such as cachem that allow it, but it seems that callr does not.
When I try to recreate a tempdir using check=TRUE, a new tempdir is created, but callr still points back to the old one:
td = tempdir()
td
# [1] "/var/folders/59/r7wjy0gn6yv59w19694x5gy80000gp/T//RtmpQNELAf"
callr::r(function(){ 2+2 })
# [1] 4
Deleting the temp folder causes callr::r to fail, as expected:
unlink(td, recursive = TRUE)
callr::r(function(){ 2+2 })
# Error in file(con, "wb") : cannot open the connection
# In addition: Warning message:
# In file(con, "wb") :
#   cannot open file '/var/folders/59/r7wjy0gn6yv59w19694x5gy80000gp/T//Rtmp1IhdbV/callr-client--f520c18.so': No such file or directory
Now recreate a tempdir() and try again:
tempdir(check = TRUE)
# [1] "/var/folders/59/r7wjy0gn6yv59w19694x5gy80000gp/T//Rtmp8wyQAC"
callr::r(function(){ 2+2 })
# Error in file(con, "wb") : cannot open the connection
# In addition: Warning message:
# In file(con, "wb") :
#   cannot open file '/var/folders/59/r7wjy0gn6yv59w19694x5gy80000gp/T//Rtmp1IhdbV/callr-client--f520c18.so': No such file or directory
Honestly, I wish I knew how the tempdirs were being deleted and how to fix that, but I can't get the problem to occur reliably. Relying on tempdir() seems to cause issues, so a workaround would be appreciated.
Can you set the TMPDIR env var before starting R, e.g. in a shell, to a place other than the default "/tmp", so that the system's temporary-file cleaning processes do not delete the tempdirs of long-running R processes?
Yes, but in my application this is insufficient to avoid the error. Something (maybe a child process when it exits) is deleting the folder while the main process is still running, I think.
In my .Renviron I have:
TMPDIR = /Users/saprm3/tmp/
and sometimes, when callr::r is run, I get (e.g.):
In file(con, "wb") :
cannot open file '/Users/saprm3/tmp//RtmpcYeSpd/callr-client--f520c18.so': No such file or directory
and RtmpcYeSpd no longer exists (though the main R process is still running).
But this is annoying because I can't replicate it reliably enough to pin down what is deleting the tempdir. That's why it would be useful to use a folder that doesn't rely on tempdir().
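Since the deletion is hard to reproduce, a wrapper that checks the session temp dir around each callr call could at least show when it disappears. The sketch below is hypothetical: `r_with_tempdir_guard()` is not a callr function. It recreates the directory under its original path, because `tempdir(check = TRUE)` picks a new name and, per the report above, callr kept writing to the old location. Whether callr then succeeds assumes it writes its client file into the directory R recorded at startup; that is untested here.

```r
# Hypothetical wrapper around callr::r(); not part of callr.
r_with_tempdir_guard <- function(func, ...) {
  td <- tempdir()  # check = FALSE: returns the path R recorded earlier
  if (!dir.exists(td)) {
    warning("Session temp dir was missing before the callr call: ", td)
    # Recreate the *same* path; tempdir(check = TRUE) would pick a new name.
    dir.create(td, recursive = TRUE, mode = "0700")
  }
  res <- callr::r(func, ...)
  # Report if the directory vanished while the subprocess was running.
  if (!dir.exists(td)) {
    warning("Session temp dir disappeared during the callr call: ", td)
  }
  res
}
```

For example, `r_with_tempdir_guard(function() 2 + 2)` runs the job like `callr::r()` while logging any disappearance of the temp dir as a warning.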