unconf17
External API: Querying and accessing external software (via `system()`)
In my research field, computational genomics / bioinformatics, it is becoming more and more common to run analytical pipelines that call various standalone external software tools. This is often done via Unix shell scripts, but also from R itself. When it is done from R, it is quite common that everyone writes their own one-off implementation that is "good enough" for what needs to be done.
R provides Sys.which() for identifying external software, and system() and system2() for calling it. If one goes through the source code of R itself, one can see that there are a few different flavors of how this is done. For instance, some functions may also locate external software via an environment variable and/or an R option. But other than that, there is no real standard for how this is done.
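To make the "different flavors" concrete, here is a minimal sketch of a typical ad-hoc lookup order: an R option, then an environment variable, then the PATH via Sys.which(). The helper locate_tool() and the option/variable names used with it are hypothetical, and individual base R functions differ in which of these sources they actually consult.

```r
# Hypothetical helper illustrating the common ad-hoc pattern:
# 1) an R option, 2) an environment variable, 3) Sys.which() on the PATH.
locate_tool <- function(name, option = NULL, envvar = NULL) {
  # 1) An explicit R option wins, e.g. options(mytool = "/opt/bin/mytool")
  if (!is.null(option)) {
    path <- getOption(option)
    if (!is.null(path) && nzchar(path)) return(path)
  }
  # 2) Then an environment variable, e.g. MYTOOL=/opt/bin/mytool
  if (!is.null(envvar)) {
    path <- Sys.getenv(envvar)
    if (nzchar(path)) return(path)
  }
  # 3) Finally the PATH; Sys.which() returns "" when nothing is found
  path <- unname(Sys.which(name))
  if (nzchar(path)) path else NA_character_
}

# Calling the located tool; with the default stdout = "", system2()
# returns the exit status (assumes a Unix-alike where `sh` exists)
sh <- locate_tool("sh", option = "demo.sh", envvar = "DEMO_SH")
status <- system2(sh, c("-c", shQuote("exit 0")))
```

Every caller has to repeat this logic and pick its own option and variable names, which is exactly the lack of a standard described above.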
Some quick thoughts on an API
Locating external software / executables
ext <- external_find("texi2dvi")
ext <- external_find("texi2dvi", where = "$PATH")
ext <- external_find("texi2dvi", where = c("$PATH", "$TEX_HOME"))
ext <- external_find("texi2dvi", where = c("$PATH", "$TEX_HOME/bin"))
ext <- external_find("texi2dvi", version = ">= 1.4.0", where = "$PATH")
external_require("texi2dvi", version = ">= 1.4.0", where = "$PATH")
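A minimal sketch of how external_find() and external_require() could behave, assuming that `where` entries are either literal directories or `$VAR` / `$VAR/suffix` references to path-list environment variables, and that a tool's version can be parsed from the first `x.y[.z]` string printed by `<tool> --version` (many tools, but not all, support that flag). All function names here are hypothetical; only the building blocks (Sys.getenv(), system2(), regmatches(), package_version()) are base R.

```r
# Hypothetical sketch, not an existing package API.

# Expand one `where` entry, e.g. "$PATH" or "$TEX_HOME/bin", into directories
expand_where <- function(entry) {
  if (!startsWith(entry, "$")) return(entry)          # literal directory
  m <- regmatches(entry, regexec("^\\$([A-Za-z_][A-Za-z0-9_]*)(.*)$", entry))[[1]]
  if (length(m) == 0) return(character(0))
  value <- Sys.getenv(m[2])
  if (!nzchar(value)) return(character(0))            # variable not set
  dirs <- strsplit(value, .Platform$path.sep, fixed = TRUE)[[1]]
  dirs <- dirs[nzchar(dirs)]
  paste0(dirs, m[3])                                  # re-attach suffix such as "/bin"
}

# Assumption: the first "x.y[.z]" in `<tool> --version` output is the version
query_version <- function(pathname) {
  out <- tryCatch(
    suppressWarnings(system2(pathname, "--version", stdout = TRUE, stderr = TRUE)),
    error = function(e) character(0)
  )
  hit <- regmatches(out, regexpr("[0-9]+(\\.[0-9]+)+", out))
  if (length(hit) == 0) NULL else package_version(hit[1])
}

# Check a version against a constraint such as ">= 1.4.0"
satisfies <- function(ver, constraint) {
  pattern <- "^[[:space:]]*(>=|<=|==|!=|>|<)[[:space:]]*([^[:space:]]+)[[:space:]]*$"
  m <- regmatches(constraint, regexec(pattern, constraint))[[1]]
  if (length(m) == 0) stop("Cannot parse version constraint: ", constraint)
  compare <- match.fun(m[2])                          # e.g. `>=`
  isTRUE(compare(ver, package_version(m[3])))
}

external_find <- function(name, version = NULL, where = "$PATH") {
  dirs <- unlist(lapply(where, expand_where), use.names = FALSE)
  for (dir in dirs) {
    path <- file.path(dir, name)
    # Skip missing files, directories, and files we may not execute
    if (!file.exists(path) || dir.exists(path) || file.access(path, mode = 1) != 0) next
    if (!is.null(version)) {
      ver <- query_version(path)
      if (is.null(ver) || !satisfies(ver, version)) next
    }
    return(structure(list(name = name, pathname = normalizePath(path)),
                     class = "external"))
  }
  NULL                                                # nothing matched
}

external_require <- function(name, version = NULL, where = "$PATH") {
  ext <- external_find(name, version = version, where = where)
  if (is.null(ext)) {
    stop(sprintf("Could not find '%s'%s in: %s", name,
                 if (is.null(version)) "" else paste0(" (", version, ")"),
                 paste(where, collapse = ", ")), call. = FALSE)
  }
  invisible(ext)
}
```

Returning NULL from external_find() and reserving the error for external_require() keeps the find/require split from the list above; the first matching directory wins, so the order of `where` expresses priority.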
Information and attributes
pathname <- pathname(ext)
ver <- version(ext)
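The accessors could be ordinary S3 generics. In this hypothetical sketch the object is an environment, so that version() runs the (possibly slow) `--version` query only once and caches the result. new_external() is an illustrative constructor, and the same assumption about parsing `--version` output applies.

```r
# Hypothetical S3 accessors; "external" objects are environments here so
# that a lazily computed version can be cached by reference.
new_external <- function(pathname) {
  ext <- new.env(parent = emptyenv())
  ext$pathname <- normalizePath(pathname, mustWork = TRUE)
  ext$version <- NULL                  # filled in on first version() call
  class(ext) <- "external"
  ext
}

pathname <- function(x, ...) UseMethod("pathname")
pathname.external <- function(x, ...) x$pathname

# Note: defining version() masks base R's `version` object in this scope
version <- function(x, ...) UseMethod("version")
version.external <- function(x, ...) {
  if (is.null(x$version)) {
    out <- suppressWarnings(system2(x$pathname, "--version", stdout = TRUE, stderr = TRUE))
    hit <- regmatches(out, regexpr("[0-9]+(\\.[0-9]+)+", out))
    x$version <- if (length(hit) > 0) package_version(hit[1]) else NA
  }
  x$version                            # cached on subsequent calls
}
```

Caching matters because some tools are slow to start, and the version may be consulted repeatedly when checking constraints.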
Calling
res <- call(ext, "--help")  # cf. system()
print(work_path(res))
print(std_out(res))
print(std_err(res))
exit_code <- status(res)
t <- processing_time(res)
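A minimal sketch of the calling side, built on system2() with stdout and stderr redirected to temporary files. It is named external_call() here (a hypothetical name) to avoid masking base::call(), and it takes a plain pathname rather than an `ext` object to stay self-contained. Because system2() pastes its arguments into a shell command line, each argument is shQuote()d first.

```r
# Hypothetical sketch: capture stdout, stderr, exit code, wall time and
# working directory of one external call.
external_call <- function(pathname, args = character(), wd = getwd()) {
  out_file <- tempfile("stdout-")
  err_file <- tempfile("stderr-")
  on.exit(unlink(c(out_file, err_file)), add = TRUE)
  owd <- setwd(wd)                     # run inside `wd`
  on.exit(setwd(owd), add = TRUE)
  quoted <- if (length(args)) shQuote(args) else character()
  t0 <- Sys.time()
  # With file names for stdout/stderr, system2() returns the exit status
  exit <- system2(pathname, quoted, stdout = out_file, stderr = err_file)
  elapsed <- difftime(Sys.time(), t0, units = "secs")
  structure(list(
    work_path = normalizePath(wd),
    std_out = readLines(out_file, warn = FALSE),
    std_err = readLines(err_file, warn = FALSE),
    status = exit,
    processing_time = elapsed
  ), class = "external_result")
}

# Accessors matching the names proposed above
work_path <- function(res) res$work_path
std_out <- function(res) res$std_out
std_err <- function(res) res$std_err
status <- function(res) res$status
processing_time <- function(res) res$processing_time

res <- external_call("sh", c("-c", "echo hello; echo oops 1>&2; exit 3"), wd = tempdir())
```

Returning a result object instead of printing output directly means a non-zero exit code can be inspected rather than being lost or turned into a warning.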
As a future, e.g.
f <- launch(ext)
and
res <- value(f)
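A minimal sketch of the launch()/value() idea using base R's parallel package: mcparallel() forks a child that runs the tool, and mccollect() waits for and returns its result. This assumes a Unix-alike (forking is not available on Windows) and takes a plain pathname instead of an `ext` object; both function names are hypothetical. The future package provides a portable future()/value() API that could serve the same purpose.

```r
# Hypothetical launch()/value() pair built on parallel::mcparallel().
launch <- function(pathname, args = character()) {
  quoted <- if (length(args)) shQuote(args) else character()
  # The forked child runs the tool and returns its captured stdout
  parallel::mcparallel(system2(pathname, quoted, stdout = TRUE))
}

value <- function(f) {
  # mccollect() blocks until the job finishes; its result list is keyed by PID
  parallel::mccollect(f)[[1]]
}

f <- launch("echo", "hello")   # returns immediately
# ... other work can happen here ...
res <- value(f)                # blocks until the child is done
```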
res <- call(ext, c("--progress", "-o" = "foo.tar.gz"))
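One reading of the named-vector convention above (an assumption, not spelled out in the original): unnamed elements are passed through as-is, and named elements expand to the name (the flag) followed by the value. A hypothetical as_cli_args() helper could do the expansion before the arguments are quoted for system2().

```r
# Hypothetical helper: c("--progress", "-o" = "foo.tar.gz") becomes
# c("--progress", "-o", "foo.tar.gz").
as_cli_args <- function(args) {
  nms <- names(args)
  if (is.null(nms)) nms <- rep("", length(args))
  out <- character()
  for (i in seq_along(args)) {
    if (nzchar(nms[i])) out <- c(out, nms[i])   # the flag, e.g. "-o"
    out <- c(out, args[[i]])                    # the value itself
  }
  out
}

argv <- as_cli_args(c("--progress", "-o" = "foo.tar.gz"))
# Quote each element, since system2() builds a shell command line
cmdline <- paste(shQuote(argv), collapse = " ")
```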
Contracts of input & output
res <- call(ext, c("-i" = pathname("foo.tar", must_exist = TRUE), "-o" = pathname("foo.tar.gz", must_not_exist = TRUE)))
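The input/output contracts could be checked before the tool is even started, so that a missing input or an existing output fails early with a clear message instead of an obscure error from the tool. The helper is called checked_path() here (a hypothetical name) to avoid clashing with the pathname() accessor above.

```r
# Hypothetical contract helper: validate a path before the call and
# return it unchanged so it can be used directly as an argument value.
checked_path <- function(path, must_exist = FALSE, must_not_exist = FALSE) {
  if (must_exist && !file.exists(path))
    stop("Input file does not exist: ", path, call. = FALSE)
  if (must_not_exist && file.exists(path))
    stop("Output file already exists: ", path, call. = FALSE)
  path
}

input <- tempfile(fileext = ".tar")
invisible(file.create(input))
output <- tempfile(fileext = ".tar.gz")
args <- c("-i" = checked_path(input, must_exist = TRUE),
          "-o" = checked_path(output, must_not_exist = TRUE))
```

A matching post-condition, for example that the `-o` file exists once the call returns, could be checked the same way after the call.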
That's all I have had time to scribble down for now. I'm sure there are some packages out there that may target parts of the above.
I'd definitely like to see some collected wisdom on this. Beyond direct calls to system() and system2(), I think I've seen some clever code by @richfitz that refers to an internal function from @gaborcsardi's callr for this (https://github.com/richfitz/drat.builder/blob/master/R/utils.R#L3), which I can't seem to find in callr.
I'd definitely be interested to see an implementation along the lines you sketch out above.
callr uses processx (https://github.com/r-pkgs/processx) now, which has a lot of nice features (e.g. timeouts) for external processes, especially background processes.
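For comparison with the proposed API, processx::run() already returns much of what is sketched above (exit status, stdout, stderr) and supports a timeout. This assumes the processx package is installed; the commands used are just illustrations.

```r
# Requires the processx package (install.packages("processx")).
library(processx)

# run() waits for the process and returns a list with
# status, stdout, stderr and timeout components
res <- run("echo", "hello")
res$status   # exit code
res$stdout   # captured standard output as a single string

# A timeout kills a runaway process; with error_on_status = FALSE,
# run() returns a result instead of throwing an error
slow <- run("sleep", "10", timeout = 1, error_on_status = FALSE)
slow$timeout
```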