unconf17

External API: Querying and accessing external software (via `system()`)

Open HenrikBengtsson opened this issue 8 years ago • 2 comments

In my research field, computational genomics / bioinformatics, it is becoming more and more common to run analytical pipelines that call various standalone external software tools. This is often done via Unix shell scripts, but also from R itself. When it is done from R, people quite often write some one-off implementation that is "good enough" for what needs to be done.

R provides Sys.which() for identifying external software, and system() and system2() for calling it. If one goes through the source code of R itself, one can see that there are a few different flavors of how this is used. Some functions may, for instance, also locate external software via an environment variable and/or an R option. But other than that, there is no real standard for how this is done.
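To make the ad hoc pattern concrete, here is a minimal sketch of the kind of one-off lookup described above: try an R option, then an environment variable, then the PATH via Sys.which(), and call the result with system2(). The helper name find_executable() and the naming conventions for the option ("external.<name>") and the environment variable ("R_<NAME>") are assumptions made for illustration, not an existing convention.

```r
# Hypothetical helper (not an existing API): look up an executable via an
# R option, then an environment variable, and finally the PATH.
# The option and variable naming schemes are assumptions for this sketch.
find_executable <- function(name,
                            option = paste0("external.", name),
                            envvar = toupper(paste0("R_", name))) {
  candidates <- c(
    getOption(option, default = ""),   # 1. R option, if set
    Sys.getenv(envvar, unset = ""),    # 2. environment variable, if set
    unname(Sys.which(name))            # 3. search the PATH
  )
  candidates <- candidates[!is.na(candidates) & nzchar(candidates)]
  if (length(candidates) == 0L) return(NA_character_)
  candidates[[1L]]
}

# Usage: locate R itself and, if it is found, ask for its version
path <- find_executable("R")
if (!is.na(path)) {
  out <- system2(path, "--version", stdout = TRUE, stderr = TRUE)
  print(out[1])
}
```

Every package that needs an external tool tends to reinvent some variation of this, each with its own precedence rules and naming scheme.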

Some quick thoughts of an API

Locating external software / executables

  • ext <- external_find("texi2dvi")
  • ext <- external_find("texi2dvi", where = "$PATH")
  • ext <- external_find("texi2dvi", where = c("$PATH", "$TEX_HOME"))
  • ext <- external_find("texi2dvi", where = c("$PATH", "$TEX_HOME/bin"))
  • ext <- external_find("texi2dvi", version = ">= 1.4.0", where = "$PATH")
  • external_require("texi2dvi", version = ">= 1.4.0", where = "$PATH")
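As a rough illustration of how the `where` argument sketched above might behave, the following assumes that entries such as "$PATH" or "$TEX_HOME/bin" are expanded from environment variables and split on the platform's path separator. Both functions are hypothetical, the sketch targets Unix-like systems, and version checking is left out because how a tool reports its version varies from tool to tool.

```r
# Hypothetical sketch of external_find(); not an existing API.
# Expand entries like "$PATH" or "$TEX_HOME/bin" into directories.
expand_where <- function(where) {
  dirs <- lapply(where, function(w) {
    m <- regmatches(w, regexec("^\\$([A-Za-z_][A-Za-z0-9_]*)(.*)$", w))[[1]]
    if (length(m) == 0L) return(w)               # a literal directory
    value <- Sys.getenv(m[2], unset = "")
    if (!nzchar(value)) return(character(0))     # variable is not set
    parts <- strsplit(value, .Platform$path.sep, fixed = TRUE)[[1]]
    paste0(parts[nzchar(parts)], m[3])           # append suffix, e.g. "/bin"
  })
  unique(unlist(dirs))
}

external_find <- function(name, where = "$PATH") {
  candidates <- file.path(expand_where(where), name)
  if (length(candidates) == 0L) return(NULL)
  # file.access(mode = 1) returns 0 when the file is executable
  ok <- file.access(candidates, mode = 1) == 0 & !dir.exists(candidates)
  if (!any(ok)) return(NULL)
  structure(list(name = name, pathname = candidates[which(ok)[1]]),
            class = "external")
}

# Usage: NULL if not found, otherwise an object holding the pathname
ext <- external_find("texi2dvi", where = c("$PATH", "$TEX_HOME/bin"))
```

Returning NULL when nothing is found is one possible design; an external_require() variant could wrap this and signal an error instead.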

Information and attributes

  • pathname <- pathname(ext)
  • ver <- version(ext)

Calling

  • res <- call(ext, "--help") cf. system()

  • print(work_path(res))

  • print(std_out(res))

  • print(std_err(res))

  • exit_code <- status(res)

  • t <- processing_time(res)

  • As a future, e.g. f <- launch(ext) and res <- value(f)

  • res <- call(ext, c("--progress", "-o" = "foo.tar.gz"))
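One way the calling interface above could be built on top of system2() is sketched below. The name external_call() is a stand-in (a function literally named call() would mask base::call()), the result is a plain list rather than an object with accessor functions, and the handling of named arguments is an assumption about what c("-o" = "foo.tar.gz") is meant to do.

```r
# Hypothetical sketch; not an existing API.
external_call <- function(pathname, args = character()) {
  # A named element such as c("-o" = "foo.tar.gz") becomes: -o 'foo.tar.gz'
  nms <- names(args)
  if (is.null(nms)) nms <- rep("", length(args))
  args <- as.vector(rbind(nms, shQuote(args)))
  args <- args[nzchar(args)]

  # Capture standard output and standard error in separate temporary files
  out_file <- tempfile()
  err_file <- tempfile()
  on.exit(unlink(c(out_file, err_file)))

  t0 <- Sys.time()
  status <- suppressWarnings(
    system2(pathname, args, stdout = out_file, stderr = err_file)
  )
  elapsed <- difftime(Sys.time(), t0, units = "secs")

  read <- function(f) if (file.exists(f)) readLines(f, warn = FALSE) else character()
  structure(list(
    work_path       = getwd(),
    std_out         = read(out_file),
    std_err         = read(err_file),
    status          = status,
    processing_time = elapsed
  ), class = "external_result")
}

# Usage: run Rscript from the current R installation (Unix-like layout assumed)
rscript <- file.path(R.home("bin"), "Rscript")
res <- external_call(rscript, c("--vanilla", "-e" = "cat(1 + 1)"))
print(res$std_out)
print(res$status)
```

Accessors such as std_out(res) or status(res) would then be thin wrappers around the list fields.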

Contracts of input & output

  • res <- call(ext, c("-i" = pathname("foo.tar", must_exist = TRUE), "-o" = pathname("foo.tar.gz", must_not_exist = TRUE)))
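The input/output contract above could be enforced before the external tool is ever started. A minimal sketch follows, using the hypothetical name checked_pathname() to keep it apart from the pathname(ext) accessor listed earlier:

```r
# Hypothetical sketch; not an existing API.
# Validate a file path against a simple contract and return it unchanged.
checked_pathname <- function(path, must_exist = FALSE, must_not_exist = FALSE) {
  stopifnot(is.character(path), length(path) == 1L, !is.na(path))
  if (must_exist && must_not_exist) {
    stop("'must_exist' and 'must_not_exist' cannot both be TRUE", call. = FALSE)
  }
  exists <- file.exists(path)
  if (must_exist && !exists) {
    stop("File does not exist: ", path, call. = FALSE)
  }
  if (must_not_exist && exists) {
    stop("File already exists: ", path, call. = FALSE)
  }
  path
}

# Usage with temporary files
input <- tempfile(fileext = ".tar")
file.create(input)
output <- paste0(input, ".gz")
args <- c("-i" = checked_pathname(input, must_exist = TRUE),
          "-o" = checked_pathname(output, must_not_exist = TRUE))
```

Failing early like this gives an R-level error message instead of whatever the external tool happens to print when its input is missing or its output would be overwritten.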

That's all I have had time to scribble down for now. I'm sure there are some packages out there that may target parts of the above.

HenrikBengtsson, May 24 '17 22:05

I'd definitely like to see some collected wisdom on this. Beyond direct calls to system and system2, I think I've seen clever stuff by @richfitz that refers to an internal function from @gaborcsardi's callr for this (https://github.com/richfitz/drat.builder/blob/master/R/utils.R#L3), which I can't seem to find in callr.

I would definitely be interested to see an implementation along the lines you sketch out above.

cboettig, May 24 '17 23:05

callr uses processx (https://github.com/r-pkgs/processx) now, which has a lot of nice features (e.g. timeouts) for external processes, especially background processes.

gaborcsardi, May 24 '17 23:05