Auditing ROBOT command chain
Currently, many ROBOT commands can either be run separately (producing intermediate results) or chained in a single command (saving the space and time needed to (de)serialize the intermediate inputs/outputs). As an example, let's consider two options:
- Separate calls
robot query --input input.owl --update update.rq --output temp1.owl
robot reason --input temp1.owl --output output.owl
This option is suitable for debugging - it allows me to measure execution time, debug the individual pipeline steps (query/reason) and investigate intermediate results (temp1.owl).
- Single call
robot query --input input.owl --update update.rq reason --output output.owl
This option is suitable for production - it does not (de)serialize intermediate outputs, thus saving execution time and disk space.
Switching between these options is not very flexible. It would be beneficial to support option 2 with some global configuration options that make the commands auditable, similar to the global logging configuration '-v, -vv, -vvv'. The most important switch for my case would be something like
--store-intermediate, which would write an intermediate --output for every command in the chain.
However, other goodies that would also help me from time to time while debugging would be
- --store-diffs - would compute robot diff between each input/output pair in the chain, or even robot merge/unmerge to obtain a "machine-processable" diff.
- --stats - would compute some basic auditing metadata (e.g. execution times) for each of the commands in the chain.
What would be your thoughts on these?
(This scratches the surface of an even more ambitious topic - orchestration of robot commands in some ETL-like tool. But that belongs in another ticket.)
@psiotwo for the store-intermediate idea, I think it's the case that you can already combine --output and chaining, so that each step both writes a file and pipes to the next command. Not sure if it works for all commands.
@balhoff thanks for the hint - yes, this would work, although it seems a bit complicated to switch this debugging on/off in a Makefile for a ROBOT command chain (I can do something like $(if $(DEBUG),-o $debug-1.owl,) for each robot subcommand). But maybe it is just my limited experience with Makefiles ...
Running with -v will produce subcommand timing with the logger:
WARN Subcommand Timing: convert took 0.175 seconds
To store logs in a file:
robot convert --input foo.ttl --output bar.owl -v 1> log.txt
The only item here that can't currently be done while chaining commands is producing the diff. I suppose you could put it all in one command:
robot query \
--input input.owl \
--update update.rq \
${DEBUG:+--output update-intermediate.owl} \
reason --output output.owl && \
[[ -n $DEBUG ]] && \
robot diff \
--left input.owl \
--right update-intermediate.owl \
--output update-diff.txt
This will only produce the intermediate output file and the diff if DEBUG is set.
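If a chain has more than one intermediate file, the same diff step could be repeated for each consecutive pair. Here is a minimal shell sketch, assuming file names without spaces. The function name diff_chain_cmds and the diff-N.txt output names are made up for illustration, and the function only prints the robot diff commands rather than running them:

```shell
# Hypothetical helper (not part of ROBOT): print one `robot diff` command
# for each consecutive pair of the files given as arguments.
diff_chain_cmds() {
  # Nothing to compare with fewer than two files.
  [ "$#" -ge 2 ] || return 0
  prev=$1
  shift
  n=1
  for next in "$@"; do
    printf 'robot diff --left %s --right %s --output diff-%s.txt\n' \
      "$prev" "$next" "$n"
    prev=$next
    n=$((n + 1))
  done
}

# Example (prints the commands; pipe to `sh` to actually run them):
#   diff_chain_cmds input.owl update-intermediate.owl output.owl
```

Printing the commands first keeps the sketch side-effect free; whether to execute them can then be tied to DEBUG in the same way as above.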
Thanks @beckyjackson for the hint. Actually, w.r.t. stats I was more interested in some machine-processable output (e.g. JSON) for subsequent analytics.
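In the meantime, the timing lines that -v already logs could be post-processed. Here is a rough sketch, assuming every timing line has exactly the shape quoted earlier in this thread ("Subcommand Timing: <command> took <seconds> seconds"). timing_to_json is a hypothetical helper name, not something ROBOT provides:

```shell
# Hypothetical helper (not part of ROBOT): read log lines on stdin and emit
# one JSON object per "Subcommand Timing" line; all other lines are dropped.
# Assumes the line shape "... Subcommand Timing: <command> took <secs> seconds".
timing_to_json() {
  sed -n -E \
    's/.*Subcommand Timing: ([A-Za-z-]+) took ([0-9.]+) seconds.*/{"command":"\1","seconds":\2}/p'
}

# Example, using the log file written by the redirect shown earlier:
#   timing_to_json < log.txt > timings.jsonl
```

The result is JSON Lines (one object per subcommand), which most analytics tools can read directly. If the log format differs between ROBOT versions, the pattern would need adjusting.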
To be able to track different outcomes, I am currently using the following pattern. The diff handling you suggest looks good to me!
DEBUG=true
robot <COMMAND-1> -i input.ttl ... $(if $(DEBUG),-o $output-1.owl,) \
<COMMAND-2> ... $(if $(DEBUG),-o $output-2.owl,) \
<COMMAND-3> ... $(if $(DEBUG),-o $output-3.owl,) \
...
<COMMAND-N> ... $(if $(DEBUG),-o $output-N.owl,) \
<COMMAND-N+1> ... -o output.owl
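For a plain shell script instead of a Makefile, the same toggle could be wrapped in a small helper. This is a sketch only: debug_out and the debug-N.owl file names are hypothetical, and it relies on the behaviour mentioned earlier in this thread that --output can be combined with chaining, which may not hold for every command:

```shell
# Hypothetical helper (not part of ROBOT): print "--output debug-<N>.owl"
# when DEBUG is set to a non-empty value, and print nothing otherwise.
debug_out() {
  if [ -n "${DEBUG:-}" ]; then
    printf '%s %s\n' --output "debug-$1.owl"
  fi
}

# Example (leave $(...) unquoted so it splits into two arguments):
#   robot query --input input.owl --update update.rq $(debug_out 1) \
#         reason --output output.owl
```

Running the script with DEBUG=true writes debug-1.owl alongside the final output; with DEBUG unset, the helper expands to nothing and the chain runs as in the single-call option.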