Support Guacamole
This requires a new kind of machine, one that knows about Hadoop/Yarn.
I've made it work before; the question is how to abstract it so that it's portable and configurable.
Are you thinking of moving the Spark module into here, or into Ketrew directly?
Hoping to breathe some life into this; I've been wanting it more and more recently!
Specifically, I keep trying to reproduce external Guacamole users' local runs (e.g. https://github.com/hammerlab/guacamole/issues/572). I do that on login/dev nodes, but saturating those for up to hours at a time is not ideal; I would love to farm these runs out to the cluster.
I think @e5c is beginning work on this?
> This requires a new kind of machine, one that knows about Hadoop/Yarn.
This should now be handled by the requirements mechanism (implementations of Machine.t that do not know about Spark/Yarn could be improved to die with a nice error message).
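For illustration, a minimal sketch of what "dying with a nice error message" could look like for a machine that does not know about Spark/Yarn. All types and names below are hypothetical stand-ins, not Biokepi's actual API:

```ocaml
(* Hypothetical stand-ins for the requirement type and a run function;
   the real ones live in Biokepi's Machine module. *)
type requirement =
  | Spark of string list
  | Internet_access

(* A machine backend with no Spark/Yarn support rejects the job
   up front instead of silently mis-running it. *)
let run_program ~name ?(requirements = []) run =
  if List.exists (function Spark _ -> true | _ -> false) requirements
  then
    failwith
      (Printf.sprintf
         "Machine %S has no Spark/Yarn support; use a cluster-aware \
          machine for this tool."
         name)
  else run ()

let () =
  (* A plain login-node machine refuses a Spark job early: *)
  try
    run_program ~name:"login-node"
      ~requirements:[ Spark [ "--master"; "yarn" ] ]
      (fun () -> print_endline "would run the job here")
  with Failure msg -> prerr_endline msg
```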
@smondet
> This should now be handled by the requirements mechanism
Mind pointing to a code example where spark/yarn is specified as a requirement for a machine?
@e5c there is no code that requires Spark so far; I just meant that the "infrastructure" is there to wire things together.
There is already a `Spark [ ... ]` requirement there: https://github.com/hammerlab/biokepi/blob/master/src/run_environment/machine.ml#L128. So any tool can call `Machine.run_program ... ~requirements:[ Spark ["foo"; "bar"] ]`, but it is unused so far.
If you hack on that and the string list argument is not the structure you find appropriate for configuring Spark/Yarn usage, please change it (see the sketch below for one possibility); I just picked it semi-randomly :)
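In case it helps with that, one possible richer structure is a small record that compiles down to `spark-submit` flags. Everything below is a hypothetical suggestion, not current Biokepi code:

```ocaml
(* A structured alternative to a bare string list for the Spark
   requirement (hypothetical; the field set is just a starting point). *)
type spark_config = {
  master : [ `Yarn | `Local of int ];  (* cluster manager *)
  executor_memory : string option;     (* e.g. Some "4g" *)
  extra_flags : string list;           (* raw spark-submit pass-through *)
}

(* Turn the record into the corresponding spark-submit flags. *)
let to_spark_submit_flags c =
  (match c.master with
   | `Yarn -> [ "--master"; "yarn" ]
   | `Local n -> [ "--master"; Printf.sprintf "local[%d]" n ])
  @ (match c.executor_memory with
     | Some m -> [ "--executor-memory"; m ]
     | None -> [])
  @ c.extra_flags

let () =
  to_spark_submit_flags
    { master = `Yarn; executor_memory = Some "4g"; extra_flags = [] }
  |> String.concat " "
  |> print_endline
(* prints: --master yarn --executor-memory 4g *)
```

The nice property of a typed record over a `string list` is that callers can't mistype flag names, while `extra_flags` keeps an escape hatch for anything not modeled yet.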