Support Guacamole
This requires a new kind of machine, one that knows about Hadoop/Yarn.
I've made it work before; the question is how to abstract it so that it's portable and configurable.
Are you thinking of moving the Spark module into here, or into Ketrew directly?
Hoping to breathe some life into this; I've been wanting it more and more recently!
Specifically, I keep trying to reproduce external Guacamole users' local runs (e.g. https://github.com/hammerlab/guacamole/issues/572). I do that on login/dev nodes, but saturating those for up to hours at a time is not ideal; I would love to farm these runs out to the cluster.
I think @e5c is beginning work on this?
> This requires a new kind of machine, one that knows about Hadoop/Yarn.
This should now be handled by the requirements mechanism (implementations of Machine.t that do not know about Spark/Yarn could be improved to die with a nice error message).
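For illustration, a minimal sketch of what "dying with a nice error message" could look like for a machine that does not know about Spark/Yarn. All types and names below are hypothetical stand-ins, not Biokepi's actual API:

```ocaml
(* Hypothetical stand-ins for the requirement type and a run function;
   the real ones live in Biokepi's Machine module. *)
type requirement =
  | Spark of string list
  | Internet_access

(* A machine backend with no Spark/Yarn support rejects the job
   up front instead of silently mis-running it. *)
let run_program ~name ?(requirements = []) run =
  if List.exists (function Spark _ -> true | _ -> false) requirements
  then
    failwith
      (Printf.sprintf
         "Machine %S has no Spark/Yarn support; use a cluster-aware \
          machine for this tool."
         name)
  else run ()

let () =
  (* A plain login-node machine refuses a Spark job early: *)
  try
    run_program ~name:"login-node"
      ~requirements:[ Spark [ "--master"; "yarn" ] ]
      (fun () -> print_endline "would run the job here")
  with Failure msg -> prerr_endline msg
```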
@smondet
> This should now be handled by the requirements mechanism
Mind pointing to a code example where spark/yarn is specified as a requirement for a machine?
@e5c there is no code that requires Spark so far; I just meant that the "infrastructure" is there to wire things together.
There is already a `Spark [ ... ]` requirement there: https://github.com/hammerlab/biokepi/blob/master/src/run_environment/machine.ml#L128. So any tool can call `Machine.run_program ... ~requirements:[ Spark ["foo"; "bar"] ]`, but it is unused so far.
If you hack on that and the string list argument is not the structure you find appropriate for configuring Spark/Yarn usage, please change it (see the sketch below for one possibility); I just picked it semi-randomly :)
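In case it helps with that, one possible richer structure is a small record that compiles down to `spark-submit` flags. Everything below is a hypothetical suggestion, not current Biokepi code:

```ocaml
(* A structured alternative to a bare string list for the Spark
   requirement (hypothetical; the field set is just a starting point). *)
type spark_config = {
  master : [ `Yarn | `Local of int ];  (* cluster manager *)
  executor_memory : string option;     (* e.g. Some "4g" *)
  extra_flags : string list;           (* raw spark-submit pass-through *)
}

(* Turn the record into the corresponding spark-submit flags. *)
let to_spark_submit_flags c =
  (match c.master with
   | `Yarn -> [ "--master"; "yarn" ]
   | `Local n -> [ "--master"; Printf.sprintf "local[%d]" n ])
  @ (match c.executor_memory with
     | Some m -> [ "--executor-memory"; m ]
     | None -> [])
  @ c.extra_flags

let () =
  to_spark_submit_flags
    { master = `Yarn; executor_memory = Some "4g"; extra_flags = [] }
  |> String.concat " "
  |> print_endline
(* prints: --master yarn --executor-memory 4g *)
```

The nice property of a typed record over a `string list` is that callers can't mistype flag names, while `extra_flags` keeps an escape hatch for anything not modeled yet.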