add support for bulker

Open nsheff opened this issue 6 years ago • 1 comments
adding support for bulker would be sweet. here's what we'd need to do:

  • [x] pipeline interface adds bulker_crate attribute
  • [ ] on --bulker flag, looper first runs a bulker pull {bulker_crate}
  • [ ] on --bulker flag, looper prepends bulker run {bulker_crate} to each command

I think that's it...pretty simple...
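The prepend step in the checklist above could be sketched roughly as follows. This is only an illustration, not looper's actual code: the function name `wrap_with_bulker` and the `use_bulker` argument (standing in for the `--bulker` flag) are hypothetical.

```python
import shlex


def wrap_with_bulker(command: str, crate: str, use_bulker: bool) -> str:
    """Return the command unchanged, or prefixed with `bulker run <crate>`.

    Hypothetical helper: `use_bulker` stands in for looper's --bulker flag.
    """
    if not use_bulker:
        # Without the flag the pipeline command is left exactly as written.
        return command
    # shlex.quote guards against crate names containing shell metacharacters.
    return f"bulker run {shlex.quote(crate)} {command}"


if __name__ == "__main__":
    print(wrap_with_bulker("pi", "bulker/pi", use_bulker=True))
    print(wrap_with_bulker("pi", "bulker/pi", use_bulker=False))
```

The point of the flag is that the pipeline interface itself never mentions bulker; the same `command_template` runs with or without the wrapper.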

question: why is this superior to using the divvy bulker templates? well...this automatically works with whatever existing template without requiring a new template.

nsheff avatar Aug 06 '19 02:08 nsheff

right now this is still working with divvy; it is not integrated into looper

nsheff avatar Mar 05 '20 02:03 nsheff

Ok, I have this working on branch dev_add_bulker for a simple example.

Given command: looper run --bulker

with pipeline interface:

pipeline_name: count_lines
pipeline_type: sample
var_templates:
  bulker_command: pi
command_template: >
  {pipeline.var_templates.bulker_command}

bulker_crate: bulker/pi

Generates sub:

#!/bin/bash

echo 'Compute node:' `hostname`
echo 'Start time:' `date +'%Y-%m-%d %T'`

{
bulker run bulker/pi pi 
} | tee /home/drc/GITHUB/hello_looper/hello_looper/intermediate/results/submission/count_lines_frog_2.log

and generates terminal output:

## [2 of 2] sample: frog_2; pipeline: count_lines
Writing script to /home/drc/GITHUB/hello_looper/hello_looper/intermediate/results/submission/count_lines_frog_2.sub
Job script (n=1; 0.00Gb): /home/drc/GITHUB/hello_looper/hello_looper/intermediate/results/submission/count_lines_frog_2.sub
Compute node: databio
Start time: 2024-06-13 19:03:44
Using default config. You may specify in env var: ['BULKERCFG']
Activating crate: bulker/pi

3.141592653589793238462643383279502884197169399375105820974944592307816406286208998628034825342117067

Looper finished
Samples valid for job generation: 2 of 2

donaldcampbelljr avatar Jun 13 '24 23:06 donaldcampbelljr

For getting multiple containers to work in the same pipeline interface, one must structure the piface as so (and use the --bulker flag):

pipeline_name: count_lines
pipeline_type: sample
var_templates:
  bulker1: 'bulker run bulker/pi pi'
  bulker2: 'bulker run bulker/demo cowsay'
command_template: >
  {pipeline.var_templates.bulker1} | {pipeline.var_templates.bulker2}

bulker_crate:
  - bulker/pi
  - bulker/demo

This produces a result where both crates are working together:

## [2 of 2] sample: frog_2; pipeline: count_lines
Writing script to /home/drc/GITHUB/hello_looper/hello_looper/intermediate/results/submission/count_lines_frog_2.sub
Job script (n=1; 0.00Gb): /home/drc/GITHUB/hello_looper/hello_looper/intermediate/results/submission/count_lines_frog_2.sub
Compute node: databio
Start time: 2024-06-14 11:16:20
.Using default config. You may specify in env var: ['BULKERCFG']
Using default config. You may specify in env var: ['BULKERCFG']
Activating crate: bulker/pi

Activating crate: bulker/demo

 _________________________________________
/ 3.1415926535897932384626433832795028841 \
| 971693993751058209749445923078164062862 |
\ 08998628034825342117067                 /
 -----------------------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

Looper finished

This creates the following submission script:

#!/bin/bash

echo 'Compute node:' `hostname`
echo 'Start time:' `date +'%Y-%m-%d %T'`

{
bulker run bulker/pi pi | bulker run bulker/demo cowsay 
} | tee /home/drc/GITHUB/hello_looper/hello_looper/intermediate/results/submission/count_lines_frog_1.log

The current Looper code loads the crate(s) into the current environment using the f and b flags for convenience. However, it is on the user to structure the bulker commands to use bulker run appropriately.
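As a rough sketch of the loading step: `bulker_crate` appears in this thread both as a single string and as a list, so it needs normalizing before one `bulker load` call per crate can be built. The helper names below are hypothetical, the argument order is an assumption, and the commands are only constructed, not executed.

```python
from typing import List, Union


def normalize_crates(bulker_crate: Union[str, List[str]]) -> List[str]:
    """Accept either a single crate name or a list of crate names."""
    if isinstance(bulker_crate, str):
        return [bulker_crate]
    return list(bulker_crate)


def load_commands(bulker_crate: Union[str, List[str]]) -> List[List[str]]:
    """Build one `bulker load` argument list per crate, with -f and -b set.

    Assumption: the crate path is passed as a positional argument.
    """
    return [["bulker", "load", crate, "-f", "-b"]
            for crate in normalize_crates(bulker_crate)]


if __name__ == "__main__":
    for cmd in load_commands(["bulker/pi", "bulker/demo"]):
        print(" ".join(cmd))
```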

This is in contrast to the original divvy "workaround" (which still works fine; I left it alone). It creates a submission script from a specific .sub template and also allows chaining bulker crates:

#!/bin/bash

echo 'Compute node:' `hostname`
echo 'Start time:' `date +'%Y-%m-%d %T'`

eval "$(bulker activate -e bulker/demo,bulker/pi)"

{
  pi | cowsay 
} | tee /home/drc/GITHUB/hello_looper/hello_looper/intermediate/results/submission/count_lines_frog_1.log -i

pipeline_name: count_lines
pipeline_type: sample
command_template: >
  pi | cowsay

donaldcampbelljr avatar Jun 14 '24 15:06 donaldcampbelljr

> The current Looper code loads the crate(s) into the current environment using the f and b flags for convenience. However, it is on the user to structure the bulker commands to use bulker run appropriately.

what are the f and b flags ?

nsheff avatar Jun 14 '24 22:06 nsheff

it doesn't make sense to me that a user would write bulker run into the pipeline interface like this:

var_templates:
  bulker1: 'bulker run bulker/pi pi'
  bulker2: 'bulker run bulker/demo cowsay'

that's the point of this: that looper should handle that for you.

I also don't understand why you're putting this command into a var template like this:

var_templates:
  bulker_command: pi
command_template: >
  {pipeline.var_templates.bulker_command}

here, you're not saying bulker run ... (which is good). but anyway, why not just do:

command_template: >
  pi

I guess I'm not understanding why you'd ever reference bulker in these command_template or variables; the point is that this can be run either with --bulker or without. In both of these above cases you're tightly coupling, either in name or in function, the pipeline to bulker, but that's not good

nsheff avatar Jun 14 '24 22:06 nsheff

> The current Looper code loads the crate(s) into the current environment using the f and b flags for convenience. However, it is on the user to structure the bulker commands to use bulker run appropriately.
>
> what are the f and b flags ?

The f and b flags are for the bulker load command:

  -b, --build           Build/pull the actual containers, in addition to the executables. Default: False
  -f, --force           Force overwrite? Default: False

> I guess I'm not understanding why you'd ever reference bulker in these command_template or variables; the point is that this can be run either with --bulker or without. In both of these above cases you're tightly coupling, either in name or in function, the pipeline to bulker, but that's not good.

The current method using the divvy template is able to activate the bulker crates via

eval "$(bulker activate -e bulker/demo,bulker/pi)"

before executing commands from the command template. That means that the environment executing the command would know about the pi command.

However, I had difficulties activating the crates using only the command template.

Therefore, I used your example above:

on --bulker flag, looper prepends bulker run {bulker_crate} to each command

This works. But how does Looper/the environment know which crate each command belongs to? They haven't been activated yet. So it seemed that the user would need to supply that information via bulker run bulker_crate bulker_command within the pipeline interface.
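One way that information could be supplied without writing bulker run into the pipeline interface is a lookup from executable name to crate. The sketch below assumes such a mapping exists; the `crate_for` dictionary and the `wrap_pipeline` function are hypothetical, and nothing in this thread indicates that looper or bulker provides them.

```python
from typing import List, Mapping


def wrap_pipeline(command: str, crate_for: Mapping[str, str]) -> str:
    """Prefix each stage of a piped command with `bulker run <crate>`.

    `crate_for` maps an executable name to the crate that provides it
    (a hypothetical lookup). Stages with no known crate are left unchanged.
    Note: the naive split on "|" ignores quoting and operators such as "||".
    """
    wrapped: List[str] = []
    for stage in command.split("|"):
        parts = stage.split()
        if not parts:
            continue  # skip empty stages, e.g. from a trailing pipe
        crate = crate_for.get(parts[0])
        text = " ".join(parts)
        wrapped.append(f"bulker run {crate} {text}" if crate else text)
    return " | ".join(wrapped)


if __name__ == "__main__":
    lookup = {"pi": "bulker/pi", "cowsay": "bulker/demo"}
    print(wrap_pipeline("pi | cowsay", lookup))
```

With a plain `pi | cowsay` command template, this yields the same command line as the submission script generated earlier, while the interface stays free of bulker references.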

donaldcampbelljr avatar Jun 17 '24 13:06 donaldcampbelljr

We've decided to not pursue this enhancement and instead continue relying on divvy templates and setting compute variables, e.g. looper run --looper-config .looper.yaml --package bulker_local --compute BULKER_CRATE=bulker/demo
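For illustration, here is a minimal sketch of how a compute variable such as BULKER_CRATE could be filled into a submission template. The template text is modeled on the divvy-generated script shown earlier in this thread; the `{CODE}` and `{LOGFILE}` placeholder names, the log path, and the `render` function are assumptions, not divvy's actual implementation.

```python
from typing import Mapping

# Template modeled on the submission script shown above (an approximation).
TEMPLATE = """#!/bin/bash

echo 'Compute node:' `hostname`
echo 'Start time:' `date +'%Y-%m-%d %T'`

eval "$(bulker activate -e {BULKER_CRATE})"

{
  {CODE}
} | tee {LOGFILE} -i
"""


def render(template: str, variables: Mapping[str, str]) -> str:
    """Replace each {NAME} placeholder with its value.

    Plain string replacement is used so the bare shell braces in the
    template are left alone.
    """
    for key, value in variables.items():
        template = template.replace("{" + key + "}", value)
    return template


if __name__ == "__main__":
    print(render(TEMPLATE, {
        "BULKER_CRATE": "bulker/demo,bulker/pi",
        "CODE": "pi | cowsay",
        "LOGFILE": "count_lines_frog_1.log",  # hypothetical log path
    }))
```

Because the crates are activated once at the top of the script, the command template can stay a plain `pi | cowsay` with no reference to bulker.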

donaldcampbelljr avatar Jun 18 '24 21:06 donaldcampbelljr