bin/bpipe $LAUNCH_CMD method is extraordinarily slow for large $BPIPE_ARGS
In my ongoing attempts to process as many files as possible with bpipe using https://github.com/CobraLab/minc-bpipe-library, I have discovered a new limiting factor and a workaround.
Currently, the launcher wrapper for bpipe does this:
LAUNCH_CMD='
printf $$ > .bpipe.'$LAUNCHER_PID'.run.pid &
exec java -Xmx'${MAX_JAVA_MEM}' -Xss1M \
-noverify \
-classpath "'"$CP"'" '$BPIPEDEBUG' '$MODEFLAG' \
-Dbpipe.pid='$LAUNCHER_PID' \
-Dbpipe.home='"$JVM_BPIPE_HOME"' \
-Dbpipe.version='$VERSION' \
-Dbpipe.builddate='$BUILDDATE' \
org.codehaus.groovy.tools.GroovyStarter \
--classpath "'"$CP"'" \
--main bpipe.Runner '$TESTMODE' '$BPIPE_ARGS' > .bpipe/logs/$$.log 2>&1
'
$SHOWDEBUG && {
echo "LAUNCH_CMD: $LAUNCH_CMD"
}
nohup bash -c "$LAUNCH_CMD" \
> /dev/null 2>&1 &
When $BPIPE_ARGS is large (in my case, 3000+ input files with paths), bash sits and spins at 100% CPU for a long time (not benchmarked; it took so long that I gave up). I thought this was java barfing (as I have had issues with it before), but it turns out java never gets to exec.
I found this to immediately fix my problem:
execcommand=$(mktemp)
echo "$execcommand"
echo "$LAUNCH_CMD" > "$execcommand"
nohup bash "$execcommand" \
> /dev/null 2>&1 &
Here I dump the launch command to a file and run that file, rather than passing the variable to bash -c. I suspect the root cause has something to do with buffering and pipes, as I also tested echoing $LAUNCH_CMD into bash's stdin and had the same "takes forever" problem.
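For anyone who wants to try the same approach outside the bpipe launcher, here is a minimal, self-contained sketch. It is not the bpipe code: `write_launch_script` is a hypothetical helper name, the 3000 input paths are made up, and the command simply counts its arguments instead of starting java.

```shell
#!/usr/bin/env bash
# Sketch only: write a long command string to a temporary file and run the
# file, instead of passing the string to `bash -c`.
# write_launch_script is a hypothetical helper, not part of bpipe.
write_launch_script() {
    local script
    script=$(mktemp "${TMPDIR:-/tmp}/bpipe-launch.XXXXXX") || return 1
    # Write the command text verbatim, then report where it was saved.
    printf '%s\n' "$1" > "$script"
    printf '%s\n' "$script"
}

# Stand-in for a large $BPIPE_ARGS: 3000 made-up input paths.
args=""
for ((i = 1; i <= 3000; i++)); do
    args+=" /data/subject_${i}/input.mnc"
done

# Stand-in for $LAUNCH_CMD: set the paths as positional parameters and
# print how many there are.
cmd="set --$args; echo \$#"

script=$(write_launch_script "$cmd")
count=$(bash "$script")
rm -f "$script"
echo "arguments seen by the script: $count"
```

In the real launcher the script would be started in the background with nohup, as in my snippet above, so the temporary file can only be removed once that process has finished reading it; the sketch runs in the foreground to keep the cleanup simple.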
Very interesting - thanks for posting! I will try this out on some of our own large pipelines and merge after a little testing.
Incidentally, if you are running pipelines with many files, there are some other changes coming in 0.9.8 that will help a bit there as well.