Al Chu
I noticed a call to `flux mini submit --wait hostname` would hang when I wrote a regression test for #4331. After some digging, I determined the issue was due to...
With the job-exec/sdexec implementation (PR #4070), there is no equivalent way to add the barrier introduced in #4155, which solved a number of corner cases. At present, it seems impossible...
A few times I've been hit by bugs/issues where it would be convenient to know whether a watcher is currently started or stopped. So I end up setting...
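A minimal sketch of that workaround, assuming only a watcher object that exposes start()/stop() (the wrapper and all names here are hypothetical, not the actual flux-core API): keep a flag next to each start/stop call so the state can be queried later.

```python
class TrackedWatcher:
    """Wrap a watcher and record whether it is currently started."""

    def __init__(self, watcher):
        # `watcher` is a hypothetical stand-in for whatever watcher API
        # is in use; only its start()/stop() methods are assumed.
        self._watcher = watcher
        self.started = False

    def start(self):
        if not self.started:
            self._watcher.start()
            self.started = True  # record state alongside the real call

    def stop(self):
        if self.started:
            self._watcher.stop()
            self.started = False
```

An is-active query on the watcher itself would make this bookkeeping unnecessary.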
`libsubprocess` logs debugging to the flux broker, even in user contexts such as `flux exec`. Considering updating the internal logging to use `llog`. See initial discussion in #4060.
Once GitHub picks a new default, change it to that.
The basic tfadd is stupid; I need to learn TensorFlow distributed well enough to do (perhaps) a distributed add based on the number of nodes. Also, it should exit when the job...
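A hedged sketch of what such a distributed add could look like, assuming `tf.distribute.MultiWorkerMirroredStrategy` (my choice for illustration, not necessarily what tfadd should use): each node contributes one value, an all-reduce sums them so the result scales with the node count, and each worker exits naturally when the script ends.

```python
import tensorflow as tf

# Each worker must be launched with TF_CONFIG describing the cluster
# (one "worker" entry per node); the strategy reads it automatically.
strategy = tf.distribute.MultiWorkerMirroredStrategy()

def replica_fn():
    # Every worker contributes 1.0; the all-reduce sums the contributions,
    # so the result equals the number of participating nodes.
    ctx = tf.distribute.get_replica_context()
    return ctx.all_reduce(tf.distribute.ReduceOp.SUM, tf.constant(1.0))

@tf.function
def step():
    return strategy.run(replica_fn)

per_replica = step()
total = strategy.experimental_local_results(per_replica)[0]
print("distributed add result:", total.numpy())
# Falling off the end of the script lets each worker exit once the
# reduce completes, rather than hanging around after the job is done.
```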
For example:

node-0 -> mycluster18
node-1 -> mycluster43
node-2 -> mycluster48

so users can map anonymous "node-X" names to the actual cluster hostnames more easily.
`spark.scheduler.listenerbus.eventqueue.size` defaults to 10000.
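If that default is too small and listener events are being dropped, the queue size can be raised when the session is created. A minimal PySpark sketch (the value 100000 is an arbitrary example, and setting it at session build time is an assumption about where this gets configured):

```python
from pyspark.sql import SparkSession

# Raise the listener-bus event queue above the 10000 default so that
# listener events are not dropped under heavy scheduling load.
spark = (
    SparkSession.builder
    .appName("listenerbus-queue-size")
    .config("spark.scheduler.listenerbus.eventqueue.size", "100000")
    .getOrCreate()
)
```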
i.e. `export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true"` (the property takes `true`/`false`, not `yes`).
For example, when killing a script for running too long. This will help with grepping in large job outputs.