Al Chu

94 issues by Al Chu

While writing a regression test for #4331, I noticed that `flux mini submit --wait hostname` would hang. After some digging, I determined the issue was due to...

With the job-exec/sdexec implementation (PR #4070), there is no equivalent way to add the barrier that #4155 introduced to solve a number of corner cases. At present, it seems impossible...

A few times I've been hit by bugs/issues where it would be convenient to know whether a watcher is currently started or stopped, so I end up setting...

`libsubprocess` logs debugging output to the flux broker, even in user contexts such as `flux exec`. Consider updating the internal logging to use `llog`. See the initial discussion in #4060.

Once GitHub picks a new default, change it to that.

The basic tfadd is too trivial; I need to learn distributed TensorFlow well enough to implement (perhaps) a distributed add based on the number of nodes. Also, it should exit when the job...
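The general shape of a "distributed add based on the number of nodes" can be sketched without TensorFlow. This is a plain-Python simulation only, not the tfadd example and not any TensorFlow API: each hypothetical "node" sums its own chunk of the input, and the partial sums are then reduced into one total. The helper names `partition` and `distributed_add` are illustrative assumptions.

```python
def partition(values, num_nodes):
    """Split values into num_nodes contiguous chunks whose sizes differ by at most one."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be >= 1")
    base, extra = divmod(len(values), num_nodes)
    chunks, start = [], 0
    for rank in range(num_nodes):
        # The first `extra` ranks each take one additional element.
        size = base + (1 if rank < extra else 0)
        chunks.append(values[start:start + size])
        start += size
    return chunks


def distributed_add(values, num_nodes):
    """Simulate a distributed add in one process: each 'node' sums its own
    chunk, then the partial sums are reduced into a single total."""
    partial_sums = [sum(chunk) for chunk in partition(values, num_nodes)]
    return sum(partial_sums)


if __name__ == "__main__":
    # Sum 1..100 split across 4 simulated nodes (25 values each).
    print(distributed_add(list(range(1, 101)), 4))
```

In a real distributed version, each chunk would live on a different node and the final `sum(partial_sums)` would be a reduction step across nodes.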

For example: node-0 -> mycluster18, node-1 -> mycluster43, node-2 -> mycluster48, so users can more easily map the anonymous "node-X" names to the actual cluster hostnames.
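A minimal sketch of generating such a mapping, assuming the node index is simply the position of each hostname in the job's node list (that ordering is an assumption, not something the issue specifies). The function name `node_mapping` is hypothetical.

```python
def node_mapping(hostnames):
    """Return 'node-<index> -> <hostname>' strings, one per host,
    numbered in the order the hostnames are given (assumed allocation order)."""
    return [f"node-{i} -> {host}" for i, host in enumerate(hostnames)]


if __name__ == "__main__":
    # Example hostnames taken from the issue text.
    for line in node_mapping(["mycluster18", "mycluster43", "mycluster48"]):
        print(line)
```

Writing these lines to a file alongside the job output would let users look up which real host a given "node-X" refers to.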

`spark.scheduler.listenerbus.eventqueue.size` defaults to 10000.

i.e., `export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true"`

For example, when killing a script for running too long. This will help with grepping through large job outputs.