Al Chu
I noticed a call to `flux mini submit --wait hostname` would hang when I wrote a regression test for #4331. After some digging, I determined the issue was due to...
With the job-exec/sdexec implementation (PR #4070), there is no equivalent way to add the barrier introduced in #4155, which solved a number of corner cases. At present, it seems impossible...
A few times I've been hit by bugs/issues where it would be convenient to know whether a watcher is currently started or stopped. So I end up setting...
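A minimal sketch of that workaround, assuming only a watcher object that exposes start()/stop() (the wrapper and all names here are hypothetical, not the actual flux-core API): keep a flag next to each start/stop call so the state can be queried later.

```python
class TrackedWatcher:
    """Wrap a watcher and record whether it is currently started."""

    def __init__(self, watcher):
        # `watcher` is a hypothetical stand-in for whatever watcher API
        # is in use; only its start()/stop() methods are assumed.
        self._watcher = watcher
        self.started = False

    def start(self):
        if not self.started:
            self._watcher.start()
            self.started = True  # record state alongside the real call

    def stop(self):
        if self.started:
            self._watcher.stop()
            self.started = False
```

An is-active query on the watcher itself would make this bookkeeping unnecessary.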
`libsubprocess` logs debugging to the flux broker, even in user contexts such as `flux exec`. Considering updating the internal logging to use `llog`. See initial discussion in #4060.
Once GitHub picks a new default, change it to that.
The basic tfadd is stupid; I need to learn TensorFlow distributed well enough to do (perhaps) a distributed add based on the number of nodes. Also, it should exit when the job...
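A hedged sketch of what such a distributed add could look like, assuming `tf.distribute.MultiWorkerMirroredStrategy` (my choice for illustration, not necessarily what tfadd should use): each node contributes one value, an all-reduce sums them so the result scales with the node count, and each worker exits naturally when the script ends.

```python
import tensorflow as tf

# Each worker must be launched with TF_CONFIG describing the cluster
# (one "worker" entry per node); the strategy reads it automatically.
strategy = tf.distribute.MultiWorkerMirroredStrategy()

def replica_fn():
    # Every worker contributes 1.0; the all-reduce sums the contributions,
    # so the result equals the number of participating nodes.
    ctx = tf.distribute.get_replica_context()
    return ctx.all_reduce(tf.distribute.ReduceOp.SUM, tf.constant(1.0))

@tf.function
def step():
    return strategy.run(replica_fn)

per_replica = step()
total = strategy.experimental_local_results(per_replica)[0]
print("distributed add result:", total.numpy())
# Falling off the end of the script lets each worker exit once the
# reduce completes, rather than hanging around after the job is done.
```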
For example:

node-0 -> mycluster18
node-1 -> mycluster43
node-2 -> mycluster48

so users can map anonymous "node-X" names to the actual cluster hostnames more easily.
`spark.scheduler.listenerbus.eventqueue.size` defaults to 10000.
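If that default is too small and listener events are being dropped, the queue size can be raised when the session is created. A minimal PySpark sketch (the value 100000 is an arbitrary example, and setting it at session build time is an assumption about where this gets configured):

```python
from pyspark.sql import SparkSession

# Raise the listener-bus event queue above the 10000 default so that
# listener events are not dropped under heavy scheduling load.
spark = (
    SparkSession.builder
    .appName("listenerbus-queue-size")
    .config("spark.scheduler.listenerbus.eventqueue.size", "100000")
    .getOrCreate()
)
```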
i.e. `export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true"` (the property takes `true`/`false`, not `yes`).
For example, when killing a script for running too long. This will help with grepping in large job outputs.