magpie
Magpie contains a number of scripts for running Big Data software in HPC environments, including Hadoop and Spark. There is support for Lustre, Slurm, Moab, Torque, LSF, Flux, and more.
Once GitHub picks a new default, change it to that.
The basic tfadd is too simplistic; we need to learn distributed TensorFlow well enough to do (perhaps) a distributed add based on the number of nodes. It should also exit when the job...
For example:
node-0 -> mycluster18
node-1 -> mycluster43
node-2 -> mycluster48
so users can map anonymous "node-X" to the actual cluster name more easily.
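A mapping like the one above could be produced by a small helper. This is only a sketch: `node_map` is a hypothetical function, and it assumes the ordered hostname list is already available (e.g. from `scontrol show hostnames $SLURM_JOB_NODELIST` on a Slurm cluster).

```python
def node_map(hostnames):
    """Return lines like 'node-0 -> mycluster18', pairing each anonymous
    node index with the real hostname at that position."""
    return [f"node-{i} -> {host}" for i, host in enumerate(hostnames)]

# Illustrative hostnames from the example above.
for line in node_map(["mycluster18", "mycluster43", "mycluster48"]):
    print(line)
```

Printing this mapping at job start would let users grep their output for the real hostname behind any "node-X" label.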
spark.scheduler.listenerbus.eventqueue.size defaults to 10000
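If the default of 10000 is too small for a job, the setting could be raised in `spark-defaults.conf`; a minimal sketch, where the value 20000 is an arbitrary illustration, not a recommendation:

```
# spark-defaults.conf (illustrative; 20000 is an arbitrary example value)
spark.scheduler.listenerbus.eventqueue.size  20000
```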
e.g. export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true" (the property takes true/false)
For example, when killing a script for running too long. This will help with grepping large job outputs.
For a variety of advanced scenarios, there have been requests to set up Hadoop/Spark/etc. with a command-line tool. In addition, shutdown would be the user's responsibility via the command line...
Hi, after several days of trying to start using Magpie, I don't know what to do. I'm trying to use the basic TeraSort example, but when I execute the job in...
I have configured RDMA Hadoop and Spark myself on an InfiniBand cluster and it works, but when I try to use the submission script magpie.sbatch-srun-spark-with-yarn-and-hdfs (just for testing Hadoop...
This can be confusing; a user may think "yes" means "I can only run things one time".