magpie
support mechanism to allow magpie jobs & classic MPI jobs to run at the same time
Currently, job scripts won't work with this because of configurations like:
#SBATCH --ntasks-per-node=1
In addition, options like
#SBATCH --no-kill
present problems: the setting makes sense for Hadoop/Spark but may not for MPI jobs (unless they can handle failure with SCR or similar libraries).
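To illustrate the conflict, here is a minimal sketch of a Magpie-style batch header versus what an MPI job on the same allocation would typically want (node counts and CPU counts are hypothetical):

#SBATCH --nodes=10
#SBATCH --ntasks-per-node=1
#SBATCH --no-kill
# whereas a classic MPI job would typically launch multiple tasks per node, e.g.:
# srun --ntasks-per-node=16 ./mpi_code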
Need to determine a mechanism that lets the user configure the CPU split (X for Big Data, Y for MPI), and then propagates that split into the CPU configuration for Hadoop, Spark, etc.
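As a rough sketch of what that user-facing split could look like (the MAGPIE_* variable names here are purely hypothetical, not existing Magpie options):

# Hypothetical per-node CPU split supplied by the user
export MAGPIE_BIGDATA_CPUS=4     # X CPUs per node for Hadoop/Spark
export MAGPIE_MPI_CPUS=12        # Y CPUs per node for the MPI job
# Magpie could then feed X into the framework configs, e.g. Spark's worker core count:
export SPARK_WORKER_CORES=${MAGPIE_BIGDATA_CPUS}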
Short term, ignore MPI performance; longer term, binding to specific CPUs would be important for performance (i.e. MPI binds to CPUs A-B, Big Data gets CPUs C-D). It is unclear for the moment how to do this on both the Magpie side and the Big Data side.
In addition, we must determine the mechanism by which users can submit the MPI job. Is launching a .sh script sufficient?
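If a .sh script turns out to be sufficient, the user-supplied piece could be as small as the following sketch (the script contents and mpirun arguments are placeholders):

#!/bin/bash
# Hypothetical user-provided MPI launch script handed to Magpie
mpirun -n 10 ./mpi_code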
The simplest way I can see orchestrating the process is for the user to supply two scripts, one for the Magpie job and one for the MPI job. Setting processor affinity could be done fairly easily using taskset(1). This would require knowing the number of CPUs in advance, but might allow for a simple proof of concept.
By two scripts, do you mean the MPI job would run on one set of nodes (i.e. nodes[1-10]) and Magpie would run on another set of nodes (i.e. nodes[11-20])? I was initially imagining them running on the same nodes simultaneously.
No. More like you would have a single script, something like:
taskset 0x00000003 spark-submit ...
taskset 0xFFFFFFFC mpirun -n 10 code
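Fleshing that out into a minimal single-script sketch (masks assume 16-CPU nodes; the spark-submit arguments and MPI binary are placeholders):

#!/bin/bash
# CPUs 0-1 (mask 0x0003) go to Spark, CPUs 2-15 (mask 0xfffc) go to MPI
taskset 0x0003 spark-submit my_job.py &
taskset 0xfffc mpirun -n 10 ./mpi_code &
# Both workloads run simultaneously on the same nodes; wait for both to finish
wait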