magpie
support mechanism to allow magpie jobs & classic MPI jobs to run at the same time
Currently, job scripts won't work with this because of configurations like:
#SBATCH --ntasks-per-node=1
In addition, options like
#SBATCH --no-kill
present problems: the setting makes sense for Hadoop/Spark but may not for MPI jobs (unless they can handle failure with SCR or similar libraries).
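To illustrate the conflict, here is a minimal sketch of a Magpie-style batch header versus what an MPI job on the same allocation would typically want (node counts and CPU counts are hypothetical):

#SBATCH --nodes=10
#SBATCH --ntasks-per-node=1
#SBATCH --no-kill
# whereas a classic MPI job would typically launch multiple tasks per node, e.g.:
# srun --ntasks-per-node=16 ./mpi_code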
Need to determine a mechanism that lets the user configure the CPU split (X for Big Data, Y for MPI), and then propagates that split into the CPU configuration for Hadoop, Spark, etc.
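As a rough sketch of what that user-facing split could look like (the MAGPIE_* variable names here are purely hypothetical, not existing Magpie options):

# Hypothetical per-node CPU split supplied by the user
export MAGPIE_BIGDATA_CPUS=4     # X CPUs per node for Hadoop/Spark
export MAGPIE_MPI_CPUS=12        # Y CPUs per node for the MPI job
# Magpie could then feed X into the framework configs, e.g. Spark's worker core count:
export SPARK_WORKER_CORES=${MAGPIE_BIGDATA_CPUS}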
Short term, ignore MPI performance; longer term, binding to specific CPUs would be important for performance (i.e. MPI binds to CPUs A-B, Big Data gets CPUs C-D). It is unclear for the moment how to do this on both the Magpie side and the Big Data side.
In addition, we must determine the mechanism by which users can submit the MPI job. Is launching a .sh script sufficient?
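If a .sh script turns out to be sufficient, the user-supplied piece could be as small as the following sketch (the script contents and mpirun arguments are placeholders):

#!/bin/bash
# Hypothetical user-provided MPI launch script handed to Magpie
mpirun -n 10 ./mpi_code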
The simplest way I can see orchestrating the process is for the user to supply two scripts, one for the Magpie job and one for the MPI job. Setting processor affinity could be done fairly easily using taskset(1). This would require knowing the number of CPUs in advance, but might allow for a simple proof of concept.
By two scripts, do you mean the MPI job would run on one set of nodes (i.e. nodes[1-10]) and Magpie would run on another set of nodes (i.e. nodes[11-20])? I was initially imagining them running on the same nodes simultaneously.
No. More like you would have a single script, something like:
taskset 0x00000003 spark-submit ...
taskset 0xFFFFFFFC mpirun -n 10 code
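Fleshing that out into a minimal single-script sketch (masks assume 16-CPU nodes; the spark-submit arguments and MPI binary are placeholders):

#!/bin/bash
# CPUs 0-1 (mask 0x0003) go to Spark, CPUs 2-15 (mask 0xfffc) go to MPI
taskset 0x0003 spark-submit my_job.py &
taskset 0xfffc mpirun -n 10 ./mpi_code &
# Both workloads run simultaneously on the same nodes; wait for both to finish
wait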