Slurm support
Would you mind terribly if I re-organized the cluster job submission a bit and added Slurm support?
The basic interaction of the code would be mostly the same, with the addition of a "cluster_type" that could default to "sge":
cluster_type = 'sge'  # or 'slurm' or 'lsf'
cluster_utils.run_on_cluster(cmd_to_run,
                             job_name,
                             self.output_dir,
                             queue_type=queue_type,
                             settings_fname=self.settings_fname,
                             cluster_type=cluster_type)
cluster_utils.wait_on_jobs(cluster_jobs, cluster_type)
In cluster_utils, a factory would pull up the correct cluster engine and do the cluster-specific work:
clusterEngine = getClusterEngine(cluster_type)
clusterEngine.wait_on_job(jobid)
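A minimal sketch of what such a factory could look like; the class and function names here are hypothetical illustrations of the proposal, not MISO's actual API:

```python
# Hypothetical cluster-engine classes; each would wrap the
# scheduler-specific submit/status/cancel commands.
class SGEClusterEngine:
    submit_cmd = "qsub"

class SlurmClusterEngine:
    submit_cmd = "sbatch"

class LSFClusterEngine:
    submit_cmd = "bsub"

_ENGINES = {
    "sge": SGEClusterEngine,
    "slurm": SlurmClusterEngine,
    "lsf": LSFClusterEngine,
}

def get_cluster_engine(cluster_type):
    # Factory: map the cluster_type setting to an engine instance.
    try:
        return _ENGINES[cluster_type]()
    except KeyError:
        raise ValueError("Unsupported cluster_type: %r" % cluster_type)
```

Callers would then stay scheduler-agnostic, e.g. `get_cluster_engine('slurm').submit_cmd` yields `'sbatch'`.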
Does that sound reasonable? ajk
There's a simpler way to do this. All you'd need is a wrapper script like send_job that takes a shell script as input and submits it to the cluster, passing it whatever parameters are needed. Then you just register send_job as the cluster submission script in the MISO settings file.
It doesn't make sense for us to support every cluster system and its quirks, especially since, if I incorporate this code, I'll have no way of testing it (I have no access to Slurm clusters).
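Concretely, under this suggestion MISO would only ever shell out to whatever command is registered in the settings file; a rough sketch of the idea (the function name is hypothetical, not MISO code):

```python
import subprocess

def submit_job(submit_cmd, job_script, *extra_args):
    # Invoke the registered submission command (qsub, sbatch, or a
    # site-specific send_job wrapper) on the generated job script and
    # return its stdout, from which a job ID can later be parsed.
    out = subprocess.check_output([submit_cmd] + list(extra_args) + [job_script])
    return out.decode().strip()
```

With this approach a Slurm site registers a send_job script that calls sbatch internally, and MISO itself never needs scheduler-specific logic.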
Hi,
I heard from one of our users that they obtained an update from the MISO project that allowed them to run MISO on our cluster under SLURM, but I don't see any changes in the MISO github repository that I could apply to the copy we have installed for all our users. Can you point me in the right direction?
Regards,
Alex
Hi, Alex.
Feel free to use our fork: https://github.com/harvardinformatics/MISO
The pull request wasn't accepted (with good reason), but I haven't been able to go back and make it more palatable. The fork should work as is.
Aaron Kitzmiller [email protected]
Hello, I am having trouble with the SLURM-based cluster implementation of MISO. Could you elaborate on using a shell script as a wrapper for this purpose? We already use a shell script to submit jobs to the cluster, and changing the settings file did not help. Thanks, Manoj
First, what exactly is the trouble that you're having? What kinds of errors do you get?
In the Harvard Informatics fork, the cluster command is just a key for selecting a Python class that handles batch submission, job checking, etc. 'sbatch' is really just a key for the SlurmClusterEngine.
I'm not sure what your shell script for submitting jobs actually does, but it's possible that a clever tweak of the slurm_template.txt file would do it.
Aaron K.
Thanks Aaron, and I should have elaborated: the sbatch script runs fine on the cluster without cluster options. This is the bash script I use to run MISO on the cluster (with Slurm):

#!/bin/bash
#SBATCH -p general
#SBATCH -J my_job
#SBATCH -c 20 --mem-per-cpu=6000
#SBATCH -t 50:00:00
#SBATCH --mail-type=ALL
#SBATCH --mail-user=email

gff=~/project/GenomeBuilds/Human/GRCh38/gff3/indexed/
bam=~/project/data/TCGA_BRCA/BAM/696c2c1b-4216-45e1-b7bb-b2c20fd04fe8/1ff498b2-f0c9-499b-93a5-18f040237ba7_gdc_realn_rehead.bam
outdir=~/project/data/TCGA_BRCA/BAM/696c2c1b-4216-45e1-b7bb-b2c20fd04fe8/

miso --run $gff $bam --use-cluster --output-dir=$outdir --read-len=50 --chunk-jobs=1000 --settings-filename=/ysm-gpfs/home/mp758/.local/lib/python2.7/site-packages/misopy/settings/miso_settings.txt --no-wait

I added the --chunk-jobs, --settings-filename, and --no-wait options per different suggestions. When I run this, the job exits without distributing the sub-jobs. If I don't give the --no-wait option, it gives an error stating "could not parse jobID". Thanks for your help, Manoj
So this script looks to be the one that launches the main MISO process. If your settings file includes the cluster command "sbatch", then further jobs will be dispatched by MISO in separate job submissions. Those sub-jobs will be governed by the slurm_template.txt contents. Can you attach your settings file and the slurm_template that you're using?
ajk
I apologize for being a bit dense about this - I don't have a separate slurm_template.txt file. Should I have a separate bash script for that and then launch it?
This is the content of the miso_settings.txt file:

[data]
filter_results = True
min_event_reads = 20

[cluster]
cluster_command = sbatch --mem=10g --time=50:00:00

[sampler]
burn_in = 500
lag = 10
num_iters = 5000
num_chains = 6
num_processors = 8
thanks Manoj
So, if you're using our fork, then the slurm_template.txt file must be set.
The main MISO process (the command you posted) can issue a number of sub-jobs if cluster use is enabled. I believe those sub-jobs process chunks of genes: if you have 10,000 genes and use --chunk-jobs=1000, it will launch 10 sub-jobs. Your script launches the main process; MISO must also be told how to launch the sub-jobs.
If you set cluster_command to "sbatch", then the SlurmClusterEngine is used to submit those individual jobs. The additional text that you include (--mem=10g --time=50:00:00) is not useful; the cluster_command value just determines that Slurm will be used.
For those individual sub job submissions, you have to set the values for partitions, memory allocations, etc. using the slurm_template.txt file.
Under [cluster] you should have a slurm_template entry that points to the sbatch submission template that will be used for the sub jobs, e.g.
[cluster]
cluster_command = sbatch
slurm_template = /ysm-gpfs/home/mp758/slurm_template.txt
The template that comes with the fork was written for our environment. The only thing you need to leave in place is the {cmd} part, where the command will be substituted:
#!/bin/bash
#SBATCH -p serial_requeue
#SBATCH --mem 4000
#SBATCH -t 0-1:00
#SBATCH -n 1
#SBATCH -N 1

source new-modules.sh
module load python/2.7.6-fasrc01

{cmd}
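The {cmd} placeholder suggests ordinary Python string formatting; a minimal sketch of how a sub-job script could be rendered from the template (the function name is hypothetical, not necessarily what the fork uses):

```python
def render_job_script(template_path, cmd):
    # Read the sbatch template and substitute the MISO sub-job command
    # for the {cmd} placeholder; the result is what gets submitted.
    with open(template_path) as fh:
        template = fh.read()
    return template.format(cmd=cmd)
```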
It's not surprising that this is not obvious. It's not really documented :)
ajk
Thank you for your patience. As I understand it, the slurm_template.txt provides settings for each sub-job to be executed. I created a slurm_template.txt as below and pointed to it in the miso_settings.txt file as you suggested. Contents below (miso is not installed as a module but can be called from the command line):
#!/bin/bash
#SBATCH -p serial_requeue
#SBATCH --mem 4000
#SBATCH -t 0-1:00
#SBATCH -n 1
#SBATCH -N 1

module load python

{cmd}
After I submitted MISO.sh through sbatch, it ran for a few minutes; the output files had several lines showing that the sub-jobs were created, but it exited without completing them.
Submitting job: gene_psi_batch_1
Using MISO settings file: /ysm-gpfs/home/mp758/.local/lib/python2.7/site-packages/misopy/settings/miso_settings.txt
  - queue type: long
  - queue name unspecified
Executing: sbatch --mem=10g --time=50:00:00 -o "/ysm-gpfs/home/mp758/project/data/TCGA_BRCA/BAM/696c2c1b-4216-45e1-b7bb-b2c20fd04fe8/scripts_output" -e "/ysm-gpfs/home/mp758/project/data/TCGA_BRCA/BAM/696c2c1b-4216-45e1-b7bb-b2c20fd04fe8/scripts_output" "/ysm-gpfs/home/mp758/project/data/TCGA_BRCA/BAM/696c2c1b-4216-45e1-b7bb-b2c20fd04fe8/cluster_scripts/gene_psi_batch_1_time_11-29-16_14:24:00.sh"
The behavior was the same when I did not provide a slurm_template.txt file, so I am guessing the job did not use that file to specify the sub-jobs. Do I need to make any changes to the misopy settings for that to happen? Thanks again, Manoj
You want to make sure that the contents of the file make sense for your organization. For example, if "module load python" does not make sense for you, then don't include it. If you don't have a partition called "serial_requeue" then it won't work either.
ajk
I may have misunderstood some of the instructions - this will only work with the fork you created, right? I was trying to troubleshoot per Yarden's original post, just modifying the batch submission without using the fork. I will try a separate installation of your fork. Thanks, Manoj
Ah. I see.
The problem with simply changing the cluster_command is that there are a number of other commands that are specific to each cluster type. Slurm uses sbatch to submit, but it uses squeue and/or sacct to check on progress. You can bury a bunch of if statements in the code if you want, but it gets pretty hairy.
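To make the point concrete, each engine class has to wrap a whole family of scheduler commands, not just the submit command. The commands below are the standard ones for each scheduler; the dictionary itself is illustrative, not MISO code:

```python
# Standard command families per scheduler; an engine class wraps all
# of these, which is why swapping cluster_command alone is not enough.
SCHEDULER_COMMANDS = {
    "sge":   {"submit": "qsub",   "status": ["qstat"],           "cancel": "qdel"},
    "slurm": {"submit": "sbatch", "status": ["squeue", "sacct"], "cancel": "scancel"},
    "lsf":   {"submit": "bsub",   "status": ["bjobs"],           "cancel": "bkill"},
}
```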
ajk
It did give me an error for serial_requeue - can I change that somewhere in the script? Do I need a partition?
Change it in the slurm_template; that's the reason it's a template, so you can adapt it to your environment. If your administrators have set up a default partition, you can probably remove that line.
ajk
That seems to have worked - thank you so much. Will let you know if something arises. Appreciate all your efforts to keep these tools accessible even to novices like me. Manoj
Great. Glad it's working.
ajk
Hmm. I'll look in to it.
ajk
On Dec 14, 2017, at 3:05 PM, jianxinwang [email protected] wrote:
Hi aaronk, I'm trying to use MISO on our Slurm cluster too. I checked out your copy of MISO as suggested and made changes to the miso_settings.txt file as shown below:

[data]
filter_results = True
min_event_reads = 20

[cluster]
cluster_command = sbatch
slurm_template = /util/common/bioinformatics/MISO/MISO/misopy/cluster/slurm_template.txt

[sampler]
burn_in = 500
lag = 10
num_iters = 5000
num_chains = 6
num_processors = 4
And here is the content of "/util/common/bioinformatics/MISO/MISO/misopy/cluster/slurm_template.txt":

#!/bin/bash
#SBATCH -p debug
#SBATCH --mem 4000
#SBATCH -t 1:00:00
#SBATCH -n 1
#SBATCH -N 1

#source new-modules.sh
module load python/anaconda

{cmd}
Below is the command and output from the terminal:

python /util/common/bioinformatics/MISO/MISO/misopy/miso.py --run /util/ccr/data/MISO/hg19/indexed_SE_hg19_events/ /gpfs/projects/academic/big2/dbGaP/MM/RNA-Seq/BAM/hg19/MMRF_1300_T.sorted.bam --output-dir output_miso_SE/ --read-len 83 --paired-end 252 86 --prefilter --use-cluster

Traceback (most recent call last):
  File "/util/common/bioinformatics/MISO/MISO/misopy/miso.py", line 25, in <module>
    from misopy.cluster import getClusterEngine
ImportError: No module named cluster
What am I missing here?
Thanks, Jason
Hi aaronk,
Thanks for your response. I have solved the problem, and the reason was simple: I forgot to go through the installation steps and was running the Python script directly.
Jason
Ha! Great!
ajk