poseidon
Installing and running through singularity
I understand how to install and run this program with docker but how can I install and run the software without using docker on a HPC cluster which uses singularity?
Hey @rresendepinto thx for your interest! This should be fairly simple: add -profile slurm,singularity in case your HPC runs SLURM.
Please also check the README! When running on an HPC, I also recommend using --cachedir to store the Singularity images and --workdir or -w to store the work directories. It can be dangerous on an HPC to store such stuff in your home, for example, which often has limited space.
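A minimal sketch of such a call, assuming a scratch file system is available. SCRATCH_DIR and your_sequences.fasta are hypothetical placeholders, not names from the pipeline; the flags are the ones mentioned above. The snippet only assembles and prints the command so the paths can be checked before anything is submitted:

```shell
# Hypothetical scratch location; point this at your cluster's scratch space.
SCRATCH_DIR="${SCRATCH_DIR:-/scratch/poseidon}"

# Build the call step by step (your_sequences.fasta is a placeholder input).
POSEIDON_CMD="nextflow run hoelzer/poseidon --fasta your_sequences.fasta --cores 4"
POSEIDON_CMD="$POSEIDON_CMD -profile slurm,singularity"
# Keep the Singularity images and the work directories out of $HOME.
POSEIDON_CMD="$POSEIDON_CMD --cachedir $SCRATCH_DIR/singularity_images"
POSEIDON_CMD="$POSEIDON_CMD --workdir $SCRATCH_DIR/work"

# Print the command for review; paste it into the shell once it looks right.
echo "$POSEIDON_CMD"
```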
Thank you for the quick answer! What if the HPC doesn't run SLURM, but SGE instead?
Uh, I think SGE is not implemented yet, only SLURM and LSF. However, SGE is supported by nextflow: https://www.nextflow.io/docs/latest/executor.html
We can provide a hot fix and then you can give it a try. Unfortunately, we cannot test this because we have no access to an SGE cluster.
@fischer-hub can you add a -profile sge to the nextflow.config please?
If you could do that, it would help me a lot. Thank you!
Hey @rresendepinto! I just added the SGE profile to the workflow on a new branch, sge_profile! If you could check out the branch and tell me if everything is running as you'd expect, that would be great, since we don't have access to any machines running SGE, as @hoelzer said.
You can activate the SGE profile by adding -profile sge,singularity, -profile sge,conda or -profile sge,docker to your nextflow call, depending on which container engine you would like to use!
Thx @fischer-hub for the fast fix!
@rresendepinto you can also pull and run the new code on the branch easily via
nextflow pull hoelzer/poseidon
nextflow run hoelzer/poseidon -r sge_profile ...
Hi! I tried the new version but I think there was a conflict with SGE.
Error executing process > 'check_fasta_format (1)'
Caused by:
Failed to submit process to grid scheduler for execution
Command executed:
qsub -terse .command.run
Command exit status:
1
Command output:
Unable to run job: denied: host "compute-0-12.local" is not a submit host
Exiting.
The system is CentOS 7 and the SGE version is 8.1.9.
Hi @rresendepinto, sadly I cannot test this, but from your command output I'm wondering whether you are executing the pipeline from a compute node:
Unable to run job: denied: host "compute-0-12.local" is not a submit host
In that case, could you try running the pipeline from a node where the qsub command is available (e.g. the cluster's head node)?
The nextflow documentation mentions this for SGE:
Nextflow manages each process as a separate grid job that is submitted to the cluster by using the qsub command.
Being so, the pipeline must be launched from a node where the qsub command is available, that is, in a common usage scenario, the cluster head node.
Nextflow will then submit the individual processes to the compute nodes itself :)
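A quick way to check the current host before launching could look like the sketch below. It assumes the standard SGE client tools; qconf -ss lists the configured submit hosts, but the host names it prints may be fully qualified while uname -n is not, so a missing match is reported as "unknown" rather than as a failure:

```shell
# Name of the machine we are logged in to.
HOST_NAME="$(uname -n)"

if ! command -v qsub >/dev/null 2>&1; then
    # Without qsub on PATH, Nextflow cannot submit jobs from here at all.
    SUBMIT_STATUS="qsub not found"
elif command -v qconf >/dev/null 2>&1 && qconf -ss 2>/dev/null | grep -qxF "$HOST_NAME"; then
    # The host appears in SGE's list of submit hosts.
    SUBMIT_STATUS="listed as submit host"
else
    # qsub exists, but we could not confirm that this host may submit jobs.
    SUBMIT_STATUS="qsub found, submit-host status unknown"
fi

echo "$HOST_NAME: $SUBMIT_STATUS"
```

Note that the error above came from a node where qsub itself ran but the host was denied, so having qsub on the PATH is necessary but not sufficient.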
It just hangs indefinitely on the cluster's head node. I can't figure out why.
Hi! Could you post the command you used and maybe the .nextflow.log file? At what point does the pipeline stop running?
Command:
NXF_JAVA_HOME=/home/rpinto/Documents/bin/jre1.8.0_321 NXF_VER=20.09.0-edge nextflow run hoelzer/poseidon -r sge_profile --fasta poseidon/test_data/bats_mx1_small.fasta --cores 4 -profile sge,singularity >> qsub2.out &
Nextflow log file nextflow_0703.log
This is the output:
Tree root species: NA
Reference species: NA
Use KH-insignificant breakpoints: no
[- ] process > check_fasta_format -
[- ] process > translatorx -
[- ] process > check_aln -
[- ] process > remove_gaps -
[- ] process > raxml_nt -
[- ] process > raxml_aa -
[- ] process > raxml2drawing -
[- ] process > nw_display -
[- ] process > barefoot -
[- ] process > model_selection -
[- ] process > gard_detect -
________________________
Execution status: failed
Results are reported here: results/<prefix_of_your_fasta>/html/full_aln/index.html
No results folder is created
Thanks @rresendepinto! Your command seems fine to me and works as expected (with slurm). However, I noticed you are running Nextflow version 20.09.0-edge; you can update your Nextflow installation to version 21.10.6 with nextflow self-update. Maybe that will already fix the issue. Otherwise I'm not quite sure what's going on. Did you successfully run other nextflow pipelines in the past already? Also, is there anything interesting in the qsub.out log files?
Btw, do you see any of the singularity images being downloaded?
This is the first nextflow pipeline I am trying to run on this cluster. The qsub file only presents what I reported before as output. The same thing happens with version 21.10.6. The singularity images are being downloaded
Alright, you could try to run the nextflow test pipeline; that way we can see if this issue is restricted to poseidon or is an issue with the HPC and the nextflow execution:
nextflow run hello
You can expect an output similar to this if everything is working correctly:
N E X T F L O W ~ version 21.10.6
Launching `nextflow-io/hello` [nice_payne] - revision: ec11eb0ec7 [master]
executor > local (4)
[fa/20fe7a] process > sayHello (1) [100%] 4 of 4 ✔
Ciao world!
Hello world!
Hola world!
Bonjour world!
Also, I found this issue with nextflow on SGE HPCs which kind of sounds like your issue. The problem there was that SGE defaulted to a shell that was not bash, resulting in nextflow crashing. I added their recommended fix to the SGE profile in poseidon, so you could also try pulling the last commit again and running your command.
Do you know what kind of shell your HPC is running as default?
The hello program works. The HPC runs bash as default.
The pipeline doesn't hang anymore but throws a different error related to the job scheduler:
Okay, great! That means nextflow is running now.
From the qsub log file it seems your cluster requires the definition of the parallel environment to use. I added smp as the standard parallel environment to use with the SGE profile here; however, it is possible that your cluster is using a different parallel environment.
If you are still getting an error similar to
Command output:
Unable to run job: "job" denied: use parallel environments instead of requesting slots explicitly
Exiting.
that is referring to the parallel environment, you might have to change the penv variable here (in your local installation) from smp to the environment your cluster is using, e.g. something like mpi.
But you can just try to run the new commit, maybe your cluster is using smp!
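As an untested alternative to editing the pipeline's own files, the parallel environment could be set in a small extra config file passed with Nextflow's -c option. This is only a sketch: the file name sge_penv.config is made up, "mpi" stands in for whatever your cluster uses, and the override may not take effect if the pipeline sets penv through process selectors rather than a plain process setting:

```shell
# Write a small override config; 'mpi' is an example value.
# On SGE, `qconf -spl` should list the parallel environments your cluster defines.
cat > sge_penv.config <<'EOF'
process {
    penv = 'mpi'
}
EOF

# Show what was written.
cat sge_penv.config

# Then add the file to the usual call, for example:
# nextflow run hoelzer/poseidon -r sge_profile ... -profile sge,singularity -c sge_penv.config
```

Keeping the setting in a separate file leaves the pipeline checkout unmodified, so later pulls are not blocked by local edits.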
I think it uses mpi. Where is the pipeline code stored?
I changed to mpi but it won't run because it has uncommitted changes.
Maybe you are trying to run the pipeline from this branch here. Can you try to change to your local directory where your poseidon installation is (I think in your log file it was /home/rpinto/.nextflow/assets/hoelzer/poseidon/) and then start the pipeline with:
nextflow run poseidon.nf --fasta poseidon/test_data/bats_mx1_small.fasta --cores 4 -profile sge,singularity
nextflow run hoelzer/poseidon -r sge_profile ... will try to run the sge_profile branch, but you actually want to run with your local changes.
Yes exactly, or first clone your own local copy of the PoSeiDon code, make the changes, and then run like described above via nextflow run poseidon.nf ...
git clone https://github.com/hoelzer/poseidon.git
cd poseidon
git checkout sge_profile
# now you are on the code branch w/ the changes David introduced
git pull origin sge_profile
# just in case, check that you really have the changes in this branch
# now modify your local copy according to your needs
nextflow run poseidon.nf --fasta test_data/bats_mx1_small.fasta --cores 4 -profile sge,singularity
I ran the pipeline from the local directory and it seems to have solved the issues with the job scheduler. However, it threw an error on the "gard_detect" step. The output is in the following file. I am running the pipeline on the test_data/bats_mx1_small.fasta file.
Glad you got it to run! Yes, there are a few known issues regarding the gard_detect process. There already is a PR addressing them; however, it does not contain the SGE profile changes.
Could you provide the .command.log and gard.log files from the working directory (/home/rpinto/.nextflow/assets/hoelzer/poseidon/work/84/428977a49ec560ffb8bdddfe7eb1b1) so we can figure out if it's a known issue?
If it's the same issue I can just rebase the sge_profile branch so the fix is on there too.
.command.log is empty
Yep, that's the issue targeted in the gard PR. I rebased the sge_profile branch so the change should be available there too now. Just pull and try again! And don't forget to set your penv in case it got overwritten because of the pull :)
I can not pull the pipeline again.
hoelzer/poseidon contains uncommitted changes -- cannot pull from repository
I tried changing the penv variable back to 'smp' but it didn't work
Can you try:
git clone https://github.com/hoelzer/poseidon.git
cd poseidon
git checkout sge_profile
# now you are on the code branch w/ the changes David introduced
git pull origin sge_profile
# just in case, check that you really have the changes in this branch
# now modify your local copy according to your needs
nextflow run poseidon.nf --fasta test_data/bats_mx1_small.fasta --cores 4 -profile sge,singularity
that way you pull the changes directly via Git.
It finally worked! I can't wait to use this pipeline on my data.
Thank you so much for your help! :)
Great! I'm glad to hear that, thanks for also testing the SGE profile!
Awesome! Thanks for your patience @rresendepinto and thanks for the support @fischer-hub !