clusterflow
Wall time
I currently have @max_time set to 23:50:00 in my clusterflow.config file. I have changed cf_download.cfmod.pl to make the time equal to num_files * 60, because my cluster is not on a dial-up connection, and I also changed sra_fqdump.cfmod.pl to use num_files * 60. But every time I run the provided test command:
cf --genome GRCh37 sra_bowtie2 ftp://ftp.ddbj.nig.ac.jp/ddbj_database/dra/sralite/ByExp/litesra/SRX/SRX031/SRX031398/SRR1068378/SRR1068378.sra
the first job it creates asks for two full days of time, which forces the job onto a partition called long that has far fewer resources than the main partition. I can't figure out why the requested time stayed the same after I changed the cf_download module, as if I had not changed the script at all. Also, two days is longer than the max time I set in my config, and I don't know why that is allowed.
Hi @fjames003,
That was a bit of a mouthful! 😀 Let me see if I can break this down a little to confirm that I've understood you:
- You set @max_time to 23:50:00 in your config. This is the default: https://github.com/ewels/ClusterFlow/blob/c643e04cfa89b7cfb1f6f314f4ab22b92ade7b7b/clusterflow.config.example#L26
- You modified these lines to make the requested time shorter: https://github.com/ewels/ClusterFlow/blob/c643e04cfa89b7cfb1f6f314f4ab22b92ade7b7b/modules/cf_download.cfmod.pl#L39-L40
- You also modified these lines to make the requested time shorter: https://github.com/ewels/ClusterFlow/blob/c643e04cfa89b7cfb1f6f314f4ab22b92ade7b7b/modules/sra_fqdump.cfmod.pl#L38-L39
- Despite this, when running the sra_bowtie2 pipeline, Cluster Flow submits a job with a time limit of two days.
So, two issues here: firstly, @max_time isn't limiting the job time as expected; secondly, the job requests are longer than expected.
I'd suggest debugging the first problem as follows:
- Is your config file definitely being found and parsed? You can try putting a print statement in here to ensure that the line is being picked up: https://github.com/ewels/ClusterFlow/blob/c643e04cfa89b7cfb1f6f314f4ab22b92ade7b7b/source/CF/Constants.pm#L148
- @max_time should set the internal variable $JOB_TIMELIMIT, which is then used here to cap all job submission requests: https://github.com/ewels/ClusterFlow/blob/c643e04cfa89b7cfb1f6f314f4ab22b92ade7b7b/cf#L1025-L1030 You can try putting print statements in here too, to see what the evaluated variables turn out to be and pin down why this is being let through.
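As a rough illustration of the capping idea described above, here is a minimal Python sketch. It is not Cluster Flow's actual Perl code: the function names `cap_request` and `format_slurm_time` are hypothetical, and it assumes both the requested time and the limit have already been converted to whole seconds. The optional debug print mirrors the suggestion to print the evaluated variables.

```python
def format_slurm_time(total_seconds: int) -> str:
    """Render seconds as HH:MM:SS, or D-HH:MM:SS when a day or more."""
    days, rem = divmod(total_seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    clock = f"{hours:02d}:{minutes:02d}:{seconds:02d}"
    return f"{days}-{clock}" if days else clock


def cap_request(requested_seconds: int, limit_seconds: int, debug: bool = False) -> int:
    """Return the requested time, reduced to the limit if it exceeds it.

    Hypothetical helper: both arguments are assumed to be plain seconds.
    """
    capped = min(requested_seconds, limit_seconds)
    if debug:
        # Show exactly which values took part in the comparison.
        print(f"requested={requested_seconds}s limit={limit_seconds}s -> using {capped}s")
    return capped


limit = 23 * 3600 + 50 * 60      # 23:50:00 expressed in seconds
two_days = 2 * 86400             # the unexpectedly long request
print(format_slurm_time(cap_request(two_days, limit, debug=True)))
```

With the values from this thread, a two-day request would be reduced to 23:50:00 before submission if a cap of this kind were applied.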
The second problem is pretty strange too. Again, I would start by putting in print statements to ensure that the code you changed is definitely being executed. You can also just put static strings in the section that you edited, which rules out anything odd in how the value is computed.
I hope this makes sense, let me know how you get on. Sorry that I don't have any simple fixes for you - it's nigh on impossible to debug these kinds of problems remotely, and "it works fine for me" ™
Cheers,
Phil
Hi Phil,
Thank you for getting back to me. From what you have written it appears you understood my post, thank you for parsing it out as I should have done in the post to begin with.
My first question is: are the @max_time values in a different format than SLURM would use? I ask because you correctly stated that I set @max_time to 23:50:00 in my config file; however, you then say this is the default, but the line that you reference is @max_time 10-00, which in SLURM time would be 10 days, 0 hours. I am not sure how 10-00 is equivalent to 23:50:00.
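For reference on the format question: SLURM's time option accepts "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds". The Python sketch below converts such strings to seconds so the two values can be compared directly. The function name `slurm_time_to_seconds` is hypothetical, and whether Cluster Flow interprets @max_time in exactly this way is an assumption.

```python
def slurm_time_to_seconds(text: str) -> int:
    """Convert a SLURM-style time string to seconds (hypothetical helper)."""
    rest = text.strip()
    days = 0
    if "-" in rest:
        # Day prefix present: the remainder reads hours[:minutes[:seconds]].
        day_part, rest = rest.split("-", 1)
        days = int(day_part)
        fields = [int(f) for f in rest.split(":")]
        if len(fields) > 3:
            raise ValueError(f"unrecognised time string: {text!r}")
        fields += [0] * (3 - len(fields))
        hours, minutes, seconds = fields
    else:
        fields = [int(f) for f in rest.split(":")]
        if len(fields) == 1:        # "minutes"
            hours, minutes, seconds = 0, fields[0], 0
        elif len(fields) == 2:      # "minutes:seconds"
            hours, minutes, seconds = 0, fields[0], fields[1]
        elif len(fields) == 3:      # "hours:minutes:seconds"
            hours, minutes, seconds = fields
        else:
            raise ValueError(f"unrecognised time string: {text!r}")
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


print(slurm_time_to_seconds("10-00"))     # ten days
print(slurm_time_to_seconds("23:50:00"))  # just under one day
```

Converting to a common unit before comparing is the safer route: compared as raw strings, "2-00:00:00" sorts before "23:50:00", even though it is the longer duration.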
Moving on to your suggestions, I placed a print in Constants.pm and it is correctly setting $JOB_TIMELIMIT to 23:50:00. However, a print placed in cf after line 1029 to show the $time variable never fires, which tells me that the script never sees a $time greater than $JOB_TIMELIMIT, even though the very first job is still being set to 2-00:00:00. So maybe you are right that @max_time is not working the way I would expect. In that case, is there a way to make sure a job never gets a time limit of more than 23 hours and 50 minutes?
Thank you again for getting back to me on this, Frankie
Hi Frankie,
Wow that was a long delay before my reply, my apologies. I was doing some inbox archaeology and found your mail, so here goes.
> I am not sure how 10-00 is equivalent to 23:50:00.
It's not - sorry, I was just linking to the existing code where the default is 10 days. So if you have set your config file to have @max_time 23:50:00 then yes, that should be under 24 hours for SLURM.
> however placing a print in cf after line 1029 to print the $time variable never occurs
OK, so you mean that this if block is not being executed? I don't have any quick fixes for you, sorry. There's nothing to do but get down and dirty with the code, picking apart each part of that if statement and working back through the code to see where the logic is failing...
Sorry, not a very satisfactory answer I know. Basically as far as I can see, you're doing everything correctly and it should be working. I can't replicate the error, so I can't do the bug hunting for you, so you're kind of on your own.. 😞
Assuming that you didn't already give up, good luck and let me know how you get on / shout if there's anything I can help with!
Phil
ps. If you're only using Cluster Flow to download and dump SRA files, I'd recommend giving my newly updated SRA-explorer tool a spin: https://ewels.github.io/sra-explorer/
It now has direct links to FastQ files, courtesy of the ENA, and has copy+paste commands to use Aspera for super speedy downloads.
Phil