easy question: new falcon wants to run locally instead of submitting to the cluster (?)
I must be doing something wrong, but I used the cfg provided, and I could not find anything particularly different between this cfg and the one used with earlier versions of falcon that do submit to the cluster.
Any thoughts on what I'm doing wrong?
Thanks, David
I'm putting the cfg below:
[General]
use_tmpdir = True
# list of files of the initial bas.h5 files
input_fofn = input.fofn
input_type = raw
# The length cutoff used for seed reads used for initial mapping
#length_cutoff = 11400
# The length cutoff used for seed reads used for pre-assembly
length_cutoff_pr = 1
genome_size = 1000000
seed_coverage = 20
sge_option_da = -pe serial 4 -l mfree=15G -q eichler-short.q -l h_rt=144:00:00 -m a -R y -soft -l gpfsstate=0
# this is for LAsort/LAmerge (the rp_ processes); 60 was sufficient for
# most, but using 90G for the last few
sge_option_la = -q eichler-short.q -l h_rt=20:00:00 -pe serial 1 -l mfree=90G -m a -R y
sge_option_pda = -pe serial 4 -l mfree=9.5G -q eichler-short.q -l h_rt=20:00:00 -m a -R y
# this was -pe serial 16 but I don't see a need for 16 slots.
sge_option_pla = -pe serial 1 -l mfree=90G -q eichler-short.q -l h_rt=20:00:00 -m a -R y -l h="e217|e218|e219|e220|e221|e222|e223|e224|e225|e226|e227|e228|e229|e230|e231|e232|e233|e234|e235|e236|e237|e238|e240|e241|e242|e243|e244|e245|e246|e247"
sge_option_fc = -q eichler-short.q -l h_rt=20:00:00 -pe serial 16 -l mfree=10G -m a -R y
sge_option_cns = -pe serial 7 -l mfree=15G -q eichler-short.q -l h_rt=20:00:00 -m a -R y -l h="e227|e228|e229|e230|e231|e232|e233|e234|e235|e236|e237|e238|e240|e241|e242|e243|e244|e245"
pa_concurrent_jobs = 120
cns_concurrent_jobs = 110
ovlp_concurrent_jobs = 120
# from synth0 example:
pa_HPCdaligner_option = -v -B4 -t50 -h1 -e.99 -w1 -l1 -s1000
ovlp_HPCdaligner_option = -v -B4 -t50 -h1 -e.99 -l1 -s1000
#pa_DBsplit_option = -a -x5 -s.00065536
pa_DBsplit_option = -a -x5 -s.065536
#pa_DBsplit_option = -a -x5 -s1
ovlp_DBsplit_option = -a -x5 -s50
falcon_sense_option = --output_multi --min_idt 0.70 --min_cov 1 --max_n_read 20000 --n_core 0
#--min_cov_aln 1 --min_len_aln 40
overlap_filtering_setting = --max_diff 10000 --max_cov 100000 --min_cov 1 --min_len 1 --bestn 1000 --n_core 0
#dazcon = 1
Maybe the default changed. Try adding this to your cfg:
job_type = sge
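That is, in the [General] section of the cfg (a minimal sketch of the relevant lines; keep your other settings unchanged):
[General]
job_type = sge
use_tmpdir = True
input_fofn = input.fofn
# ... rest of your existing settings ...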
Hurray! That did it! Thanks, Chris!
Well, half a hurray... fc_run.py crashed, leaving the prepare_database job running.
Why did fc_run.py crash? This is the info it gives:
Exception: Caused by:
Traceback (most recent call last):
File "/net/gs/vol1/home/dgordon/falcon/160915_unzip/FALCON-integrate/pypeFLOW/pypeflow/controller.py", line 523, in refreshTargets
rtn = self._refreshTargets(task2thread, objs = objs, callback = callback, updateFreq = updateFreq, exitOnFailure = exitOnFailure)
File "/net/gs/vol1/home/dgordon/falcon/160915_unzip/FALCON-integrate/pypeFLOW/pypeflow/controller.py", line 740, in _refreshTargets
raise TaskFailureError("Counted %d failure(s) with 0 successes so far." %failedJobCount)
TaskFailureError: 'Counted 1 failure(s) with 0 successes so far.'
I'm not sure where to look for errors anymore. This is the end of all.log:
2016-10-05 22:37:39,173 - pwatcher.fs_based - DEBUG - query(which='list', jobids=<0>)
2016-10-05 22:37:39,179 - pypeflow.pwatcher_bridge - DEBUG - In alive(), updated result of query:{'jobids': {}}
2016-10-05 22:37:39,179 - pypeflow.controller - WARNING - #tasks=1, #alive=0
2016-10-05 22:37:39,181 - pwatcher.fs_based - DEBUG - query(which='list', jobids=<0>)
2016-10-05 22:37:39,186 - pypeflow.pwatcher_bridge - DEBUG - In alive(), updated result of query:{'jobids': {}}
By the way, pypeflow has always reported time about 7 hours in the future. Is pypeflow tied to GMT or something? (grin)
Yes, that's a requirement for PB.
Finding what went wrong can be a bit tricky. In the main log, you should see an ERROR somewhere. That will tell you the URL for the failed task. From that, you can usually guess the run-dir. And in the run-dir, look for pwatcher.dir/stderr. Suddenly, the problem will be obvious.
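If it helps, this is roughly how I hunt for these from the job directory (generic commands, nothing version-specific; adjust the paths to your layout):
grep -n ERROR all.log pypeflow.log
find . -name 'stderr*' -exec ls -lt {} +
The most recently modified stderr is usually the one belonging to the task that just failed.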
back working on debugging this...
You say "in the main log, you should see an ERROR somewhere." Which is the "main log"? all.log? There is no ERROR in that file. pypeflow.log? It says:
2016-10-05 21:30:21,424 - pypeflow.controller - ERROR - Any exception caught in RefreshTargets() indicates an unrecoverable error. Shutting down...
Traceback (most recent call last):
File "/net/gs/vol1/home/dgordon/falcon/160201/FALCON-integrate/fc_env/lib/python2.7/site-packages/pypeflow-0.1.1-py2.7.egg/pypeflow/controller.py", line 522, in refreshTargets
rtn = self._refreshTargets(task2thread, objs = objs, callback = callback, updateFreq = updateFreq, exitOnFailure = exitOnFailure)
File "/net/gs/vol1/home/dgordon/falcon/160201/FALCON-integrate/fc_env/lib/python2.7/site-packages/pypeflow-0.1.1-py2.7.egg/pypeflow/controller.py", line 684, in _refreshTargets
time.sleep(sleep_time)
KeyboardInterrupt
2016-10-05 21:30:21,438 - pypeflow.controller - WARNING - #tasks=1, #alive=1
2016-10-05 21:30:23,438 - pypeflow.controller - WARNING - Now, #tasks=1, #alive=1
2016-10-05 21:30:23,838 - pypeflow.task - DEBUG - task:///net/gs/vol1/home/dgordon/falcon/160201/FALCON-integrate/fc_env/lib/python2.7/site-packages/falcon_kit-0.4.0-py2.7-linux-x86_64.egg/falcon_kit/mains/run.py/task_build_rdb fails to generate all outputs
2016-10-05 21:30:23,853 - pypeflow.controller - WARNING - Now, #tasks=1, #alive=
Well, it's a DEBUG in this case, but it says we are missing outputs from task_build_rdb. You should see something in stderr for task_build_rdb.
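For what it's worth, task_build_rdb is the raw-read database build step, which normally runs under the 0-rawreads directory that fc_run.py creates (assuming the usual layout; adjust if yours differs). Something like this should turn up its stderr:
ls -lt 0-rawreads/
find 0-rawreads -name 'stderr*' -exec tail -n 20 {} +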