
easy question: new falcon wants to run locally instead of submitting to the cluster (?)

dgordon562 opened this issue 9 years ago • 5 comments

I must be doing something wrong, but I used the cfg provided. I also could not find anything particularly different about this cfg compared to the one used in earlier versions of falcon that do submit to the cluster.

Any thoughts on what I'm doing wrong?

Thanks, David

I'm putting the cfg below:

[General]
use_tmpdir = True

# file of filenames (fofn) listing the initial bas.h5 input files
input_fofn = input.fofn

input_type = raw


# The length cutoff used for seed reads used for initial mapping
#length_cutoff = 11400

# The length cutoff used for seed reads used for pre-assembly
length_cutoff_pr = 1
genome_size = 1000000
seed_coverage = 20


sge_option_da = -pe serial 4 -l mfree=15G -q eichler-short.q -l h_rt=144:00:00 -m a -R y -soft -l gpfsstate=0

# this is for LAsort/LAmerge (the rp_ processes); 60 was sufficient for
#  most, but using 90G for the last few
sge_option_la = -q eichler-short.q -l h_rt=20:00:00 -pe serial 1 -l mfree=90G -m a -R y


sge_option_pda = -pe serial 4 -l mfree=9.5G -q eichler-short.q -l h_rt=20:00:00 -m a -R y

# this was -pe serial 16 but I don't see a need for 16 slots.  
sge_option_pla = -pe serial 1 -l mfree=90G -q eichler-short.q -l h_rt=20:00:00 -m a -R y -l h="e217|e218|e219|e220|e221|e222|e223|e224|e225|e226|e227|e228|e229|e230|e231|e232|e233|e234|e235|e236|e237|e238|e240|e241|e242|e243|e244|e245|e246|e247"


sge_option_fc = -q eichler-short.q -l h_rt=20:00:00 -pe serial 16 -l mfree=10G  -m a -R y

sge_option_cns = -pe serial 7 -l mfree=15G -q eichler-short.q -l h_rt=20:00:00  -m a -R y -l h="e227|e228|e229|e230|e231|e232|e233|e234|e235|e236|e237|e238|e240|e241|e242|e243|e244|e245"


pa_concurrent_jobs = 120
cns_concurrent_jobs = 110
ovlp_concurrent_jobs = 120


# from synth0 example:

pa_HPCdaligner_option =   -v -B4 -t50 -h1 -e.99 -w1 -l1 -s1000
ovlp_HPCdaligner_option = -v -B4 -t50 -h1 -e.99 -l1 -s1000

#pa_DBsplit_option =   -a -x5 -s.00065536
pa_DBsplit_option =   -a -x5 -s.065536
#pa_DBsplit_option =   -a -x5 -s1
ovlp_DBsplit_option = -a -x5 -s50

falcon_sense_option = --output_multi --min_idt 0.70 --min_cov 1 --max_n_read 20000 --n_core 0
#--min_cov_aln 1 --min_len_aln 40

overlap_filtering_setting = --max_diff 10000 --max_cov 100000 --min_cov 1 --min_len 1 --bestn 1000 --n_core 0
#dazcon = 1

dgordon562 • Oct 05 '16

Maybe the default changed.

job_type = sge
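
In context, a minimal sketch of where that setting goes, assuming the rest of the [General] block from the cfg above is unchanged (if the default really did change to local, it has to be set explicitly for jobs to be submitted to SGE):

[General]
# run jobs through SGE instead of locally; the behavior above suggests the
# default is now local
job_type = sge

# ... the remaining settings (input_fofn, sge_option_*, etc.) as in the cfg above ...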

pb-cdunn • Oct 05 '16

Hurray! That did it! Thanks, Chris!

Well, half a hurray... fc_run.py crashed, leaving the prepare_database job running.

Why did fc_run.py crash? This is the info it gives:

Exception: Caused by:
Traceback (most recent call last):
  File "/net/gs/vol1/home/dgordon/falcon/160915_unzip/FALCON-integrate/pypeFLOW/pypeflow/controller.py", line 523, in refreshTargets
    rtn = self._refreshTargets(task2thread, objs = objs, callback = callback, updateFreq = updateFreq, exitOnFailure = exitOnFailure)
  File "/net/gs/vol1/home/dgordon/falcon/160915_unzip/FALCON-integrate/pypeFLOW/pypeflow/controller.py", line 740, in _refreshTargets
    raise TaskFailureError("Counted %d failure(s) with 0 successes so far." %failedJobCount)
TaskFailureError: 'Counted 1 failure(s) with 0 successes so far.'

I'm not sure where to look for errors anymore. This is the end of all.log:

2016-10-05 22:37:39,173 - pwatcher.fs_based - DEBUG - query(which='list', jobids=<0>)
2016-10-05 22:37:39,179 - pypeflow.pwatcher_bridge - DEBUG - In alive(), updated result of query:{'jobids': {}}
2016-10-05 22:37:39,179 - pypeflow.controller - WARNING - #tasks=1, #alive=0
2016-10-05 22:37:39,181 - pwatcher.fs_based - DEBUG - query(which='list', jobids=<0>)
2016-10-05 22:37:39,186 - pypeflow.pwatcher_bridge - DEBUG - In alive(), updated result of query:{'jobids': {}}

By the way, pypeflow has always reported time about 7 hours in the future. Is pypeflow tied to GMT or something? (grin)

dgordon562 • Oct 05 '16

By the way, pypeflow has always reported time about 7 hours in the future. Is pypeflow tied to GMT or something? (grin)

Yes, that's a requirement for PB.

Finding what went wrong can be a bit tricky. In the main log, you should see an ERROR somewhere. That will tell you the URL for the failed task. From that, you can usually guess the run-dir. And in the run-dir, look for pwatcher.dir/stderr. Suddenly, the problem will be obvious.
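
A rough shell sketch of that hunt (the directory names are illustrative and depend on your run layout, not canonical paths):

# find the failing task's URL in the logs
grep -n ERROR all.log pypeflow.log

# <run-dir> is a placeholder for the directory you guessed from the task URL;
# inspect the stderr captured by the process watcher there
find <run-dir> -path '*pwatcher.dir*' -name stderr -exec tail -n 40 {} \;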

pb-cdunn • Oct 05 '16

Back working on debugging this...

you say "in the main log, you should see an ERROR somewhere." Which is the "main log"? all.log? there is no ERROR in that file. pypeflow.log? It says:

2016-10-05 21:30:21,424 - pypeflow.controller - ERROR - Any exception caught in RefreshTargets() indicates an unrecoverable error. Shutting down...
Traceback (most recent call last):
  File "/net/gs/vol1/home/dgordon/falcon/160201/FALCON-integrate/fc_env/lib/python2.7/site-packages/pypeflow-0.1.1-py2.7.egg/pypeflow/controller.py", line 522, in refreshTargets
    rtn = self._refreshTargets(task2thread, objs = objs, callback = callback, updateFreq = updateFreq, exitOnFailure = exitOnFailure)
  File "/net/gs/vol1/home/dgordon/falcon/160201/FALCON-integrate/fc_env/lib/python2.7/site-packages/pypeflow-0.1.1-py2.7.egg/pypeflow/controller.py", line 684, in _refreshTargets
    time.sleep(sleep_time)
KeyboardInterrupt
2016-10-05 21:30:21,438 - pypeflow.controller - WARNING - #tasks=1, #alive=1
2016-10-05 21:30:23,438 - pypeflow.controller - WARNING - Now, #tasks=1, #alive=1
2016-10-05 21:30:23,838 - pypeflow.task - DEBUG - task:///net/gs/vol1/home/dgordon/falcon/160201/FALCON-integrate/fc_env/lib/python2.7/site-packages/falcon_kit-0.4.0-py2.7-linux-x86_64.egg/falcon_kit/mains/run.py/task_build_rdb fails to generate all outputs
2016-10-05 21:30:23,853 - pypeflow.controller - WARNING - Now, #tasks=1, #alive=

dgordon562 • Nov 08 '16

Well, it's a DEBUG in this case, but it says we are missing outputs from task_build_rdb. You should see something in stderr for task_build_rdb.
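
One hedged way to chase that down from the run root; the exact layout differs between FALCON versions, so treat these patterns as guesses rather than the canonical paths:

# files that mention the failed task, plus the most recently written stderr files
grep -rl task_build_rdb . 2>/dev/null | head
find . -name stderr 2>/dev/null | xargs -r ls -t | head -n 5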

pb-cdunn • Nov 08 '16