Excessively long running time in run_filter_stage2
Hello, I am working on a PacBio assembly of a plant genome and have 52x corrected reads. When I fed these preads to FALCON for assembly, run_filter_stage2 had been running for more than two days and still has not finished. I checked the las.fofn file, which contains 323036 lines. I assume the long running time is caused by having so many las files? Is that normal? Any suggestions? Thanks a lot!
### My config file looks like:
[General]
input_fofn = preads.fofn
input_type = preads
length_cutoff = 10000
length_cutoff_pr = 9000
sge_option_da = -pe orte 8 -q all.q
sge_option_la = -pe orte 8 -q all.q
sge_option_pda = -pe orte 8 -q all.q
sge_option_pla = -pe orte 8 -q all.q
sge_option_fc = -pe orte 8 -q all.q
sge_option_cns = -pe orte 8 -q all.q
pa_concurrent_jobs = 60
cns_concurrent_jobs = 60
ovlp_concurrent_jobs = 60
pa_HPCdaligner_option = -v -dal4 -t16 -e.70 -l1000 -s1000
ovlp_HPCdaligner_option = -l4800 -k18 -h480 -w8 -H15000 -M32
pa_DBsplit_option = -x200 -s50
ovlp_DBsplit_option = -x200 -s50
falcon_sense_option = --output_multi --min_idt 0.70 --min_cov 3 --max_n_read 200 --n_core 6
overlap_filtering_setting = --max_diff 100 --max_cov 80 --min_cov 2 --bestn 10 --n_core 24
Yes, you need the -dal option in the ovlp_HPCdaligner_option parameters. You have way too many small las files for the filter to go through, and the excessive number of shell processes is probably the culprit of the slowness. Try "-dal128" (in newer versions "-B128") to reduce the final number of merged files in the final overlapping stage. I typically check how many merge jobs there will be by examining 1-preads_ovl/run_jobs.sh.
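For what it's worth, a rough sketch of applying this and then checking the number of merge jobs; the option string is just the one from the config above with -dal128 added, and I am assuming the merge lines in run_jobs.sh call DALIGNER's LAmerge:

```
# In the .cfg, add a larger block count to the overlap daligner options
# (older FALCON versions use -dal, newer ones use -B), e.g.:
#   ovlp_HPCdaligner_option = -dal128 -l4800 -k18 -h480 -w8 -H15000 -M32

# After the scripts are regenerated, count how many merge jobs they contain
# (this assumes the merge lines invoke DALIGNER's LAmerge):
grep -c LAmerge 1-preads_ovl/run_jobs.sh
```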
Another note: if you already have many small las files, you could manually merge them and ask fc_ovlp_filter.py to take the merged las files as input. However, you have to make sure you don't have redundant entries in the merged files.
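A minimal sketch of that manual route, in case it is useful. I am assuming DALIGNER's LAmerge is on the PATH, that the small las files follow the naming shown elsewhere in this thread, and that fc_ovlp_filter.py accepts --db and --fofn arguments alongside the thresholds from overlap_filtering_setting; please check the flags of your installed version.

```
# Merge the small per-block las files into one (LAmerge is part of DALIGNER;
# the file names here are placeholders for your own las files):
cd 1-preads_ovl
LAmerge preads.merged.las las_files/preads.*.las

# Point the filter at the merged file via a fresh file-of-file-names:
ls $PWD/preads.merged.las > merged_las.fofn

# Run the overlap filter by hand; the --db/--fofn flag names are my assumption,
# the thresholds are copied from overlap_filtering_setting above:
fc_ovlp_filter.py --db preads.db --fofn merged_las.fofn \
    --max_diff 100 --max_cov 80 --min_cov 2 --bestn 10 --n_core 24 > preads.ovl
```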
Thanks Jason. I have re-submitted the job with -dal128 and will check the results, though that will take quite a while.
Hi Jason,
I tried -B128 but still have the same problem. I think it might be a bug introduced after I updated to the latest FALCON release. My previous (successful) run, which used FALCON v0.4, generated a las.fofn file containing only preads.*.las files. The content of that las.fofn is shown below:
/home/zhangxt/project/AP85/1-preads_ovl/las_files/preads.62.las
/home/zhangxt/project/AP85/1-preads_ovl/las_files/preads.73.las
/home/zhangxt/project/AP85/1-preads_ovl/las_files/preads.104.las
/home/zhangxt/project/AP85/1-preads_ovl/las_files/preads.63.las
/home/zhangxt/project/AP85/1-preads_ovl/las_files/preads.132.las
However, the failed run (latest FALCON release) generated a las.fofn file which contains all las files, including L1.*.las, L2.*.las, and preads.*.las. Part of the file is shown below:
/home/zhangxt/project/LgSXasm/try_corOutCoverage80/falcon_t1/1-preads_ovl/m_00001/L1.1.114.las
/home/zhangxt/project/LgSXasm/try_corOutCoverage80/falcon_t1/1-preads_ovl/m_00001/L1.1.207.las
...
Is this a bug, or did I do something wrong? I can work around it by using only the preads.*.las files for now, but I would like to know what causes this problem so I can avoid it in the future.
Thanks!
Yes, it is a bug. I have already submitted a PR; see https://github.com/PacificBiosciences/FALCON/pull/367
Could you tell us which commit you are using (git rev-parse HEAD)? Did you simply download the latest release? I am about to issue a new release with the fix.
The good news is that you will not need to re-run everything. After updating FALCON (the tip of master is fine), simply:
rm -rf 2-*/
rm -rf 1-*/
And restart. Stage-0 should be fine.
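For completeness, the whole recovery sequence might look roughly like this; the checkout path, install command, and config file name are placeholders for your own setup, and only the two rm commands come from the comment above.

```
# Update the FALCON checkout to the tip of master and reinstall
# (adjust to however you installed FALCON):
cd /path/to/FALCON && git pull && python setup.py install

# Drop the overlap and assembly stages; 0-rawreads is left untouched:
cd /path/to/assembly_dir
rm -rf 1-*/ 2-*/

# Restart the pipeline with the same config:
fc_run.py fc_run.cfg
```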