
KeyError during falcon_kit.stats_preassembly job

Open yingzhang121 opened this issue 8 years ago • 4 comments

Hi, Developer,

I had a very weird error.

When I ran Falcon on a set of 8 SMRT cells, the job finished without problems. However, when I ran Falcon on 25 SMRT cells (including the previous 8), the job got killed after the daligner step.

So I looked into the stderr and stdout files and found that the real error message is the following:

… … …
/panfs/roc/scratch/kianians/falcon_unzip_test/0-rawreads/preads/cns_00102/cns_00102.fasta
/panfs/roc/scratch/kianians/falcon_unzip_test/0-rawreads/preads/cns_00103/cns_00103.fasta
/panfs/roc/scratch/kianians/falcon_unzip_test/0-rawreads/preads/cns_00104/cns_00104.fasta
/panfs/roc/scratch/kianians/falcon_unzip_test/0-rawreads/preads/cns_00105/cns_00105.fasta >
[4077]$ DBdump -h /panfs/roc/scratch/kianians/falcon_unzip_test/0-rawreads/raw_reads.db >
ERROR:falcon_kit.stats_preassembly:Using arbitrary truncation metric: -1.0
Traceback (most recent call last):
  File "/panfs/roc/itascasoft/pacificbiosciences-falcon/FALCON-integrate/0.7/FALCON-integrate/FALCON/falcon_kit/stats_preassembly.py", line 210, in calc_dict
    trunc = metric_truncation(i_raw_reads_db_fn, i_preads_fofn_fn)
  File "/panfs/roc/itascasoft/pacificbiosciences-falcon/FALCON-integrate/0.7/FALCON-integrate/FALCON/falcon_kit/stats_preassembly.py", line 133, in metric_truncation
    return functional.calc_metric_truncation(dbdump_output, length_pairs_output)
  File "/panfs/roc/itascasoft/pacificbiosciences-falcon/FALCON-integrate/0.7/FALCON-integrate/FALCON/falcon_kit/functional.py", line 307, in calc_metric_truncation
    avg = -average_difference(pread_lengths, orig_lengths)
  File "/panfs/roc/itascasoft/pacificbiosciences-falcon/FALCON-integrate/0.7/FALCON-integrate/FALCON/falcon_kit/functional.py", line 292, in average_difference
    vb = dictB[k]
KeyError: 0
[4077]$ DBdump -h /panfs/roc/scratch/kianians/falcon_unzip_test/0-rawreads/raw_reads.db >
INFO:falcon_kit.stats_preassembly:stats for raw reads: FastaStats(nreads=3186311, total=20917694606, n50=8450, p95=13368)
INFO:falcon_kit.stats_preassembly:stats for seed reads: FastaStats(nreads=823088, total=9600369423, n50=11212, p95=17825)
... ...

Then I went back to check the job log from my cluster. It turned out the preassembly report job ran for only 1 min 51 seconds.

PBS Job Id: 2678307.mesabim3.msi.umn.edu
Job Name: Jb73e8dacb75e47
Exec host: cn0659/6
Aborted by PBS Server
Job exceeded its walltime limit. Job was aborted
See Administrator for help
Exit_status=-11
resources_used.cput=00:00:27
resources_used.energy_used=0
resources_used.mem=80252kb
resources_used.vmem=955868kb
resources_used.walltime=00:01:51

My colleagues and I guess there might be a "clock" set in the Falcon script that requests only 1 minute for the report job, while the "preassembly report" job needs more time when there are more SMRT cells. But we could be wrong.

Could you look into this issue?

Best, Ying

yingzhang121, Nov 08 '16 22:11

My colleagues and I guess there might be a "clock" set in the Falcon script that requests only 1 minute for the report job.

Not specific to that step, but a recent change makes all steps run in the job-queue. No work is done in the main process anymore. And sge_option is the default for all steps lacking a specific setting, like sge_option_da. So maybe you need to set sge_option.
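The fallback described above can be pictured with a small sketch. This is not FALCON's actual code: the `[General]` section name, the `resolve_option` helper, the step names passed to it, and the walltime values are all assumptions for illustration. Only the keys `sge_option` and `sge_option_da` come from this thread.

```python
import configparser

# Hypothetical fc_run.cfg fragment; the resource strings are placeholders
# and must be adapted to the local scheduler.
CFG_TEXT = """
[General]
sge_option = -l walltime=04:00:00
sge_option_da = -l walltime=24:00:00
"""


def resolve_option(cfg, step):
    """Return the step-specific option if present, else the generic sge_option.

    Illustrative helper only; it mimics the described fallback, nothing more.
    """
    general = cfg["General"]
    return general.get("sge_option_" + step, general["sge_option"])


cfg = configparser.ConfigParser()
cfg.read_string(CFG_TEXT)

print(resolve_option(cfg, "da"))      # step with its own setting
print(resolve_option(cfg, "report"))  # no specific setting, uses sge_option
```

If the generic `sge_option` requests too little walltime for the scheduler in use, every step without its own setting would inherit that limit.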

Check your logs. You should be able to find the qsub command for the report, and you should be able to submit it yourself and see whether you can repeat this problem.

pb-cdunn, Nov 10 '16 00:11

This makes sense, because the major difference between SGE and Torque is the time thing. I'll set the SGE option in the cfg file.

yingzhang121, Nov 10 '16 01:11

OK, after I added the sge_option to the fc_run.cfg file, the entire process got stuck at the "preassembly report" step. I had to manually kill the falcon process and push the preassembly report before I could continue to the next step.

So apparently, the issue seems more complicated than the sge option.

yingzhang121, Nov 14 '16 16:11

Is this on the latest FALCON-integrate?

It should be possible to find error-output to explain the problem you have in the report. Maybe in pwatcher.dir/stderr in the report-run-dir? It's difficult to explain from here.
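To make that search less manual, here is a minimal sketch that walks a run directory and collects the tail of every file named `stderr`. It assumes `stderr` is a plain text file (for example under `pwatcher.dir`); the helper name `tail_stderr_files` and the example path in the comment are hypothetical.

```python
import os


def tail_stderr_files(run_dir, n_lines=20):
    """Yield (path, last n_lines lines) for each file named 'stderr' under run_dir."""
    for root, _dirs, files in os.walk(run_dir):
        if "stderr" in files:
            path = os.path.join(root, "stderr")
            # errors="replace" avoids crashing on stray non-UTF-8 bytes in logs.
            with open(path, errors="replace") as handle:
                lines = handle.readlines()
            yield path, lines[-n_lines:]


# Example usage (the directory name is a placeholder):
# for path, tail in tail_stderr_files("path/to/report-run-dir"):
#     print("==>", path)
#     print("".join(tail), end="")
```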

The KeyError in the report happens when there are no preads. I have no idea what could cause it to hang. Torque job-submission problems?
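On the KeyError itself: the traceback ends at `vb = dictB[k]` inside `average_difference`, which is called as `average_difference(pread_lengths, orig_lengths)`. One way such a lookup fails is when a key in the first mapping is missing from the second. The sketch below is a guess at the shape of that function based only on the traceback, not on the falcon_kit source. The function bodies, the `safe_average_difference` variant, and the read lengths are all made up for illustration.

```python
def average_difference(dictA, dictB):
    """Mean of dictA[k] - dictB[k] over the keys of dictA.

    Hypothetical reconstruction; only the name and the failing line
    `vb = dictB[k]` are taken from the traceback.
    """
    total = 0
    for k, va in dictA.items():
        vb = dictB[k]  # raises KeyError if k is missing from dictB
        total += va - vb
    return total / len(dictA)


def safe_average_difference(dictA, dictB):
    """Illustrative variant: average over shared keys, None if there are none."""
    shared = dictA.keys() & dictB.keys()
    if not shared:
        return None
    return sum(dictA[k] - dictB[k] for k in shared) / len(shared)


pread_lengths = {0: 9000, 1: 7000}  # made-up lengths keyed by read id
orig_lengths = {1: 10000}           # read id 0 is absent here

try:
    average_difference(pread_lengths, orig_lengths)
except KeyError as exc:
    print("KeyError:", exc)  # prints "KeyError: 0"

print(safe_average_difference(pread_lengths, orig_lengths))  # prints -3000.0
```

A guarded variant like this would only hide the symptom. The underlying question remains why the pread and raw-read data disagree.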

pb-cdunn, Nov 15 '16 17:11