NextDenovo

Unusual N50 and genome size from triocanu hap reads

Open hrluo93 opened this issue 2 years ago • 4 comments

Hi,

We used ONT ultra-long haplotype-binned reads from TrioCanu to assemble a haplotype genome with NextDenovo v2.5.0, and we got an unusual N50 and assembly size. The assembly is about 100 Mb smaller than we expected, and the N50 is quite low, only about 1 Mb. What could have caused this result?

Best Wishes! Ran

triocanu:
5712315 reads 113196439599 bases written to haplotype file ./haplotype-Mat.fasta.gz.
5920759 reads 117888522482 bases written to haplotype file ./haplotype-Pat.fasta.gz.
80281 reads 163332564 bases written to haplotype file ./haplotype-unknown.fasta.gz.
722242 reads 416302535 bases filtered for being too short.

seq_stat
[Read length stat]
Types    Count (#)    Length (bp)
N10      126241       73349
N20      310602       57003
N30      539192       47044
N40      812681       39707
N50      1134462      33909
N60      1511527      28777
N70      1967562      22918
N80      2571902      16448
N90      3483829      9907

Types       Count (#)    Bases (bp)      Depth (X)
Raw         5920759      117888522482    117.89
Filtered    0            0               0.00
Clean       5920759      117888522482    117.89
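The Depth (X) column is consistent with total bases divided by the genome size. A minimal sketch of that arithmetic, assuming the 1 Gb genome size reported in this log (the function name `depth` is illustrative, not part of any tool):

```python
def depth(total_bases: int, genome_size_bp: int) -> float:
    """Sequencing depth as total bases divided by genome size."""
    return total_bases / genome_size_bp


GENOME_SIZE = 1_000_000_000   # genome size of 1000.00Mb, as reported in the log
PAT_BASES = 117_888_522_482   # bases written to haplotype-Pat by TrioCanu

pat_depth = depth(PAT_BASES, GENOME_SIZE)
print(f"{pat_depth:.2f}X")    # matches the Raw row of the table above
```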

*Suggested seed_cutoff (genome size: 1000.00Mb, expected seed depth: 45, real seed depth: 45.00): 40906 bp
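One way to read the suggested seed_cutoff: it is roughly the read length at which the longest reads, taken together, reach the expected seed depth. The sketch below illustrates that idea on toy data. It is an assumption about the principle, not the actual seq_stat implementation, and `suggest_seed_cutoff` is a hypothetical helper.

```python
def suggest_seed_cutoff(read_lengths, genome_size_bp, seed_depth):
    """Return the length of the shortest read needed so that all reads at
    least that long sum to seed_depth * genome_size_bp bases.

    Returns None when the data cannot reach the requested depth.
    Hypothetical helper; not NextDenovo's own code.
    """
    target = seed_depth * genome_size_bp
    total = 0
    for length in sorted(read_lengths, reverse=True):
        total += length
        if total >= target:
            return length
    return None


# Toy data: a 100 bp "genome" and a requested seed depth of 3X.
toy_lengths = [100, 90, 80, 70, 60, 50, 40, 30, 20, 10]
toy_cutoff = suggest_seed_cutoff(toy_lengths, genome_size_bp=100, seed_depth=3)
print(toy_cutoff)
```

With real data, the read lengths would come from the input reads, and the genome size and seed depth from the run settings.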

Our settings:
rerun: 3
task: all
deltmp: 1
rewrite: 1
read_type: ont
job_type: local
input_type: raw
genome_size: 1g
seed_depth: 45.0
parallel_jobs: 5
pa_correction: 3
seed_cutfiles: 3
read_cutoff: 25k
job_prefix: nextfe
seed_cutoff: 40906
blocksize: 11214124835
ctg_cns_options: -p 15
nextgraph_options: -a 1
sort_options: -m 20g -t 15 -k 40
minimap2_options_map: -x map-ont
minimap2_options_raw: -t 8 -x ava-ont
correction_options: -p 15 -max_lq_length 10000 -min_len_seed 20453
minimap2_options_cns: -t 8 -x ava-ont -k 17 -w 17 --minlen 2000 --maxhan1 5000

[Read length stat]
Types    Count (#)    Length (bp)
N10      75719        82925
N20      182706       66596
N30      310850       56989
N40      458514       49976
N50      625760       44358
N60      813392       39692
N70      1022447      35703
N80      1254589      32159
N90      1512998      28759

Types       Count (#)    Bases (bp)      Depth (X)
Raw         5920759      117888522482    117.89
Filtered    4115291      39249147976     39.25
Clean       1805468      78639374506     78.64
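To see how much data the 25k read_cutoff removed, the Filtered row can be expressed as a fraction of the Raw row. This is plain arithmetic on the numbers above. It makes no claim about whether the filtering explains the assembly result.

```python
# Counts and bases copied from the Raw and Filtered rows of the table above.
raw_reads, raw_bases = 5_920_759, 117_888_522_482
filtered_reads, filtered_bases = 4_115_291, 39_249_147_976

# Whatever is not filtered is what the table reports as Clean.
clean_reads = raw_reads - filtered_reads
clean_bases = raw_bases - filtered_bases

filtered_read_fraction = filtered_reads / raw_reads
filtered_base_fraction = filtered_bases / raw_bases

print(f"reads removed: {filtered_read_fraction:.1%}")
print(f"bases removed: {filtered_base_fraction:.1%}")
```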

Result:
Type     Length (bp)    Count (#)
N10      4637305        14
N20      3002831        37
N30      2161021        72
N40      1744945        116
N50      1288785        173
N60      1019279        248
N70      793166         343
N80      580083         471
N90      384460         650
Min.     40232          -
Max.     12343781       -
Ave.     856464         -
Total    859033589      1003
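For readers unfamiliar with the Nx rows: N50 is the contig length at which the contigs, sorted from longest to shortest, first cover half of the total assembly length, and the count is how many contigs that takes. A minimal sketch on toy lengths follows, with a sanity check of the reported average. The function `nx` is illustrative and not taken from NextDenovo.

```python
def nx(lengths, fraction=0.5):
    """Return (length, count) for the Nx statistic.

    Contigs are sorted longest first; the result is the length of the contig
    at which the running sum first reaches `fraction` of the total, together
    with the number of contigs needed to get there.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no contig lengths given")
    target = sum(ordered) * fraction
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return length, count


# Toy contig lengths, total 30: the two longest contigs cover half.
toy_n50 = nx([10, 8, 6, 4, 2])
print(toy_n50)

# Sanity check on the reported totals: total length / contig count.
reported_average = 859_033_589 // 1003
print(reported_average)
```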

hrluo93, Apr 15 '22 03:04

What is the result when you use all the data?

moold, Apr 15 '22 06:04

Thank you very much for your reply! I am now running a non-haplotype assembly with all raw ONT reads to check whether some reads went missing because of TrioCanu. I also plan to run a haplotype assembly with all of the haplotype data to check whether some of the needed reads are among the short reads. If I use all the data for the haplotype assembly, which seed_depth, seed_cutoff, and read_cutoff would you suggest based on my log? 50X, auto, 5K?

Best Wishes! Ran

hrluo93, Apr 15 '22 07:04

Just try the default values first and see what the result looks like.

moold, Apr 15 '22 08:04

Thanks, Dr. Hu! I am trying a read_cutoff of 1k first.

hrluo93, Apr 15 '22 08:04