LJA
Purge dups?
Hi @AntonBankevich,
Is there a need to purge haplotype duplications from the output assembly? As far as I know, Canu and hifiasm assemblies need an additional purge_dups run.
Thanks in advance!
Sincerely, Johnsonz
Hi! Thank you for your interest in LJA. The current version of LJA treats a diploid genome as two separate genomes that just happen to be similar. Similar sequences are therefore not collapsed, which results in shorter contigs and duplicated sequences. We are currently working on producing a consensus assembly that is completely purged of duplications, as well as phased assemblies with much longer blocks (by combining LJA with other technologies). We plan to present this in the next big release paper.
Does that mean LJA produces two haploid genomes of similar size?
My assembly is highly fragmented (8067 contigs), and its total size is very large, ~1.7G (expected size ~1.1G).
How could I tune parameters such as k and K?
Hi! Sorry for taking so long to reply. LJA tries to produce two haploid genomes, but this is often not possible because the reads are shorter than the conserved regions that show no divergence between the paired chromosomes. In the current version we intentionally do not perform any duplication purging, in order to retain as much information as possible. So, for example, if you use hifiasm, then our output corresponds to their contigs in the r_utg.gfa file. That is why you have many contigs in the output and their total length is high (it should be closer to double the length of the genome). These contigs may be shorter, but they are more "honest". We are working on producing all kinds of contigs, including a consensus like hifiasm does, but this is still in progress.
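To make the size comparison above concrete, here is a minimal Python sketch that counts contigs in a FASTA stream, sums their lengths, and divides by the expected haploid genome size. The function names and the `expected_haploid_size` parameter are illustrative and are not part of LJA. With the figures reported in this thread (~1.7G assembled, ~1.1G expected) the ratio is about 1.5, which lies between a fully collapsed assembly (about 1) and one where both haplotypes are kept apart everywhere (about 2).

```python
import io


def fasta_lengths(handle):
    """Yield the sequence length of each record in a FASTA stream."""
    length = None  # None until the first header line is seen
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif length is not None:
            length += len(line)
    if length is not None:
        yield length


def size_summary(handle, expected_haploid_size):
    """Return contig count, total length, and total length / haploid size."""
    lengths = list(fasta_lengths(handle))
    total = sum(lengths)
    return {
        "contigs": len(lengths),
        "total_length": total,
        "ratio_to_haploid": total / expected_haploid_size,
    }


# Tiny in-memory example; for a real assembly, pass an open file handle
# and the expected haploid genome size in bases.
demo = io.StringIO(">contig_1\nACGT\nAC\n>contig_2\nGGG\n")
print(size_summary(demo, expected_haploid_size=6))
```

A ratio well above 1 is expected for unpurged output of a diploid sample, so by itself it does not indicate a failed run.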
Thanks a lot. I am trying to assemble a sex chromosome in chicken. It is a single haplotype. But I get this warning:
WARNING: no reads passed the length filter 2500
Here are my read statistics:
(asm_practise) $ seqkit stats hifi_silkie_wmap_zw_unmap_non-supp-secd.fa
file                                        format  type     num_seqs        sum_len  min_len   avg_len  max_len
hifi_silkie_wmap_zw_unmap_non-supp-secd.fa  FASTA   Unlimit   178,129  2,072,830,671        0  11,636.7   27,733
Found the problem: if the FASTA file contains an empty read, LJA shows this warning.
I see. Personally I think that blank lines and empty sequences should not be allowed in FASTA files, but I did not find any indication of that in the FASTA file specification. So in the next release blank lines and empty sequences will be allowed in input files. For now, you can use the LJA version from the "bug_fix" branch to get this behavior.
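For versions without that fix, one possible workaround is to strip empty records and blank lines from the read file before running the assembler. The Python sketch below shows one way to do this; the function name and the file names in the comments are hypothetical, and it assumes plain (uncompressed) FASTA input.

```python
import io


def drop_empty_records(src, dst):
    """Copy FASTA records from src to dst, skipping blank lines and
    records that have no sequence. Returns (kept, dropped)."""
    kept = dropped = 0
    header, seq_lines = None, []

    def flush():
        # Write the record collected so far, or count it as dropped.
        nonlocal kept, dropped
        if header is None:
            return
        if seq_lines:
            dst.write(header + "\n")
            dst.write("\n".join(seq_lines) + "\n")
            kept += 1
        else:
            dropped += 1

    for line in src:
        line = line.strip()
        if not line:
            continue  # ignore blank lines entirely
        if line.startswith(">"):
            flush()
            header, seq_lines = line, []
        elif header is not None:
            seq_lines.append(line)
    flush()
    return kept, dropped


# Example with hypothetical file names:
# with open("reads.fa") as src, open("reads.nonempty.fa", "w") as dst:
#     kept, dropped = drop_empty_records(src, dst)
#     print(f"kept {kept} reads, dropped {dropped} empty reads")
```

After filtering, the minimum read length reported for the file should no longer be 0.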
"My assembly is highly fragmented (8067 contigs), and its total size is very large, ~1.7G (expected size ~1.1G). How could I tune parameters such as k and K?"
Hi @AntonBankevich, I have the same question: how should I set appropriate 'k' and 'K'? Also, how can I phase the resulting assembly? Should I purge haplotigs for partial phasing, or use Hi-C reads for a haplotype-resolved assembly?
Thanks! Chen