minia
Is minia suitable for a highly heterozygous plant genome?
Hi,
I am looking for an assembler for a highly heterozygous plant genome (diploid, heterozygosity rate > 2%, haploid genome size ~3.6 Gb), and I would like to know how to use minia to assemble such a genome.
Best wishes, Kun
Hi,
Please try this: https://github.com/GATB/gatb-minia-pipeline
with default parameters.
Depending on whether you have mate-pairs, or if you run into difficulties installing BESST, you might even skip the scaffolding step altogether by using the --no-scaffolding flag.
Another option, if Minia fails, is to try the Megahit assembler.
If you'd like to fine-tune the assembly for heterozygosity, let me know: minia parameters can be tweaked to produce shorter contigs that keep small variations, or the opposite.
Best,
Rayan
A reference: https://link.springer.com/article/10.1186/s13059-019-1899-5
@xiekunwhy
As @rchikhi said, the easiest way to assemble the genome would be to build the contigs independently and then scaffold them using the paired-end and mate-pair information.
The question is what type of datasets you have. Plant genomes can be very repetitive and heterozygous. You might want to remove haplotypic duplications with purge_dups or a similar tool, and then scaffold the remaining contigs.
Hi @harish0201,
I tried minia on this genome, but the result was extremely fragmented: the total assembly size is about 3 times larger than expected (>11G), the contig N50 is only 300 bp, and there are about twenty million contigs. I don't think these contigs can be used for downstream analysis.
According to colleagues and friends who have used it in their work, Minia always performs poorly on highly heterozygous genomes. I hope the authors can resolve this problem some day.
I switched to soapdenovo2 + dbg2olc and to masurca, and got reasonable results.
The data I used for contig assembly is ~100X of PE150 Illumina reads with an insert size of about 400 bp. I also have 4 MP libraries and about 30X of ONT long reads for downstream analysis.
Best wishes, Kun
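For readers who want to check the same metrics on their own assembly (contig count, total size relative to the expected genome size, and contig N50), here is a minimal Python sketch. The function names `n50` and `summarize` are made up for illustration and are not part of minia or any other tool. The sketch assumes you have already extracted the contig lengths into a list.

```python
def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly size."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def summarize(lengths, expected_size):
    """Collect the assembly statistics discussed in this thread.

    expected_size is the expected haploid genome size in bp
    (a value the user supplies; hypothetical parameter name).
    """
    total = sum(lengths)
    return {
        "contigs": len(lengths),
        "total_bp": total,
        "n50": n50(lengths),
        "size_ratio": total / expected_size,
    }


if __name__ == "__main__":
    # Toy contig lengths, not real data.
    toy = [500, 400, 300, 200, 100]
    print(summarize(toy, expected_size=500))
```

A size ratio well above 1 together with a very low N50, as reported above, is the pattern you would expect when both haplotypes are assembled separately into many short contigs.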
Hi, did you try regular single-k Minia or the multi-k Minia pipeline? Indeed, single-k Minia will give you fragmented assemblies, more so with heterozygous genomes. In general, nowadays, I'd recommend using long reads, and in particular PacBio HiFi, for heterozygous genomes, if possible.
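To give a rough feel for why heterozygosity hurts a single-k de Bruijn graph assembly, here is a back-of-the-envelope Python sketch. It assumes a simplified model that is not from this thread: heterozygous sites are independent and uniformly spread, so a k-mer is shared by both haplotypes with probability (1 - het_rate)^k. Real genomes cluster their variants, so treat the numbers as illustrative only. The function name is hypothetical.

```python
def shared_kmer_fraction(het_rate, k):
    """Fraction of k-mers expected to contain no heterozygous site,
    under the simplifying assumption of independent, uniform variants."""
    if not 0.0 <= het_rate <= 1.0:
        raise ValueError("het_rate must be between 0 and 1")
    if k < 1:
        raise ValueError("k must be a positive integer")
    return (1.0 - het_rate) ** k


if __name__ == "__main__":
    # 2% heterozygosity, as in the genome discussed above.
    for k in (21, 31, 61, 101):
        print(k, round(shared_kmer_fraction(0.02, k), 3))
```

Under this model, at 2% heterozygosity only about half of the 31-mers are identical between the two haplotypes, and the fraction drops further as k grows, so the graph branches very frequently. That is consistent with the fragmented result reported above.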