Is minia suitable for a highly heterozygous plant genome?

Open xiekunwhy opened this issue 5 years ago • 5 comments

Hi,

I am looking for an assembler for a highly heterozygous plant genome (diploid, heterozygosity rate > 2%, haploid genome size ~3.6 Gb), and I would like to know how to use Minia to assemble such a genome.

Best wishes, Kun

xiekunwhy avatar Apr 06 '20 03:04 xiekunwhy

Hi,

Please try this: https://github.com/GATB/gatb-minia-pipeline with default parameters. Depending on whether you have mate-pairs, and if you run into difficulties installing BESST, you can skip the scaffolding step altogether with the --no-scaffolding flag. Another option, if Minia fails, is to try the Megahit assembler. If you'd like to fine-tune how heterozygosity is handled, let me know: Minia parameters can be tweaked to produce shorter contigs that keep small variations, or the opposite.

Best, Rayan
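For readers who want to script the suggestion above, here is a minimal Python sketch that only builds the pipeline command line. The `--no-scaffolding` flag comes from the comment itself; the `./gatb` script location, the `-1`/`-2` paired-end options, and the read file names are assumptions, so check them against the pipeline's README before running anything.

```python
import shlex


def build_gatb_command(reads_1, reads_2, scaffolding=True, gatb_path="./gatb"):
    """Build (but do not run) a gatb-minia-pipeline command as a list of arguments.

    Assumptions: the pipeline script is at `gatb_path` and accepts paired-end
    reads through `-1` and `-2`. Only `--no-scaffolding` is taken from the thread.
    """
    cmd = [gatb_path, "-1", reads_1, "-2", reads_2]
    if not scaffolding:
        # Skip the BESST scaffolding step, e.g. when BESST is hard to install.
        cmd.append("--no-scaffolding")
    return cmd


# Hypothetical file names, for illustration only.
cmd = build_gatb_command("reads_1.fastq", "reads_2.fastq", scaffolding=False)
print(shlex.join(cmd))
```

The list form can be passed to `subprocess.run(cmd, check=True)` once the options have been verified for the installed version.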

rchikhi avatar Apr 07 '20 16:04 rchikhi

A reference: https://link.springer.com/article/10.1186/s13059-019-1899-5

rchikhi avatar Apr 07 '20 16:04 rchikhi

@xiekunwhy

Like @rchikhi said, the easiest way to assemble the genome would be to build the contigs independently and then scaffold them using the paired-end and mate-pair information.

The question is what type of data you have. Plant genomes can be very repetitive and heterozygous. You might want to remove haplotypic duplications with purge_dups or a similar tool, and then scaffold the remaining contigs.

harish0201 avatar Sep 01 '20 16:09 harish0201

Hi @harish0201 ,

I have tried to use Minia to assemble this genome, but I got an extremely fragmented result: the assembly size is about 3 times larger than expected (> 11 Gb), the contig N50 is only 300 bp, and there are about twenty million contigs. I don't think these contigs can be used for downstream analysis.
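For reference, contig N50 is the length of the contig at which the running total, taken from the longest contig downwards, first reaches half of the assembled bases. A minimal Python sketch of that calculation follows; the toy lengths are invented for illustration and are not from this assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 if it is empty)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half the total is the N50.
        if running >= half_total:
            return length
    return 0


# Toy contig lengths in bp, invented for illustration only.
toy_contigs = [1000, 800, 300, 300, 200, 100, 100]
print(n50(toy_contigs))
```

An N50 of only a few hundred bp means that half of the assembled bases sit in contigs not much longer than a read pair.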

According to colleagues and friends who have used it in their work, Minia always performs poorly on highly heterozygous genomes. I hope the authors can resolve this problem some day.

I switched to SOAPdenovo2 + DBG2OLC and MaSuRCA, and got reasonable results.

The data I used for contig assembly is ~100X of PE150 Illumina reads with an insert size of about 400 bp. I also have 4 MP libraries and about 30X of ONT long reads for downstream analysis.

Best wishes, Kun

xiekunwhy avatar Sep 02 '20 02:09 xiekunwhy

Hi, did you try regular single-k Minia or the multi-k Minia pipeline? Indeed, single-k Minia will give you fragmented assemblies, more so with heterozygous genomes. In general, nowadays, I'd recommend using long reads, and in particular PacBio HiFi, for heterozygous genomes if possible.

rchikhi avatar Sep 03 '20 08:09 rchikhi