Peregrine
Question on SHIMMER and MC
Hi,
I am exploring using Peregrine with some Illumina-corrected single-molecule reads (>99% identity to the Illumina reference), sequenced to ~250x. I was wondering whether there is a correlation between shimmer-r and mc and, if so, what it is. Explicitly, does the SHIMMER count increase as the reduction factor is increased? Or am I misinterpreting the documentation?
I am trying to assemble a heterozygous (~1%), highly repetitive (~70%), diploid genome and am obtaining an over-inflated (3 to 4x the genome size), highly fragmented output. At the moment I would be happy to obtain a consensus assembly. Any advice on parameters to tweak would be appreciated. Would increasing the reduction factor help remove redundancy?
Thanks Kyle
"mc" stands for "mmer count". The higher the count, the higher the likelihood that the k-mer comes from a repeat. shimmer-r controls the reduction level: a smaller shimmer-r gives a denser set of SHIMMERs for the index (so a larger index file, and more sensitivity for overlapping).
For the "unique" part of the genome, mc should be more or less independent of shimmer-r. However, increasing the SHIMMER density would increase mc. This is my current guess.
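To make the density point concrete, here is a toy sketch of a two-level minimizer reduction. It is not Peregrine's implementation: the function names (`window_minima`, `toy_shimmers`), the CRC32 hash, and the values of `k`, `w`, and `r` are all assumptions chosen for illustration. It only shows how a smaller reduction factor keeps more markers, and how a per-marker count can be tallied.

```python
import random
import zlib
from collections import Counter


def window_minima(values, w):
    """Positions that are the (leftmost) minimum of at least one window of w items."""
    picked = set()
    for start in range(len(values) - w + 1):
        window = range(start, start + w)
        # Ties are broken by position so the choice is deterministic.
        picked.add(min(window, key=lambda i: (values[i], i)))
    return sorted(picked)


def toy_shimmers(seq, k=8, w=10, r=4):
    """Hypothetical two-level reduction: minimizers first, then minima of minimizers."""
    hashes = [zlib.crc32(seq[i:i + k].encode()) for i in range(len(seq) - k + 1)]
    level0 = window_minima(hashes, w)                  # ordinary minimizers
    level0_hashes = [hashes[p] for p in level0]
    kept = window_minima(level0_hashes, r)             # reduce by factor r
    return [(level0[i], seq[level0[i]:level0[i] + k]) for i in kept]


def random_dna(n):
    return "".join(random.choice("ACGT") for _ in range(n))


# Toy "genome": unique sequence interrupted by three copies of one repeat unit.
random.seed(0)
unit = random_dna(200)
genome = random_dna(500) + unit + random_dna(500) + unit + random_dna(500) + unit + random_dna(500)

for r in (2, 4, 8):
    markers = toy_shimmers(genome, r=r)
    counts = Counter(mer for _, mer in markers)        # toy stand-in for a per-mer count
    print(f"r={r}: {len(markers)} markers kept, highest count {max(counts.values())}")
```

In this toy model a smaller `r` can only keep the same markers or more, which matches the "denser index" description above. It does not settle how mc behaves in Peregrine itself, since the real index is built from reads at a given coverage rather than from a single sequence.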