
Octoploid interpretation - more coverage needed?

Open SamCT opened this issue 8 months ago • 1 comments

Hello,

Thank you for the tool. Your help was much appreciated for a previous hexaploid we had, which confirmed the need for more sequencing data. Now we have a sample that is expected to be octoploid, which I hadn't realized: it was sequenced with the expectation that it was a diploid. The GenomeScope plot makes it clear it is not a diploid. Now I am trying to make sense of a smudgeplot result.

FastK -v -t4 -T12 -M64 -k31 hifi_reads.fastq.gz -NTable_FASTK/tableRedo -PTemp2

smudgeplot.py hetmers -L 22 -t 24 -o kmerPairs_2 --verbose Table_FASTK/tableRedo -tmp Temp2

smudgeplot.py all -o Smudgeplot_PB_Check kmerPairs_2_text.smu

and it looked like this:

Image

PB_Check_smudgeplot1.pdf

Comparing this plot to the octoploid strawberry from your example shows that we have less than half the total k-mers of the octoploid strawberry.

I believe we need more coverage, but am wondering if you have any thoughts?

Thanks, Sam

SamCT, Apr 18 '25 19:04

Hi @SamCT,

I think you might be fine! And how sure are you about the octoploidy? K-mer-wise it looks like a solid degenerated tetraploid, possibly an allotetraploid. I know the strongest smudge is AB, but look how many AAAB and AABB k-mers there are; that's why I prefer to look at the log version of the plot, it makes the patterns easier to see. The second thing I wonder about is what the 1n coverage is. If it's 40x, then you are definitely fine, but of course if it's ~20x (and you just missed a small 1n peak) then it might be worth reconsidering. Look at the very bottom of the log smudgeplot: is there a small smudge forming underneath? My guess is not, though.
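To make the "is there a smudge underneath?" check concrete, here is a minimal sketch of where k-mer pair smudges would be expected under the two 1n coverage scenarios mentioned above (40x and ~20x). It assumes the usual smudgeplot reading, where a pair with genotype such as AAAB sits at a total coverage of (number of copies) x (1n coverage) and a minor fraction of B / (A + B). The function names are hypothetical and this is not smudgeplot's own code.

```python
from fractions import Fraction


def expected_smudges(cov_1n, max_copies=8):
    """Map genotype name -> (total pair coverage, minor k-mer fraction).

    Hypothetical helper: total coverage is copies * cov_1n and the minor
    fraction is B / (A + B), e.g. AAAB -> 4 * cov_1n and 1/4.
    """
    smudges = {}
    for total in range(2, max_copies + 1):
        for minor in range(1, total // 2 + 1):
            name = "A" * (total - minor) + "B" * minor
            smudges[name] = (total * cov_1n, Fraction(minor, total))
    return smudges


def lowest_smudge_coverage(cov_1n):
    """Total coverage of the lowest possible smudge (AB) for a given 1n coverage."""
    return min(total for total, _ in expected_smudges(cov_1n).values())


for cov in (40, 20):  # the two scenarios discussed in the reply
    table = expected_smudges(cov)
    print(f"1n = {cov}x: AB at {table['AB'][0]}x, "
          f"AAAB/AABB at {table['AAAB'][0]}x, "
          f"lowest smudge at {lowest_smudge_coverage(cov)}x")
```

Under this reading, a true 1n coverage of ~20x would put an AB smudge at about 40x total coverage, below everything that a 40x interpretation can explain, which is why the bottom of the log plot is the place to look.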

I would rerun the GenomeScope model with the -l 40 -p 4 parameters. That would be my best guess for what is actually going on. If it's an octoploid, then the 1n coverage is definitely half of what this model will assume, but given that you divide the coverage by two and multiply the ploidy by two, it cancels out, so the monoploid genome size estimate will be accurate either way. You have a genome size expectation and a decent dataset, so I would assemble it and see how it goes. It should be OK, because even if it is an octoploid, it will be very, very homozygous, so that should not give you any trouble.
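The cancellation can be checked with a few lines of arithmetic. This is only a sketch under a simplifying assumption (monoploid genome size is roughly total k-mers divided by 1n coverage times ploidy), not GenomeScope's actual model fit; the total k-mer count is a made-up number chosen for illustration, and the function name is hypothetical.

```python
def monoploid_genome_size(total_kmers, cov_1n, ploidy):
    """Simplified estimate: total k-mers / (1n coverage * ploidy).

    Hypothetical helper for illustration only; GenomeScope fits a full
    mixture model rather than using this one-line formula.
    """
    return total_kmers / (cov_1n * ploidy)


# Made-up total number of k-mers in the read set (illustration only).
TOTAL_KMERS = 160_000_000_000

# Tetraploid reading: 1n coverage 40x, ploidy 4.
size_tetraploid = monoploid_genome_size(TOTAL_KMERS, cov_1n=40, ploidy=4)

# Octoploid reading: half the 1n coverage, double the ploidy.
size_octoploid = monoploid_genome_size(TOTAL_KMERS, cov_1n=20, ploidy=8)

print(size_tetraploid, size_octoploid)
```

Both calls divide by the same product (40 x 4 = 20 x 8 = 160), so the two interpretations give the same monoploid size estimate.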

My point is, regardless of which one it is, effectively you can work with it as with a tetraploid, and if it happens to be an octoploid, you can figure it out (once you get a decent monoploid assembly, you can map back reads and see if there is a reasonable number of alleles supported by ~20x, and if so, it might be a selfing octoploid). Thinking about the biology of the species will also help: is this a selfing plant?

Hope this helps.

K

KamilSJaron, Apr 24 '25 07:04