Ploidy inference of a highly heterozygous genome

Open Axolotl233 opened this issue 1 year ago • 0 comments

Hello community,

I am having trouble understanding my smudgeplot. I noticed that the master branch of smudgeplot was updated on Sep 24 (Oriel version), and the README says it will be updated again on Oct 18. So I used the sploidplot branch (which was updated on Oct 18). I used the following commands to generate the plot:

FastK -v -t4 -k31 -M16 -T4 fastq_[12].fastq.gz -Ndata/FastK_Table
smudgeplot.py hetmers -L 12 -t 4 -o data/kmerpairs --verbose data/FastK_Table
smudgeplot.py all -o data/run data/kmerpairs_text.smu

The results look like this: run_smudgeplot_log10.pdf run_centralities.pdf

My study species is an allotetraploid (2n=4x) and the potential progenitor is known. The genome size of this species is about 720 Mb based on flow cytometry. Everything in this figure seems OK, and there are two peaks, at AABB and AAAB. However, I noticed the label "1n=30" at the bottom of the figure, which (if I am not mistaken) should indicate the haploid depth/coverage. In my understanding, it should be the total amount of sequence data divided by the 1n genome size. In my case, we sequenced 48 Gb of paired-end Illumina reads, so the 1n value should be 48G/720M ≈ 67. I also found a turning point at the edge of the plot in run_centralities.pdf, so I increased the parameters -cov_max to 100 and -ylim to 300 when running the smudgeplot.py all command. The results look like this: run_centralities.pdf run_smudgeplot_log10.pdf
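As a quick check of the arithmetic above, here is a minimal sketch of the expected coverage. The sequencing total and genome size are the figures from this post; the read length of 150 bp is only an assumption for illustration (it is not stated above), and the k-mer conversion is the usual textbook relation that ignores sequencing errors, not something taken from smudgeplot itself.

```python
def base_coverage(total_bases: float, genome_size: float) -> float:
    """Average per-base coverage: sequenced bases divided by genome size."""
    return total_bases / genome_size


def kmer_coverage(base_cov: float, read_length: int, k: int) -> float:
    """Approximate k-mer coverage for error-free reads.

    Each read of length L contributes (L - k + 1) k-mers, so k-mer
    coverage is lower than base coverage by the factor (L - k + 1) / L.
    """
    return base_cov * (read_length - k + 1) / read_length


TOTAL_BASES = 48e9   # 48 Gb of Illumina reads (from the post)
GENOME_SIZE = 720e6  # ~720 Mb from flow cytometry (from the post)
K = 31               # k-mer size passed to FastK via -k31
READ_LENGTH = 150    # ASSUMPTION: not stated in the post

base_cov = base_coverage(TOTAL_BASES, GENOME_SIZE)
kmer_cov = kmer_coverage(base_cov, READ_LENGTH, K)

print(f"base coverage:  {base_cov:.1f}x")
print(f"k-mer coverage: {kmer_cov:.1f}x (assuming {READ_LENGTH} bp reads)")
```

Note that the result depends on which genome size the flow cytometry value represents: if 720 Mb were the size of all four chromosome sets rather than of one, the per-copy coverage would be correspondingly lower.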

Strangely, the resulting smudgeplot is clearly wrong. The smudge previously labeled AB is still there (minor k-mer coverage = 0.5, total coverage of the k-mer pair ~60), but it has lost its label, and the smudge previously labeled AABB is now labeled AB. I don't know what causes this. Does the haplotype depth here refer to the depth of a single subgenome? If so, how should run_centralities.pdf be interpreted? Furthermore, my genome is highly heterozygous, and I don't know whether, when the smudgeplot is generated, the differences between the subgenomes (A1 vs B1, A2 vs B2) will be confused with the heterozygosity, that is, the differences between the two haplotypes (n1 vs n2, where n1 = A1B1, n2 = A2B2, and A and B are the two subgenomes). linear_plot transformed_linear_plot
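The relabeling described above can be reproduced with a small sketch of the smudge geometry. It assumes the usual convention that a genotype with c copies in total sits at a total pair coverage of c × 1n and at a relative minor coverage equal to the fraction of minor copies. The functions `expected_smudge` and `labels_at` and the tolerances are hypothetical helpers written for this illustration; they are not smudgeplot's actual labeling code.

```python
from collections import Counter

GENOTYPES = ("AB", "AAB", "AAAB", "AABB")


def expected_smudge(genotype: str, haploid_cov: float) -> tuple[float, float]:
    """Return (relative minor coverage, total pair coverage) for a genotype."""
    counts = Counter(genotype)
    total_copies = len(genotype)
    minor_copies = min(counts.values())
    return minor_copies / total_copies, total_copies * haploid_cov


def labels_at(minor_fraction: float, total_cov: float, haploid_cov: float,
              frac_tol: float = 0.05, cov_tol: float = 0.25) -> list[str]:
    """Genotypes whose expected position matches an observed smudge.

    cov_tol is expressed as a fraction of the haploid coverage.
    Both tolerances are arbitrary illustrative values.
    """
    matches = []
    for genotype in GENOTYPES:
        frac, cov = expected_smudge(genotype, haploid_cov)
        if (abs(frac - minor_fraction) <= frac_tol
                and abs(cov - total_cov) <= cov_tol * haploid_cov):
            matches.append(genotype)
    return matches


# Two observed smudges, both at relative minor coverage 0.5.
for haploid_cov in (30, 60):
    print(f"1n = {haploid_cov}:",
          "smudge at 60 ->", labels_at(0.5, 60, haploid_cov),
          "| smudge at 120 ->", labels_at(0.5, 120, haploid_cov))
```

Under these assumptions, with 1n = 30 the smudge at total coverage 60 matches AB and the one at 120 matches AABB, whereas with 1n = 60 the smudge at 120 matches AB and the one at 60 matches no two-k-mer genotype at all. This is consistent with the labels shifting once the 1n estimate roughly doubles.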

Axolotl233 • Oct 25 '24 14:10