How to correctly interpret the output of calcDivergenceFromAlign.pl?
Hi,
I am using calcDivergenceFromAlign.pl and createRepeatLandscape.pl to plot the repeat landscape, but I've encountered some difficulties.
This is the repeat landscape plotted by createRepeatLandscape.pl. It shows that LINE/L2 at a Kimura distance (k-distance) of 0 accounts for 0.463% of the genome (3,023,067 bp out of a genome size of 652,930,317 bp).
However, according to the divsum file (the output of calcDivergenceFromAlign.pl, sorted by Kimura% in ascending order):
Class Repeat absLen wellCharLen Kimura%
LINE/L2 Zebrafish_L2-54_DR 292 148 -50.85
LINE/L2 DR0171017 180 97 -26.53
LINE/L2 rnd-1_family-43 843825 842394 1.88
LINE/L2 DR0172227 6 6 1.95
LINE/L2 rnd-1_family-45 3084378 3077852 2.12
LINE/L2 rnd-1_family-23 3396 3353 2.32
LINE/L2 rnd-1_family-41 397235 393867 2.58
LINE/L2 rnd-1_family-69 1123890 1121156 2.6
LINE/L2 rnd-1_family-20 503263 499225 2.68
LINE/L2 rnd-1_family-18 1105936 1101045 3.13
First, there are no elements with a k-distance of 0.
Second, if Kimura% < 0 falls into the k-distance 0 category, those elements cannot occupy 0.463% of the genome, because both Zebrafish_L2-54_DR and DR0171017 cover only a few hundred base pairs according to the RepeatMasker .out file:
Repeat Kimura% count basepair
Zebrafish_L2-54_DR -50.85 6 806
DR0171017 -26.53 12 586
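To make the mismatch concrete, here is a minimal Python sketch of the arithmetic, using only the numbers quoted above. The variable and function names are made up for illustration and are not part of RepeatMasker.

```python
# Numbers copied from the post; names here are hypothetical.
GENOME_SIZE = 652_930_317        # genome size in bp
LANDSCAPE_BIN0_BP = 3_023_067    # LINE/L2 bp shown at k-distance 0 in the landscape

# Base pairs covered by the two negative-Kimura families, per the .out file.
negative_kimura_bp = {
    "Zebrafish_L2-54_DR": 806,
    "DR0171017": 586,
}


def percent_of_genome(bp, genome_size=GENOME_SIZE):
    """Return bp as a percentage of the genome size."""
    return 100.0 * bp / genome_size


bin0_pct = percent_of_genome(LANDSCAPE_BIN0_BP)
negative_total_bp = sum(negative_kimura_bp.values())
negative_pct = percent_of_genome(negative_total_bp)

print(f"Landscape bin 0:          {LANDSCAPE_BIN0_BP} bp = {bin0_pct:.3f}%")
print(f"Negative-Kimura families: {negative_total_bp} bp = {negative_pct:.6f}%")
```

The two families together cover 1,392 bp, which is several orders of magnitude short of the 3,023,067 bp reported for k-distance 0.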
I think I might be misunderstanding these outputs. How should I interpret them correctly?
Best, Jui-Hung
Something appears to be wrong with your input to calcDivergenceFromAlign.pl. Kimura divergences should not be negative. Do you mind sharing your alignment data so that I can try to reproduce this?
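As a quick way to find the affected entries, something like the following Python sketch could scan the per-repeat rows of a divsum file for negative Kimura values. It assumes the whitespace-separated five-column layout quoted in the question; the function name is hypothetical, and the embedded text is just a few rows copied from the post.

```python
# Rows copied from the question; a real divsum file may contain other sections.
DIVSUM_ROWS = """\
Class Repeat absLen wellCharLen Kimura%
LINE/L2 Zebrafish_L2-54_DR 292 148 -50.85
LINE/L2 DR0171017 180 97 -26.53
LINE/L2 rnd-1_family-43 843825 842394 1.88
LINE/L2 DR0172227 6 6 1.95
"""


def find_negative_kimura(text):
    """Return (repeat name, Kimura%) pairs whose divergence is below zero."""
    flagged = []
    lines = text.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        if len(fields) != 5:
            continue  # not a per-repeat row in the assumed layout
        _repeat_class, name, _abs_len, _well_char_len, kimura = fields
        try:
            value = float(kimura)
        except ValueError:
            continue  # non-numeric divergence field; ignore it
        if value < 0:
            flagged.append((name, value))
    return flagged


for name, value in find_negative_kimura(DIVSUM_ROWS):
    print(f"{name}: Kimura% = {value}")
```

Any repeat this reports would be worth checking against its alignments in the input file.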
Closing this for now. Please let me know if you continue to have problems.