How to correctly interpret the output of calcDivergenceFromAlign.pl?

Open ala98412 opened this issue 1 year ago • 1 comments

Hi,

I am using calcDivergenceFromAlign.pl and createRepeatLandscape.pl to plot a repeat landscape, but I am having trouble interpreting the results.

[Image: untitled repeat landscape plot]

This is the repeat landscape plotted by createRepeatLandscape.pl. It shows that LINE/L2 elements with a Kimura distance (k-distance) of 0 make up 0.463% of the genome (3,023,067 bp out of a genome size of 652,930,317 bp).
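As a quick check of that arithmetic, the proportion is simply the base pairs in the bin divided by the genome size. A minimal Python sketch, using only the two numbers quoted above (the variable names are my own, not anything from the RepeatMasker scripts):

```python
# Numbers quoted from the landscape plot and the genome assembly above.
l2_bin0_bp = 3_023_067        # LINE/L2 base pairs reported at k-distance 0
genome_size_bp = 652_930_317  # total genome size in base pairs

# Fraction of the genome covered, expressed as a percentage.
percent = 100 * l2_bin0_bp / genome_size_bp
print(f"{percent:.3f}%")  # prints 0.463%
```

So the 0.463% figure is consistent with the 3,023,067 bp shown for that bin.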

However, according to the divsum file (the output of calcDivergenceFromAlign.pl, sorted here in ascending order of Kimura%):

Class	Repeat	absLen	wellCharLen	Kimura%
LINE/L2	Zebrafish_L2-54_DR	292	148	-50.85
LINE/L2	DR0171017	180	97	-26.53
LINE/L2	rnd-1_family-43	843825	842394	1.88
LINE/L2	DR0172227	6	6	1.95
LINE/L2	rnd-1_family-45	3084378	3077852	2.12
LINE/L2	rnd-1_family-23	3396	3353	2.32
LINE/L2	rnd-1_family-41	397235	393867	2.58
LINE/L2	rnd-1_family-69	1123890	1121156	2.6
LINE/L2	rnd-1_family-20	503263	499225	2.68
LINE/L2	rnd-1_family-18	1105936	1101045	3.13

First, none of the listed repeats has a k-distance of 0.

Second, even if Kimura% < 0 is what gets counted as k-distance 0, those repeats cannot account for 0.463% of the genome, because both Zebrafish_L2-54_DR and DR0171017 cover only a few hundred base pairs according to the RepeatMasker .out file:

Repeat	Kimura%	count	basepair
Zebrafish_L2-54_DR	-50.85	6	806
DR0171017	-26.53	12	586
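To put numbers on that mismatch, here is a small Python sketch that sums the base pairs of the two negative-Kimura repeats and compares them with the genome size. The rows are copied from the table above; the variable names are my own.

```python
GENOME_SIZE_BP = 652_930_317  # genome size quoted earlier in this post

# Rows copied from the .out summary above: (repeat, Kimura%, count, basepairs).
negative_repeats = [
    ("Zebrafish_L2-54_DR", -50.85, 6, 806),
    ("DR0171017", -26.53, 12, 586),
]

# Total base pairs covered by the repeats with a negative Kimura%.
total_bp = sum(bp for _, _, _, bp in negative_repeats)

# Share of the genome, as a percentage.
percent = 100 * total_bp / GENOME_SIZE_BP
print(total_bp, f"{percent:.5f}%")
```

Together they cover 1,392 bp, roughly 0.0002% of the genome, which is far below the 0.463% (3,023,067 bp) shown in the landscape.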

I think I might be misunderstanding these outputs. How should I interpret them correctly?

Best, Jui-Hung

ala98412, Jul 10 '24 00:07

Something appears to be wrong with your input to calcDivergenceFromAlign.pl. Kimura divergences should not be negative. Do you mind sharing your alignment data so that I can try to reproduce this?

rmhubley, Aug 08 '24 19:08

Closing this for now. Please let me know if you continue to have problems.

rmhubley, Sep 11 '24 19:09