Filtering overlapping repeats for chimeras

Open pellescholten opened this issue 10 months ago • 0 comments

Hi!

I was looking for a way to filter out overlapping sequences in my RepeatCraft output and tried your filteringOverlappingRepeats.R script.

However, it seems to have an issue with chimeric or nested repeats. In these cases the overlap is either not resolved, or the nested repeat ends up with a Start coordinate that lies after its End coordinate.

For example, an LTR nested in a TIR in the rmerge file:

contig_1000	RepeatMasker	CLASSII/TIR	9374	9777	12.2	+	.	Tstart=48;Tend=405;ID=EDTA_TE_00001334_inc;shortTE=T
contig_1000	RepeatMasker	CLASSI/LTR	9514	9612	25.2	+	.	Tstart=5136;Tend=5358;ID=RM2_rnd-5_family-4_unconfirmed;shortTE=T
contig_1000	RepeatMasker	CLASSII/TIR	9444	9645	12.2	+	NA	Tstart=48;Tend=405;ID=EDTA_TE_00001334_inc;shortTE=T
contig_1000	RepeatMasker	CLASSI/LTR	9646	9612	25.2	+	NA	Tstart=5136;Tend=5358;ID=RM2_rnd-5_family-4_unconfirmed;shortTE=T
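A quick way to spot rows like the last one is to scan the tab-separated columns 4 and 5 for a Start greater than End. This is only a rough Python sketch for checking the symptom, not part of filteringOverlappingRepeats.R; the function name `inverted_rows` is hypothetical.

```python
def inverted_rows(gff_lines):
    """Return (line_number, start, end) for rows whose start exceeds end."""
    bad = []
    for n, line in enumerate(gff_lines, start=1):
        # Skip blank lines and GFF comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # GFF columns 4 and 5 hold the 1-based start and end.
        start, end = int(fields[3]), int(fields[4])
        if start > end:
            bad.append((n, start, end))
    return bad


rows = [
    "contig_1000\tRepeatMasker\tCLASSII/TIR\t9444\t9645\t12.2\t+\tNA\t"
    "Tstart=48;Tend=405;ID=EDTA_TE_00001334_inc;shortTE=T",
    "contig_1000\tRepeatMasker\tCLASSI/LTR\t9646\t9612\t25.2\t+\tNA\t"
    "Tstart=5136;Tend=5358;ID=RM2_rnd-5_family-4_unconfirmed;shortTE=T",
]
print(inverted_rows(rows))
```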

I am not sure what the easiest way to solve this in the current code would be, since you would need to update both repeats at the same time.
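One idea, as a rough sketch only: handle the nested pair together, keeping the inner repeat intact and splitting the outer one into a left and a right piece. The Python below is an illustration under that assumption, not a patch to the R script; `Repeat` and `split_around_nested` are hypothetical names, and whether splitting the outer element is the desired behaviour is for the maintainers to decide.

```python
from typing import List, NamedTuple


class Repeat(NamedTuple):
    seqid: str
    start: int  # 1-based, inclusive (GFF convention)
    end: int    # 1-based, inclusive
    name: str


def split_around_nested(outer: Repeat, inner: Repeat) -> List[Repeat]:
    """Return non-overlapping pieces when `inner` lies fully inside `outer`."""
    nested = (
        outer.seqid == inner.seqid
        and outer.start <= inner.start
        and inner.end <= outer.end
    )
    if not nested:
        raise ValueError("inner is not nested in outer")
    pieces = []
    # Left flank of the outer repeat, if any bases remain before the inner one.
    if outer.start < inner.start:
        pieces.append(outer._replace(end=inner.start - 1))
    pieces.append(inner)
    # Right flank of the outer repeat, if any bases remain after the inner one.
    if inner.end < outer.end:
        pieces.append(outer._replace(start=inner.end + 1))
    return pieces


tir = Repeat("contig_1000", 9374, 9777, "EDTA_TE_00001334_inc")
ltr = Repeat("contig_1000", 9514, 9612, "RM2_rnd-5_family-4_unconfirmed")
pieces = split_around_nested(tir, ltr)
for p in pieces:
    print(p.seqid, p.start, p.end, p.name)
```

Because both repeats are passed in together, neither can end up with a Start after its End: each flank is only emitted when it contains at least one base.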

Cheers

pellescholten, Apr 12 '24 15:04