RM produces an alignment with negative query coordinates

Open diekhans opened this issue 1 week ago • 1 comments

Describe the issue

RM produced an alignment with a negative query coordinate (see below). This goes undetected and causes rmToTrackHub to crash (#375).

Reproduction steps

Run by Hiram's UCSC pipeline, which splits the genome, with these parameters:

/hive/data/outside/RepeatMasker/RepeatMasker-4.2.1/RepeatMasker -uncurated -engine rmblast -pa 1 -align -species 'human'

Run information is here: https://hgwdev.gi.ucsc.edu/~markd/browser/bugs/2025-12-repeatmaster-H9/versionInfo.txt

The input genome, which is a haplotype of a T2T assembly of the H9 human ESC line, is here: https://hgwdev.gi.ucsc.edu/~markd/browser/bugs/2025-12-repeatmaster-H9/t2t_h9_v01_hap1.fa.gz

Log output

Full output not saved.

Environment (please include as much of the following information as you can find out):

RepeatMasker version 4.2.1
Search Engine: NCBI/RMBLAST [ 2.14.1+ ]

Installed from a tar file download.

Linux hgwdev 5.14.0-503.19.1.el9_5.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Dec 19 12:55:03 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux Rocky Linux release 9.5 (Blue Onyx)

Full Dfam library.

Using Master RepeatMasker Database: /hive/data/outside/RepeatMasker/RepeatMasker-4.2.1/Libraries/famdb
  Title    : Dfam
  Version  : 3.9
  Date     : 2025-03-10
  Families : 4,122,019

Additional context

Another user reports the same error in v4.2.1; however, v4.1.2-p1, v4.1.5, and v4.1.7-p1 all work.

Out and align files are here:

https://hgwdev.gi.ucsc.edu/~markd/browser/bugs/2025-12-repeatmaster-H9/t2t_h9_v01_hap1.sorted.fa.out.gz
https://hgwdev.gi.ucsc.edu/~markd/browser/bugs/2025-12-repeatmaster-H9/t2t_h9_v01_hap1.fa.align.gz

Example invalid alignment:

Matrix = 18p47g.matrix
Kimura (with divCpGMod) = 4.11
CpG sites = 22, Kimura (unadjusted) = 9.40
Transitions / transversions = 7.33 (22/3)
Gap_init rate = 0.00 (0 / 289), avg. gap size = 0.0 (0 / 0)


3760 9.29 3.91 10.83 chr10_hap1 12781100 12781294 (121604145) C MER4A1_v#LTR/ERV1 (413) 187 -1 c_b6s401i1 2515907

  chr10_hap1      12781100 AGAGTTTGAGTTTGTCTGTCTTTTGCCCACAAGGAATTTCCTTACTGGCG 12781149
                           ---------- i        i    i                 iiv   i
C MER4A1_v#LTR/        187 ----------TCTGTCTGTCCTTTGTCCACAAGGAATTTCCTTGTGGGCA 148

  chr10_hap1      12781150 AATCATGAGGGAGGAATGTGGCTTTTTTATCTTTGTAGCTATGTTATTTA 12781199
                              ii         v    i                      v       
C MER4A1_v#LTR/        147 AATTGTGAGGGAGGTATGTAGCTTTTTTATCTTTGTAGCTATCTTATTTA 98

  chr10_hap1      12781200 GGAATAAAATGGGAGGCAGGATTG-----CGTAGTTCCCAGCTTGACTTT 12781244
                                 i             v   -----  i                  
C MER4A1_v#LTR/         97 GGAATAGAATGGGAGGCAGGTTTGCCCGACGCAGTTCCCAGCTTGACTTT 48

  chr10_hap1      12781245 TCCCTCCGGCTTAGTGATTTTGGGGTCCTGAGATTTATTTTCCTTTCACA 12781294

C MER4A1_v#LTR/         47 TCCCTCCGGCTTAGTGATTTTGGGGTCCTGAGATTTATTTTCCTTTCACA -2
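
Since the bad coordinate currently goes undetected, here is a rough sketch of a stand-alone check that could be run over a .align file to find other affected records. It is not part of RepeatMasker or rmToTrackHub; the function names `suspicious_coords` and `scan_align` are made up for illustration. It assumes that alignment header lines start with an integer score followed by three decimal fields (as in the example above), that coordinates are 1-based, and that parenthesized fields are "remaining" counts that may legitimately be 0.

```python
import gzip


def suspicious_coords(header_line):
    """Return the non-parenthesized integer fields of an alignment header
    line that are below 1. Lines that do not look like a header yield [].

    Assumption: a header line has at least 12 whitespace-separated fields,
    beginning with an integer score and three decimal fields.
    """
    tokens = header_line.split()
    if len(tokens) < 12:
        return []
    try:
        int(tokens[0])
        for tok in tokens[1:4]:
            float(tok)
    except ValueError:
        return []

    bad = []
    # tokens[4] is the sequence name, so coordinate checks start at tokens[5].
    for tok in tokens[5:]:
        if tok.startswith("("):
            continue  # "(N)" remaining-length fields are not checked here
        try:
            value = int(tok)
        except ValueError:
            continue  # strand flag, repeat name, ID string, etc.
        if value < 1:
            bad.append(value)
    return bad


def scan_align(path):
    """Yield (line_number, line, bad_values) for each suspicious header
    line in a gzip-compressed .align file."""
    with gzip.open(path, "rt") as handle:
        for line_number, line in enumerate(handle, start=1):
            bad = suspicious_coords(line)
            if bad:
                yield line_number, line.rstrip("\n"), bad
```

Calling `suspicious_coords` on the header line of the example above returns `[-1]`; `scan_align("t2t_h9_v01_hap1.fa.align.gz")` would iterate over every such line in the attached file.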

diekhans, Dec 22 '25 16:12