RM produces an alignment with negative repeat consensus coordinates
Describe the issue
RepeatMasker produced an alignment with a negative repeat consensus coordinate (see the example below). RepeatMasker does not detect this, and the invalid record causes rmToTrackHub to crash (#375).
Reproduction steps
Run through Hiram's UCSC pipeline, which splits the genome, with these parameters:
/hive/data/outside/RepeatMasker/RepeatMasker-4.2.1/RepeatMasker -uncurated -engine rmblast -pa 1 -align -species 'human'
Run information is here: https://hgwdev.gi.ucsc.edu/~markd/browser/bugs/2025-12-repeatmaster-H9/versionInfo.txt
The input genome, a haplotype of a T2T assembly of the H9 human ESC line, is here: https://hgwdev.gi.ucsc.edu/~markd/browser/bugs/2025-12-repeatmaster-H9/t2t_h9_v01_hap1.fa.gz
Log output
The full log output was not saved.
Environment
RepeatMasker version 4.2.1
Search Engine: NCBI/RMBLAST [ 2.14.1+ ]
Installed from a tar file download.
Linux hgwdev 5.14.0-503.19.1.el9_5.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Dec 19 12:55:03 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Rocky Linux release 9.5 (Blue Onyx)
Full Dfam library
Using Master RepeatMasker Database: /hive/data/outside/RepeatMasker/RepeatMasker-4.2.1/Libraries/famdb
Title : Dfam
Version : 3.9
Date : 2025-03-10
Families : 4,122,019
Additional context
Another user reports the same error with v4.2.1; however, v4.1.2-p1, v4.1.5, and v4.1.7-p1 all work for them.
The .out and .align files are here:
https://hgwdev.gi.ucsc.edu/~markd/browser/bugs/2025-12-repeatmaster-H9/t2t_h9_v01_hap1.sorted.fa.out.gz
https://hgwdev.gi.ucsc.edu/~markd/browser/bugs/2025-12-repeatmaster-H9/t2t_h9_v01_hap1.fa.align.gz
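Because RepeatMasker does not reject these records, a downstream consumer could screen the .out file for them before conversion. What follows is a minimal, hypothetical Python sketch (`find_bad_repeat_coords` and `BadRecord` are illustrative names, not part of RepeatMasker or rmToTrackHub). It assumes the standard whitespace-separated .out layout, where the three repeat position columns are `begin end (left)` on the `+` strand and `(left) end begin` on the `C` strand, and it flags any repeat coordinate below 1, since consensus positions are 1-based.

```python
# Hypothetical screening helper; not part of RepeatMasker or rmToTrackHub.
# Assumes the standard RepeatMasker .out column layout.
import gzip
import sys
from typing import Iterable, Iterator, NamedTuple


class BadRecord(NamedTuple):
    line_no: int
    query: str
    repeat: str
    rep_begin: int
    rep_end: int


def _to_int(field: str) -> int:
    # "(413)"-style columns count the bases left over; drop the parentheses.
    return int(field.strip("()"))


def find_bad_repeat_coords(lines: Iterable[str]) -> Iterator[BadRecord]:
    """Yield .out records whose repeat (consensus) begin or end is below 1."""
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        # Data rows start with an integer SW score; skip header and blank lines.
        if len(fields) < 14 or not fields[0].isdigit():
            continue
        query, strand, repeat = fields[4], fields[8], fields[9]
        if strand == "+":
            # Forward strand: begin, end, (left)
            rep_begin, rep_end = _to_int(fields[11]), _to_int(fields[12])
        else:
            # Complement strand ("C"): (left), end, begin
            rep_end, rep_begin = _to_int(fields[12]), _to_int(fields[13])
        if min(rep_begin, rep_end) < 1:
            yield BadRecord(line_no, query, repeat, rep_begin, rep_end)


if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        for rec in find_bad_repeat_coords(fh):
            print(f"line {rec.line_no}: {rec.query} {rec.repeat} "
                  f"begin={rec.rep_begin} end={rec.rep_end}")
```

It could be run as, for example, `python check_out.py t2t_h9_v01_hap1.sorted.fa.out.gz` (the script name is hypothetical). Gzipped input is opened in text mode, so the linked file can be scanned without decompressing it first.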
Example of an invalid alignment (the consensus coordinates on the last row run to -2):
```
Matrix = 18p47g.matrix
Kimura (with divCpGMod) = 4.11
CpG sites = 22, Kimura (unadjusted) = 9.40
Transitions / transversions = 7.33 (22/3)
Gap_init rate = 0.00 (0 / 289), avg. gap size = 0.0 (0 / 0)

3760 9.29 3.91 10.83 chr10_hap1 12781100 12781294 (121604145) C MER4A1_v#LTR/ERV1 (413) 187 -1 c_b6s401i1 2515907

chr10_hap1 12781100 AGAGTTTGAGTTTGTCTGTCTTTTGCCCACAAGGAATTTCCTTACTGGCG 12781149
                    ---------- i        i    i                 iiv   i
C MER4A1_v#LTR/ 187 ----------TCTGTCTGTCCTTTGTCCACAAGGAATTTCCTTGTGGGCA 148

chr10_hap1 12781150 AATCATGAGGGAGGAATGTGGCTTTTTTATCTTTGTAGCTATGTTATTTA 12781199
                       ii         v    i                      v
C MER4A1_v#LTR/ 147 AATTGTGAGGGAGGTATGTAGCTTTTTTATCTTTGTAGCTATCTTATTTA 98

chr10_hap1 12781200 GGAATAAAATGGGAGGCAGGATTG-----CGTAGTTCCCAGCTTGACTTT 12781244
                          i             v   -----  i
C MER4A1_v#LTR/  97 GGAATAGAATGGGAGGCAGGTTTGCCCGACGCAGTTCCCAGCTTGACTTT 48

chr10_hap1 12781245 TCCCTCCGGCTTAGTGATTTTGGGGTCCTGAGATTTATTTTCCTTTCACA 12781294
C MER4A1_v#LTR/  47 TCCCTCCGGCTTAGTGATTTTGGGGTCCTGAGATTTATTTTCCTTTCACA -2
```
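The same problem is visible row by row in the .align output: for valid 1-based coordinates, each sequence row's inclusive start-to-end span should equal its number of non-gap bases, and neither coordinate should drop below 1. A minimal Python sketch of that check follows, assuming every sequence row has the `[C] name start bases end` shape seen in the excerpt; the regular expression, `check_align_rows`, and `RowProblem` are hypothetical helpers, not part of RepeatMasker or rmToTrackHub.

```python
# Hypothetical .align row checker; not part of RepeatMasker or rmToTrackHub.
# Assumes sequence rows look like "[C] name start BASES end".
import gzip
import re
import sys
from typing import Iterable, Iterator, NamedTuple

# Optional leading spaces and "C " strand prefix, a name, a start coordinate,
# the aligned bases ("-" for gaps), and an end coordinate.
ROW = re.compile(r"^\s*(C\s+)?(\S+)\s+(-?\d+)\s+([A-Za-z-]+)\s+(-?\d+)\s*$")


class RowProblem(NamedTuple):
    line_no: int
    name: str
    start: int
    end: int
    reason: str


def check_align_rows(lines: Iterable[str]) -> Iterator[RowProblem]:
    for line_no, line in enumerate(lines, start=1):
        m = ROW.match(line)
        if m is None:
            continue  # score/statistics line or transition/transversion marker line
        name, seq = m.group(2), m.group(4)
        start, end = int(m.group(3)), int(m.group(5))
        bases = len(seq) - seq.count("-")
        if min(start, end) < 1:
            yield RowProblem(line_no, name, start, end, "coordinate below 1")
        # Inclusive span between start and end should equal the non-gap base count.
        if bases and abs(end - start) + 1 != bases:
            yield RowProblem(line_no, name, start, end, "span does not match base count")


if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        for p in check_align_rows(fh):
            print(f"line {p.line_no}: {p.name} {p.start}..{p.end}: {p.reason}")
```

On the excerpt, only the last MER4A1_v row (47 to -2) should be reported, and only as "coordinate below 1": its span of 50 still matches its 50 bases, so the consensus numbering continues past position 1 through 0 to -2 rather than jumping.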