RepeatMasker

ProcessRepeats generates empty/blank "ID" values, which causes other errors. e.g. RM2Bed.py "invalid literal"

Open wharvey31 opened this issue 3 years ago • 4 comments

RM2Bed.py (v4.1.2) fails with an "invalid literal" error on RepeatMasker output for Mmul10.fa (https://hgdownload.soe.ucsc.edu/goldenPath/rheMac10/bigZips/rheMac10.fa.gz), run with the "human" library. I have attached the .out file from a successful RepeatMasker run on chr2 of the Mmul10 genome, which reproduces the issue.

The easiest way to reproduce this is with

RM2Bed.py 12-of-23.fa.out.gz

Python produces the following error.

ValueError: invalid literal for int() with base 10: '*'

RepeatMasker version 4.1.2 was installed with bioconda on the following operating system:

LSB Version: :core-4.1-amd64:core-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.9.2009 (Core)
Release: 7.9.2009
Codename: Core

12-of-23.fa.out.gz

wharvey31, Sep 27 '21 18:09

Thanks for reporting this problem. Something strange has happened in this output file; some IDs are missing. The first missing ID is on line 3197 of the (uncompressed) file; there seems to be a particularly tricky SVA element there. For a "quick and dirty" fix, I suggest hand-editing that and the nearby lines to put an ID in - perhaps 2673 to match the other nearby SVA fragments.
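To locate every affected line before hand-editing, something like the following sketch may help. It assumes the usual whitespace-delimited .out layout in which the ID is the 15th field (optionally followed by a trailing "*" overlap flag), so a missing ID leaves either "*" or nothing in that position. The function names `find_missing_ids` and `scan_file` are hypothetical and are not part of RepeatMasker or RM2Bed.py.

```python
import gzip


def find_missing_ids(lines):
    """Return (line_number, text) pairs for annotation lines lacking an integer ID."""
    problems = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        # Skip blank lines and header lines, whose first token is not a numeric score.
        if not fields or not fields[0].isdigit():
            continue
        # Assumption: the ID is the 15th whitespace-delimited field (index 14).
        if len(fields) < 15 or not fields[14].isdigit():
            problems.append((number, line.rstrip("\n")))
    return problems


def scan_file(path):
    """Open a plain or gzip-compressed .out file and report lines with missing IDs."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return find_missing_ids(handle)
```

Calling `scan_file("12-of-23.fa.out.gz")` would return the line numbers (counted from the top of the uncompressed file, header included) together with the offending lines, which can then be patched by hand as suggested above.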

The cause of this problem is in RepeatMasker itself, specifically the ProcessRepeats program, which should never produce lines with "missing" IDs such as this. Do you have a corresponding .cat file for this output? If you can send it to us, it should help us to troubleshoot the issue more thoroughly.

jebrosen, Oct 12 '21 22:10

Yep. You can find the .cat file here: https://eichlerlab.gs.washington.edu/public/wharvey/RM_test/12-of-23.fa.cat.gz. It is too large to attach here.

Thanks!

wharvey31, Oct 12 '21 23:10

Thank you! I have successfully reproduced the problem with this input file, and I expect that it should help immensely in narrowing down the cause of the error.

jebrosen, Oct 13 '21 18:10

This problem has been identified and should be fixed in the upcoming 4.1.3 release.

rmhubley, Aug 24 '22 17:08