RepeatMasker
Repeat Masker Output Files
Hello, I'm trying to generate a gene annotation file using RepeatMasker. Specifically, I need to identify the transposable elements in lncRNA sequences. Currently, I'm using the Dfam library and the RMBlast search engine.
I passed the lncRNA FASTA file to RepeatMasker:
GenomeFasta="path/to/input/fasta/file"
RepeatMasker -species human -nolow -gff -u "${GenomeFasta}"
And I got output files: fa.cat, fa.masked, fa.ori.out, fa.out, fa.out.gff, fa.tbl
I have two questions:
- How can I view or open these files? I opened the GFF file in RStudio, but it looks a bit different from a usual GFF file.
- I need a gene annotation file in GFF or GTF format. How can I convert the RepeatMasker output to a gene annotation file? I'm trying to use bedtools, but I'm not sure how to feed these output files into it.
Any feedback would be appreciated. Thank you in advance for the help!
How can I view or open these files? I opened the GFF file in RStudio, but it looks a bit different from a usual GFF file.
Each of the output files is plain text and can be opened in most text editors. Can you explain more specifically how you opened the file in RStudio (e.g., which menu options or R code) and why you think it differs from a usual GFF file?
I need a gene annotation file in GFF or GTF format. How can I convert the RepeatMasker output to a gene annotation file? I'm trying to use bedtools, but I'm not sure how to feed these output files into it.
fa.out.gff is already in GFF(2) format, but we also provide a script, util/rmOutToGFF3.pl, which can be used to convert RepeatMasker .out files to GFF(3) instead.
Thank you for the feedback. I just figured it out: I'm using the read_rm function from biomartr.
Still, I need gene coordinates in chromosome/start/end form, as in GTF format. For this, I was trying to use bedtools to get gene coordinates in BED format. But I'm a bit confused: .fa.masked does not appear to be in FASTA format, so how can I convert it? Can rmOutToGFF3.pl generate a gene annotation file with chr/start/end?
Thank you
Or can I simply change the extension from fa.masked to .fa and feed it into bedtools?
Still, I need gene coordinates in chromosome/start/end form, as in GTF format.
Yes, that and other information is included in the .gff file.
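As an illustration only (this is not part of RepeatMasker), the chr/start/end coordinates can be pulled out of the GFF with a short awk sketch. The function name gff_to_bed and the file names in the usage comment are hypothetical, and the column positions assume the standard nine tab-separated GFF fields.

```shell
# Hypothetical helper: print BED-style chr/start/end (plus attributes, score, strand)
# from a GFF file. Assumes the standard nine tab-separated GFF columns.
gff_to_bed() {
  awk -F'\t' 'BEGIN { OFS = "\t" }
    /^#/ || NF < 9 { next }                # skip "##" header lines and malformed rows
    { print $1, $4 - 1, $5, $9, $6, $7 }   # GFF is 1-based inclusive; BED is 0-based half-open
  ' "$1"
}

# Example usage (file names are placeholders):
#   gff_to_bed lncRNA.fa.out.gff > lncRNA.repeats.bed
```

bedtools also accepts GFF input directly, so this step is only needed if a BED file is specifically wanted.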
Can rmOutToGFF3.pl generate a gene annotation file with chr/start/end?
rmOutToGFF3.pl converts RM output to GFF3, which also contains that information.
For this, I was trying to use bedtools to get gene coordinates in BED format. But I'm a bit confused: .fa.masked does not appear to be in FASTA format, so how can I convert it? Or can I simply change the extension from fa.masked to .fa and feed it into bedtools?
The fa.masked file is already a FASTA file - you should not even need to change the file extension in order to use it with bedtools!
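If the goal is a BED file of repeat coordinates to use alongside that FASTA in bedtools, one possible route is to convert the .out table directly. The sketch below is an assumption-laden illustration, not a replacement for util/rmOutToGFF3.pl: the function name rm_out_to_bed and the file names are hypothetical, and it relies on the usual whitespace-separated .out layout (score, three percentage columns, query name, begin, end, left, strand, repeat name, class/family, ...).

```shell
# Hypothetical helper: convert a RepeatMasker .out table to 6-column BED.
# Assumes the usual whitespace-separated .out column order.
rm_out_to_bed() {
  awk 'BEGIN { OFS = "\t" }
    $1 ~ /^[0-9]+$/ {                        # data rows start with a numeric score; headers do not
      strand = ($9 == "C") ? "-" : "+"       # .out marks the complement strand as "C"
      print $5, $6 - 1, $7, $10 "|" $11, $1, strand   # BED starts are 0-based
    }
  ' "$1"
}

# Example usage (file names are placeholders):
#   rm_out_to_bed lncRNA.fa.out > lncRNA.repeats.bed
```

The fifth BED column here carries the raw Smith-Waterman score from the .out file, which can exceed the nominal 0-1000 range of BED scores; replace it with a constant if a downstream tool is strict about that.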
What do the last two columns in the GFF mean? Are they counts? Why are they different? I couldn't find an explanation anywhere. Thanks!