RepeatMasker
N's present in assembly after softmasking
Hello,
After generating a repeat library with RepeatModeler and other software such as LTRharvest and LTR_retriever, I ran RepeatMasker with the command below.
The assembly I provided does not contain any N's, so I am confused as to why the softmasked genome now does.
perl RepeatMasker/./RepeatMasker -gff -pa 24 -lib RM_Radish.fa consensus2.frby.2.fasta.modified -xsmall -dir Softmask
I get a total of 141 N's in the output assembly. Do you know what the cause of this could be?
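For anyone wanting to reproduce the count, here is a minimal Python sketch that tallies every sequence character in the input and in the masked output and reports which characters changed. It is only an illustration: the function names `residue_counts` and `changed_residues` are made up for this example, and counts are case-insensitive because soft-masking only changes case.

```python
from collections import Counter


def residue_counts(fasta_lines):
    """Tally sequence characters in a FASTA, ignoring headers and case."""
    counts = Counter()
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and header lines
        counts.update(line.upper())
    return counts


def changed_residues(before_lines, after_lines):
    """Return {char: (count_before, count_after)} for characters whose totals differ."""
    before = residue_counts(before_lines)
    after = residue_counts(after_lines)
    return {
        char: (before[char], after[char])
        for char in sorted(set(before) | set(after))
        if before[char] != after[char]
    }
```

To use it, open the input FASTA and the masked FASTA (assumed here to be `Softmask/consensus2.frby.2.fasta.modified.masked`) and pass both file objects to `changed_residues`. An empty result means the two files differ only in case.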
That is unusual. Do the Ns in the masked output match up with the positions of repeats listed in the .out report file?
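One way to run this check is sketched below in Python. It assumes the usual whitespace-separated .out layout, in which data rows begin with an integer score and columns 5 to 7 hold the query sequence name, begin, and end. The helper names (`n_runs`, `parse_out`, `split_by_repeat_overlap`) are hypothetical and not part of RepeatMasker.

```python
import re


def n_runs(fasta_lines):
    """Return {seq_name: [(start, end), ...]} for runs of N/n (1-based, inclusive)."""
    runs, name, chunks = {}, None, []

    def flush():
        # Record the N runs of the sequence collected so far, if any.
        if name is not None:
            seq = "".join(chunks)
            runs[name] = [(m.start() + 1, m.end()) for m in re.finditer(r"[Nn]+", seq)]

    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            flush()
            # Use the first word of the header as the sequence name.
            name, chunks = (line[1:].split() or [""])[0], []
        elif line:
            chunks.append(line)
    flush()
    return runs


def parse_out(out_lines):
    """Return {seq_name: [(begin, end), ...]} from a RepeatMasker .out report."""
    hits = {}
    for line in out_lines:
        fields = line.split()
        # Header rows do not start with an integer score; skip them.
        if len(fields) < 7 or not fields[0].isdigit():
            continue
        hits.setdefault(fields[4], []).append((int(fields[5]), int(fields[6])))
    return hits


def split_by_repeat_overlap(runs, hits):
    """Split N runs into those overlapping an annotated repeat and those that do not."""
    inside, outside = [], []
    for name, intervals in runs.items():
        for start, end in intervals:
            overlapping = any(b <= end and start <= e for b, e in hits.get(name, []))
            (inside if overlapping else outside).append((name, start, end))
    return inside, outside
```

If most N runs land in `outside`, the Ns are probably not coming from the masking of annotated repeats and the input itself is worth a second look.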
Hi Jeb,
I reached out to Robert Hubley via email and sent him the relevant files and he said he would get back to me. I will let you know once I get a response. It's very few N's relative to the total number of masked bases. Here are the steps I used to get to this point:
1. RepeatModeler-2.0.1/BuildDatabase -name Radish consensus2.frby.2.fasta
# repbase_radish.fasta taken from the Repbase website.
2. cp repbase_radish.fasta RepeatModeler-2.0.1/Libraries/RepeatMasker.Lib
3. RepeatModeler-2.0.1/RepeatModeler -database Radish -pa 5
4. genometools/bin/./gt suffixerator -db consensus2.frby.2.fasta -indexname \t Radish_harvest.fa -tis -suf -lcp -des -ssp -sds -dna
5. genometools/bin/./gt ltrharvest -index Radish_harvest.fa > genome.fa.harvest.scn
6. LTR_FINDER_parallel/./LTR_FINDER_parallel -seq consensus2.frby.2.fasta -threads 10 -harvest_out -size 1000000 -time 300
7. cat genome.fa.harvest.scn consensus2.frby.2.fasta.finder.combine.scn > genome.fa.rawLTR.scn
# Sequence names are too long in my genome, so I need to remove part of the header.
8. sed 's/|.*$//' consensus2.frby.2.fasta > consensus2.frby.2.fasta.modified
9. LTR_retriever/./LTR_retriever -genome consensus2.frby.2.fasta.modified -inharvest genome.fa.rawLTR.scn -threads 10
10. cat consensus2.frby.2.fasta.modified.LTRlib.redundant.fa RM_24887.MonNov91320352020/consensi.fa.classified > RM_Radish.fa
11. perl RepeatMasker/./RepeatMasker -gff -pa 24 -lib RM_Radish.fa consensus2.frby.2.fasta.modified -xsmall -dir Softmask
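As a side check on the header-trimming step, the sketch below mimics the `sed 's/|.*$//'` substitution in Python and reports any headers that become identical once everything after the first `|` is dropped. This is an illustrative check only: the function names are hypothetical, and nothing in the thread suggests that name collisions are the cause of the Ns.

```python
from collections import Counter


def trim_header(line):
    """Mimic sed 's/|.*$//': drop everything from the first '|' to the end of the line."""
    return line.split("|", 1)[0]


def duplicated_headers_after_trim(fasta_lines):
    """Return trimmed header lines that occur more than once, sorted."""
    names = Counter(
        trim_header(line.rstrip("\n"))
        for line in fasta_lines
        if line.startswith(">")
    )
    return sorted(name for name, count in names.items() if count > 1)
```

Running `duplicated_headers_after_trim` over the lines of consensus2.frby.2.fasta should return an empty list if every trimmed name is still unique.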
Nicolas
Best,
Nicolas Alexandre
PhD Candidate, Integrative Biology
Whiteman Lab
University of California - Berkeley