RepeatMasker
RepeatProteinMask Runtime
Hi, my command looks like this:
RepeatMasker-open-4-0-9/RepeatMasker/RepeatProteinMask -noLowSimple -pvalue 0.0001 *.fasta
It has taken more than 521.6 h. Neither RepeatMasker nor RepeatModeler seems suited to large genomes: runs often last nearly 20 days and always break off, which wastes a lot of time and resources. Please provide a solution.
Originally posted by @pengbo233 in https://github.com/rmhubley/RepeatModeler/issues/39#issuecomment-527704957
Hi,
I moved your issue to the RepeatMasker git repository because RepeatProteinMask is part of that package and not RepeatModeler. RepeatProteinMask is a simple wrapper around blastp and our growing database of TE-derived proteins. Without any details on the size of your input sequence or the computer you are running on, it's hard to say whether 20 days is long or not. Typically in bioinformatics analysis, scale is achieved by breaking up the input and running each batch separately on a cluster. There are other codes available for searching protein databases against input sequences (Diamond, etc.), and these could be used instead of RepeatProteinMask with the included protein database.
-R
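The batching approach mentioned in the reply above could be sketched roughly as follows. This is only a minimal illustration, not part of RepeatMasker: the function names, the `batch_NNNN.fasta` naming and the 50 Mb default are all assumptions made for the example. It groups FASTA records into files of bounded total size so each file can be submitted as a separate job.

```python
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) tuples from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def batch_records(records, max_bases):
    """Group records so each batch holds at most max_bases bases.

    A single record longer than max_bases ends up alone in its own batch.
    """
    batch, size = [], 0
    for header, seq in records:
        if batch and size + len(seq) > max_bases:
            yield batch
            batch, size = [], 0
        batch.append((header, seq))
        size += len(seq)
    if batch:
        yield batch


def write_batches(fasta_path, out_dir, max_bases=50_000_000, width=60):
    """Write each batch to out_dir/batch_NNNN.fasta (hypothetical naming)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    batches = batch_records(read_fasta(fasta_path), max_bases)
    for index, batch in enumerate(batches, 1):
        path = out_dir / f"batch_{index:04d}.fasta"
        with open(path, "w") as out:
            for header, seq in batch:
                out.write(f">{header}\n")
                for start in range(0, len(seq), width):
                    out.write(seq[start:start + width] + "\n")
        paths.append(path)
    return paths
```

Each resulting file can then be given to its own RepeatMasker or RepeatProteinMask job. Note that this only splits between sequences, so one very long contig still lands in a single batch.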
Hello,
Computer: 128 GB memory with 32 CPUs, or 200 GB memory with 40 CPUs
Problem: the run on the long contig always gets killed and shows an error.
The command looks like this:
RepeatMasker-open-4-0-9/RepeatMasker/RepeatMasker -nolow -no_is -norna -parallel 5 -qq -lib final.library contig_longest.fasta
Input: genome.fasta.contigs: 200 Mb; final.library: 200 Mb (built from repeatmodeler/result/consensi.fa.classified and ltr_finder/filter/LTR.fa.final.library)
Shortest contig: 2.8 Mb, time: 3 h, status: finished
Longest contig: 217 Mb, time: 190 h, status: still running, but it always gets killed, so all my previous effort is wasted
The error information looks like this:
wrong1:
wrong2:
I gave it 200 GB of memory and 32 CPUs, and it still shows the same error.
It wastes a lot of time and resources on the long contig, and it always shuts down. Please provide a solution.
Also, how should I set -frag? What number of batches is best?
Can I split -lib final.library into parts, such as final.library_part1, final.library_part2, final.library_part3 and final.library_part4, run RepeatMasker with each part, and finally concatenate the result.gff of each run into the final GFF? Thanks.
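For a single very long contig, "breaking up the input" would have to mean cutting the sequence itself. The sketch below is one possible way to do that, not something confirmed by the maintainers in this thread. It cuts a contig into overlapping windows and later shifts GFF coordinates back onto the parent contig. The `name__offset` naming scheme and both function names are hypothetical. It assumes tab-separated GFF lines with 1-based start and end in columns 4 and 5, and that the window names come back unchanged in the output.

```python
def window_sequence(name, seq, window, overlap):
    """Yield (window_name, offset, subsequence) for overlapping windows.

    offset is the 0-based start of the window on the parent contig and is
    encoded in the window name as "<name>__<offset>" (hypothetical scheme).
    """
    if not 0 <= overlap < window:
        raise ValueError("overlap must be >= 0 and smaller than window")
    step = window - overlap
    start = 0
    while True:
        end = min(start + window, len(seq))
        yield f"{name}__{start}", start, seq[start:end]
        if end == len(seq):
            break
        start += step


def shift_gff_line(line):
    """Map a GFF line reported on a window back to parent coordinates."""
    if line.startswith("#") or not line.strip():
        return line  # keep comments and blank lines untouched
    fields = line.rstrip("\n").split("\t")
    parent, offset = fields[0].rsplit("__", 1)
    fields[0] = parent
    fields[3] = str(int(fields[3]) + int(offset))  # start, 1-based
    fields[4] = str(int(fields[4]) + int(offset))  # end, 1-based inclusive
    return "\t".join(fields) + "\n"
```

Repeats that span a window boundary will be reported as fragments, once in each neighbouring window, so annotations falling in the overlap regions would still need to be merged or de-duplicated after shifting.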
I use Diamond instead of blastx in RepeatProteinMask.
Hi @pengbo233 ,
Would you please share how you use Diamond instead of blastx in RepeatProteinMask? I think it would be very useful for speeding up RepeatMasker.
Best, Kun