
RepeatProteinMask Runtime

Open rmhubley opened this issue 4 years ago • 6 comments

Hi, my command looks like this:

RepeatMasker-open-4-0-9/RepeatMasker/RepeatProteinMask -noLowSimple -pvalue 0.0001 *.fasta

It took more than 521.6 h. Neither RepeatMasker nor RepeatModeler is suited to large genomes; runs often take nearly 20 days and always break off partway. This wastes a lot of time and resources. Please provide a solution.

Originally posted by @pengbo233 in https://github.com/rmhubley/RepeatModeler/issues/39#issuecomment-527704957

rmhubley avatar Sep 04 '19 16:09 rmhubley

Hi,

I moved your issue to the RepeatMasker git repository because RepeatProteinMask is part of that package and not RepeatModeler. RepeatProteinMask is a simple wrapper around blastp and our growing database of TE-derived proteins. Without any details on the size of your input sequence or the computer you are running on, it's hard to say whether 20 days is long or not. Typically in bioinformatics, scale is achieved by breaking up the input and running each batch separately on a cluster. There are other programs available for searching protein databases against input sequences (DIAMOND, etc.), and these could be used instead of RepeatProteinMask with the included protein database.
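
As a rough illustration of "breaking up the input and running each batch separately", here is a minimal Python sketch that splits a multi-record FASTA file into batch files. The function names (`read_fasta`, `split_fasta`), the `batch_NNNN.fasta` naming, and the `max_bases` threshold are hypothetical choices for this example and are not part of RepeatMasker. It only groups whole records; a single very long contig ends up in a batch of its own and is not cut.

```python
from pathlib import Path
from typing import Iterator, List, Tuple


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_fasta(path: str, out_dir: str, max_bases: int) -> List[Path]:
    """Group records into batch files holding at most max_bases each.

    A record longer than max_bases is written alone; it is never cut.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    batch: List[Tuple[str, str]] = []
    batch_bases = 0

    def flush() -> None:
        # Write the current batch (if any) to the next numbered file.
        nonlocal batch, batch_bases
        if not batch:
            return
        target = out / f"batch_{len(written) + 1:04d}.fasta"
        with open(target, "w") as handle:
            for header, seq in batch:
                handle.write(f">{header}\n")
                for i in range(0, len(seq), 60):
                    handle.write(seq[i:i + 60] + "\n")
        written.append(target)
        batch, batch_bases = [], 0

    for header, seq in read_fasta(path):
        # Start a new batch when adding this record would exceed the limit.
        if batch and batch_bases + len(seq) > max_bases:
            flush()
        batch.append((header, seq))
        batch_bases += len(seq)
    flush()
    return written
```

Each resulting batch file could then be submitted as its own job, with the per-batch outputs collected afterwards. How large `max_bases` should be depends on the cluster and is left as an assumption here.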

-R

rmhubley avatar Sep 04 '19 16:09 rmhubley

hello,

Computer: 128 GB RAM with 32 CPUs, or 200 GB RAM with 40 CPUs.
Problem: the run on the long contig is always killed with an error.

The command looks like this:

RepeatMasker-open-4-0-9/RepeatMasker/RepeatMasker -nolow -no_is -norna -parallel 5 -qq -lib final.library contig_longest.fasta

Input: genome.fasta.contigs, 200 Mb; final.library, 200 Mb (built from repeatmodeler/result/consensi.fa.classified and ltr_finder/filter/LTR.fa.final.library).

Shortest contig: 2.8 Mb, time: 3 h, status: finished.
Longest contig: 217 Mb, time: 190 h, status: still running, but it is always killed and all my previous effort is wasted.

The error messages look like this:
Error 1: image
Error 2: image

Even when I give it 200 GB and 32 CPUs, it still shows the same error.

The long contig wastes a lot of time and resources, and the run always shuts down. Please provide a solution.

pengbo233 avatar Sep 05 '19 01:09 pengbo233

Please, how should I set -frag? What number of batches is best?

pengbo233 avatar Sep 05 '19 04:09 pengbo233

Can I split the -lib file final.library into parts, such as final.library_part1, final.library_part2, final.library_part3, and final.library_part4, run RepeatMasker with each part, and finally concatenate the result.gff from each run into the final GFF? Thanks.

pengbo233 avatar Sep 05 '19 06:09 pengbo233

I now use DIAMOND instead of blastx in RepeatProteinMask.
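
The thread does not say how DIAMOND was substituted, so the following is only a hedged sketch of what a standalone DIAMOND translated search against a TE protein library might look like, not the method actually used here. The helper names, the file names, and the idea of pointing `makedb` at a protein library such as `Libraries/RepeatPeps.lib` are assumptions; the `diamond` subcommands and options shown (`makedb --in --db`, `blastx --db --query --out --outfmt --evalue --threads`) are standard DIAMOND options. Note that a DIAMOND e-value is not the same statistic as RepeatProteinMask's `-pvalue`, and this produces a hit table, not a masked sequence.

```python
from typing import List


def diamond_makedb_cmd(protein_fasta: str, db_prefix: str) -> List[str]:
    """Build the argv for creating a DIAMOND database from a protein FASTA."""
    return ["diamond", "makedb", "--in", protein_fasta, "--db", db_prefix]


def diamond_blastx_cmd(
    db_prefix: str,
    query_fasta: str,
    out_tsv: str,
    evalue: float = 1e-4,  # assumed cutoff, not equivalent to -pvalue
    threads: int = 8,      # assumed thread count
) -> List[str]:
    """Build the argv for a translated DNA-vs-protein search (tabular output)."""
    return [
        "diamond", "blastx",
        "--db", db_prefix,
        "--query", query_fasta,
        "--out", out_tsv,
        "--outfmt", "6",
        "--evalue", str(evalue),
        "--threads", str(threads),
    ]


# Hypothetical usage (requires DIAMOND on PATH; paths are placeholders):
#   import subprocess
#   subprocess.run(diamond_makedb_cmd("Libraries/RepeatPeps.lib", "te_proteins"), check=True)
#   subprocess.run(diamond_blastx_cmd("te_proteins", "contig.fasta", "hits.tsv"), check=True)
```

The commands are built as argument lists so they can be inspected or logged before being passed to `subprocess.run`; converting the resulting hits into masked sequence or GFF would be a separate step not covered here.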

pengbo233 avatar Sep 17 '19 03:09 pengbo233

Hi @pengbo233 ,

Would you please share how you used DIAMOND instead of blastx in RepeatProteinMask? I think it would be very useful for speeding up RepeatMasker.

Best, Kun

xiekunwhy avatar Nov 12 '21 07:11 xiekunwhy