command "paladin prepare -r2" throwing error related to memory ?

Open jaidevjoshi83 opened this issue 5 years ago • 7 comments

Hi, is there any way or hack to run this step without hitting the error below on a machine with limited memory (in my case, only 256GB)? Or can I use a pre-prepared or pre-indexed database?

Error: "Constructing BWT for the packed sequence... [is_bwt] Failed to allocate 482146077304 bytes at is.c line 212: Cannot allocate memory"

Kindly suggest.

jaidevjoshi83 avatar Mar 14 '19 15:03 jaidevjoshi83
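
For scale, the byte count in the error message can be converted to a more readable unit. The sketch below is only illustrative: the helper names (`failed_allocation_bytes`, `to_gib`) are hypothetical and not part of PALADIN. It assumes the log line has the same "Failed to allocate N bytes" shape as the one quoted above.

```python
import re

# Matches the allocation-failure message quoted in the report above.
ALLOC_RE = re.compile(r"Failed to allocate (\d+) bytes")


def failed_allocation_bytes(log_line):
    """Return the requested byte count from an allocation-failure line, or None."""
    match = ALLOC_RE.search(log_line)
    return int(match.group(1)) if match else None


def to_gib(n_bytes):
    """Convert a byte count to GiB (1 GiB = 2**30 bytes)."""
    return n_bytes / 2**30


line = ("[is_bwt] Failed to allocate 482146077304 bytes at is.c line 212: "
        "Cannot allocate memory")
requested = failed_allocation_bytes(line)
print(f"{requested} bytes is about {to_gib(requested):.1f} GiB")
```

By this arithmetic the failed request is roughly 449 GiB in a single allocation, which is well above the 256GB available on the reporter's machine.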

Hello, I get the same error when attempting to index the RVDb prot database. The protein FASTAs are renamed to protein.faa.gz and when I run paladin index -r3 protein.faa.gz, I get the following:

[M::command_index] Translating protein sequence...0.00 sec
[M::command_index] Packing protein sequence... 93.97 sec
[M::command_index] Constructing BWT for the packed sequence... [is_bwt] Failed to allocate 134975224056 bytes at is.c line 212: Cannot allocate memory

Although the protein.faa.gz is large (461Mb), I am working on a large cluster, and am surprised to encounter this problem. Many thanks for any help you can provide.

Dave

davidfbibby avatar Apr 11 '22 12:04 davidfbibby

Hi @davidfbibby - to double check, I just indexed the latest revision of the clustered RVDB (around 3.1 GB of amino acids uncompressed) while profiling the memory usage. The maximum resident size during indexing for this reference is 56GB, but as can be seen above, it actually allocates a larger amount to work in (128GB), so you'll need at least that much available, both in system memory and within any job memory limit, to complete the indexing process. Does your system have that much memory?

ToniWestbrook avatar Apr 11 '22 20:04 ToniWestbrook
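
One rough way to answer the "does your system have that much memory?" question on a Linux node is to compare the failed allocation size with the machine's physical RAM. This is a minimal sketch under stated assumptions: the function names are hypothetical, `os.sysconf` with `SC_PAGE_SIZE` and `SC_PHYS_PAGES` is assumed to be available (typical on Linux), and it does not see scheduler or cgroup limits, which may be lower than the physical total on a cluster.

```python
import os


def total_physical_memory():
    """Return total physical RAM in bytes, or None if it cannot be determined."""
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        page_count = os.sysconf("SC_PHYS_PAGES")
    except (ValueError, OSError, AttributeError):
        # sysconf or these names are not available on this platform.
        return None
    if page_size <= 0 or page_count <= 0:
        return None
    return page_size * page_count


def can_fit(requested_bytes, available_bytes):
    """True if a single allocation of requested_bytes fits in available_bytes."""
    return requested_bytes <= available_bytes


# Byte count taken from the "[is_bwt] Failed to allocate ..." line quoted above.
requested = 134975224056
total = total_physical_memory()
if total is None:
    print("Could not determine physical memory on this platform")
else:
    verdict = "fits" if can_fit(requested, total) else "does not fit"
    print(f"Requested {requested} bytes, node has {total} bytes: {verdict}")
```

If the node total looks sufficient but indexing still fails, the job's own memory request is the next thing to check with the cluster administrators.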

Hi, I was using the unclustered dataset, which is over 8Gb! Maybe I should try to use the clustered version... I'm not sure about my available memory tbh, but if the clustered version fails, I'll enquire.

Thanks for the quick response,

Dave

davidfbibby avatar Apr 12 '22 09:04 davidfbibby

Another question - on https://rvdb-prot.pasteur.fr/, it is only the unclustered dataset that I can find.

davidfbibby avatar Apr 12 '22 09:04 davidfbibby

~~Here's the link to the group that maintains the clustered RVDB: https://rvdb.dbi.udel.edu/ (that has both the clustered and unclustered references available).~~ Indexing the unclustered DB would need significantly more memory, so it would be good to try the clustered version first if that works okay for your purposes. Hope that helps.

ToniWestbrook avatar Apr 12 '22 16:04 ToniWestbrook

Apologies, that's a nucleotide version of the reference at that link! I totally missed that when I downloaded it yesterday. I'll take a look around for a clustered version of the protein database - if not, you may have to cluster it yourself to fit into memory. Sorry again

ToniWestbrook avatar Apr 12 '22 16:04 ToniWestbrook

Ooof. I don't fancy clustering them. I'll see if I can get more memory to allocate... Thanks again for the quick responses.

Dave

davidfbibby avatar Apr 13 '22 12:04 davidfbibby