Roary
Cant open file: _clustered.clstr
Hello, running Roary/1.13.0
We have a problem running Roary: it stops with the following error message:
Iteratively run cd-hit
Cant open file: _clustered.clstr
Parallel all against all blast
After digging into the code, I noticed:
- Roary.pm does not test the return code of the cd-hit command.
- cd-hit fails because the memory limit requested with -M is too low, as shown when I run the cd-hit command manually:
[gensoft@cc118a2a4dd9 CS_pour_ED]$ /opt/gensoft/exe/cd-hit/4.6.1/bin/cd-hit -i _combined_files -o _clustered -T 32 -M 2916 -g 1 -s 1 ^C 256 -c 1
[gensoft@cc118a2a4dd9 CS_pour_ED]$ cd test234/
[gensoft@cc118a2a4dd9 test234]$ /opt/gensoft/exe/cd-hit/4.6.8/bin/cd-hit -i _combined_files -o _clustered -T 32 -M 2916 -g 1 -s 1 -d 256 -c 1
================================================================
Program: CD-HIT, V4.7 (+OpenMP), Jul 12 2021, 08:06:52
Command: /opt/gensoft/exe/cd-hit/4.6.8/bin/cd-hit -i
_combined_files -o _clustered -T 32 -M 2916 -g 1 -s 1
-d 256 -c 1
Started: Mon Jul 12 11:57:14 2021
================================================================
Output
----------------------------------------------------------------
total seq: 956806
longest and shortest : 4323 and 29
Total letters: 303089936
Sequences have been sorted
Approximated minimal memory consumption:
Sequence : 426M
Buffer : 32 X 172M = 5516M
Table : 2 X 80M = 161M
Miscellaneous : 12M
Total : 6117M
Fatal Error:
not enough memory, please set -M option greater than 6217
Program halted !!
[gensoft@cc118a2a4dd9 test234]$ echo $?
1
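The missing return-code check could look roughly like the sketch below. It is written in Python rather than Roary's Perl, and `run_checked` is a hypothetical helper, not part of Roary. The idea is only that a non-zero exit status (cd-hit returned 1 above) should stop the pipeline at the failing step, rather than later with "Cant open file: _clustered.clstr".

```python
import subprocess


def run_checked(cmd):
    """Run an external command and raise if it exits with a non-zero status.

    `cmd` is a list such as ["cd-hit", "-i", "_combined_files", ...].
    This is an illustrative helper, not Roary's actual code.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Include the tool's own output so messages such as
        # "not enough memory, please set -M option ..." reach the user.
        raise RuntimeError(
            f"{cmd[0]} exited with status {result.returncode}:\n"
            f"{result.stdout}{result.stderr}"
        )
    return result.stdout
```

Wrapping the cd-hit call this way would report the "not enough memory" message immediately instead of continuing to the BLAST step.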
I'm not fluent enough in Perl to dig further. Hope this helps.
NB: for information, here is the roary -a output:
[gensoft@cc118a2a4dd9 test234]$ roary -a
Please cite Roary if you use any of the results it produces:
Andrew J. Page, Carla A. Cummins, Martin Hunt, Vanessa K. Wong, Sandra Reuter, Matthew T. G. Holden, Maria Fookes, Daniel Falush, Jacqueline A. Keane, Julian Parkhill,
"Roary: Rapid large-scale prokaryote pan genome analysis", Bioinformatics, 2015 Nov 15;31(22):3691-3693
doi: http://doi.org/10.1093/bioinformatics/btv421
Pubmed: 26198102
2021/07/12 12:01:39 Looking for 'Rscript' - found /opt/gensoft/exe/R/3.6.2/bin/Rscript
2021/07/12 12:01:39 Determined Rscript version is 3.6
2021/07/12 12:01:39 Looking for 'awk' - found /usr/bin/awk
2021/07/12 12:01:39 Looking for 'bedtools' - found /opt/gensoft/exe/bedtools/2.29.2/bin/bedtools
2021/07/12 12:01:39 Determined bedtools version is 2.29
2021/07/12 12:01:39 Looking for 'blastp' - found /opt/gensoft/exe/blast+/2.10.0/bin/blastp
2021/07/12 12:01:39 Determined blastp version is 2.10.0
2021/07/12 12:01:39 Looking for 'grep' - found /usr/bin/grep
2021/07/12 12:01:39 Optional tool 'kraken' not found in your $PATH
2021/07/12 12:01:39 Optional tool 'kraken-report' not found in your $PATH
2021/07/12 12:01:39 Looking for 'mafft' - found /opt/gensoft/exe/mafft/7.453/bin/mafft
2021/07/12 12:01:40 Determined mafft version is 7.453
2021/07/12 12:01:40 Looking for 'makeblastdb' - found /opt/gensoft/exe/blast+/2.10.0/bin/makeblastdb
2021/07/12 12:01:40 Determined makeblastdb version is 2.10.0
2021/07/12 12:01:40 Looking for 'mcl' - found /opt/gensoft/exe/mcl/14-137/bin/mcl
2021/07/12 12:01:40 Determined mcl version is 14-137
2021/07/12 12:01:40 Looking for 'parallel' - found /opt/gensoft/exe/parallel/20200222/bin/parallel
2021/07/12 12:01:40 Determined parallel version is 20200222
2021/07/12 12:01:40 Looking for 'prank' - found /opt/gensoft/exe/prank/170427/bin/prank
2021/07/12 12:01:40 Determined prank version is 170427
2021/07/12 12:01:40 Looking for 'sed' - found /usr/bin/sed
2021/07/12 12:01:40 Looking for 'cd-hit' - found /opt/gensoft/exe/cd-hit/4.6.8/bin/cd-hit
2021/07/12 12:01:40 Determined cd-hit version is 4.7
2021/07/12 12:01:40 Looking for 'FastTree' - found /opt/gensoft/exe/FastTree/2.1.11/bin/FastTree
2021/07/12 12:01:40 Determined FastTree version is 2.1
2021/07/12 12:01:40 Roary version 1.7.7
regards
Eric
Back on this topic.
The problem is the memory computation performed by Bio::Roary::External::Cdhit,
which no longer matches the way newer cd-hit versions estimate memory.
See how cd-hit performs the memory estimation:
size_t SequenceDB::MinimalMemory( int frag_no, int bsize, int T, const Options & options, size_t extra )
{
    int N = sequences.size();
    int F = frag_no < MAX_TABLE_SEQ ? frag_no : MAX_TABLE_SEQ;
    size_t mem_need = 0;
    size_t mem, mega = 1000000;
    int table = T > 1 ? 2 : 1;

    printf( "\nApproximated minimal memory consumption:\n" );
    mem = N*sizeof(Sequence) + total_desc + N + extra;
    if( options.store_disk == false ) mem += total_letter + N;
    printf( "%-16s: %zuM\n", "Sequence", mem/mega );
    mem_need += mem;

    mem = bsize;
    printf( "%-16s: %i X %zuM = %zuM\n", "Buffer", T, mem/mega, T*mem/mega );
    mem_need += T*mem;

    mem = F*(sizeof(Sequence*) + sizeof(IndexCount)) + NAAN*sizeof(NVector<IndexCount>);
    printf( "%-16s: %i X %zuM = %zuM\n", "Table", table, mem/mega, table*mem/mega );
    mem_need += table*mem;

    mem = sequences.capacity()*sizeof(Sequence*) + N*sizeof(int);
    mem += Comp_AAN_idx.size()*sizeof(int);
    printf( "%-16s: %zuM\n", "Miscellaneous", mem/mega );
    mem_need += mem;

    printf( "%-16s: %zuM\n\n", "Total", mem_need/mega );

    if(options.max_memory and options.max_memory < mem_need + 50*table ){
        char msg[200];
        sprintf( msg, "not enough memory, please set -M option greater than %zu\n",
            50*table + mem_need/mega );
        bomb_error(msg);
    }
    return mem_need;
}
So just taking into account the number of characters in the input file (_combined_files), as in $memory_required = -s $filename;, is no longer sufficient.
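To make the arithmetic concrete, here is a small Python sketch of the threshold that the quoted function reports in its error message, `50*table + mem_need/mega`. The function names are illustrative only. The inputs are the "Total" figure (in MB) and the thread count `-T` from the log above.

```python
def cdhit_table_count(threads):
    # Mirrors `int table = T > 1 ? 2 : 1;` in the quoted cd-hit function.
    return 2 if threads > 1 else 1


def min_memory_option_mb(total_mb, threads):
    # Mirrors `50*table + mem_need/mega`, the value printed after
    # "please set -M option greater than".
    return 50 * cdhit_table_count(threads) + total_mb


# Values from the log above: Total 6117M, run with -T 32.
print(min_memory_option_mb(6117, 32))  # 6217
```

This matches the 6217 in the error message, against the `-M 2916` that Roary passed. Note that the Buffer term is multiplied by the thread count (`mem_need += T*mem`), so the requirement grows with `-T` regardless of input file size.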
regards
Eric
I finally hacked lib/Bio/Roary/External/Cdhit.pm
to force the cd-hit memory limit to unlimited (-M 0) ;-) Harsh but functional.
--- lib/Bio/Roary/External/Cdhit.pm.ori 2021-07-16 08:37:29.333069603 +0000
+++ lib/Bio/Roary/External/Cdhit.pm 2021-07-16 08:36:11.646064928 +0000
@@ -58,7 +58,10 @@
 {
   my ($self) = @_;
   my $memory_to_cdhit = int($self->memory_in_mb *0.9);
-  return $memory_to_cdhit;
+#  return $memory_to_cdhit;
+#  memory estimation is wrong
+#  force -M to 0. fix https://github.com/sanger-pathogens/Roary/issues/539
+  return 0;
 }
 sub clusters_filename
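A less drastic variant than forcing `-M 0` would be to read the value cd-hit itself suggests in its error message and retry with it. The Python sketch below is purely hypothetical and not something Roary does. It only parses the "please set -M option greater than N" line shown in the log above, and `suggested_memory_mb` is an illustrative name.

```python
import re

# Pattern taken from the cd-hit error message quoted earlier in this thread.
HINT = re.compile(r"please set -M option greater than (\d+)")


def suggested_memory_mb(cdhit_output):
    """Return the -M value (in MB) hinted at by cd-hit, or None if absent."""
    match = HINT.search(cdhit_output)
    return int(match.group(1)) if match else None


log = "Fatal Error:\nnot enough memory, please set -M option greater than 6217\n"
print(suggested_memory_mb(log))  # 6217
```

The message asks for a value greater than the printed number, so a caller would need to pass something strictly larger, and it still has to fit in the memory actually available on the machine.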
This thread saved me a ton of time. Thanks, @EricDeveaud !