get_homologues

Error: find_COGs

Open Pirxtrurl opened this issue 2 years ago • 5 comments

Hello,

My name is Jorge Val. Currently, I am working as a postdoc at the University of Edinburgh.

image


The only difference in this run was the size of the set: 133 large microbial genomes (from 5 Mbp to 11 Mbp).

I tried to solve it myself, but I couldn't.

I checked the get_homologues installation and it seems correct:

image

Please, could you help me?

Best regards,

Jorge

Pirxtrurl avatar Oct 17 '22 18:10 Pirxtrurl

Hi @Pirxtrurl , your error message seems to come from this line:

https://github.com/eead-csic-compbio/get_homologues/blob/5ecfd035dddb9157b396324097bd268eab60d6cb/lib/marfil_homology.pm#L2795

The comments in that part of the code indicate that disk latency can sometimes cause trouble, as COGs writes large files; often simply re-running sorts things out. However, this might also be a RAM bottleneck, in which case you could try option -s or a larger computer.

The failed job should leave behind at least three files (cog-edges.txt, all-edges.txt and all.cog.clusters.log), which might help you track down the problem further. Let me know how this goes, Bruno
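As a rough illustration of that check, here is a minimal Python sketch that reports whether each of the three files exists and how large it is. Only the file names come from the reply above; the function names and the directory path are hypothetical, and the sketch does not interpret the file contents.

```python
from pathlib import Path

# File names mentioned in the reply above; everything else here is illustrative.
EXPECTED_FILES = ("cog-edges.txt", "all-edges.txt", "all.cog.clusters.log")


def report_leftovers(run_dir):
    """Return {file name: size in bytes, or None if the file is missing}."""
    run_dir = Path(run_dir)
    report = {}
    for name in EXPECTED_FILES:
        path = run_dir / name
        report[name] = path.stat().st_size if path.is_file() else None
    return report


def print_report(report):
    """Print one status line per expected file."""
    for name, size in report.items():
        if size is None:
            status = "missing"
        elif size == 0:
            status = "empty"
        else:
            status = f"{size} bytes"
        print(f"{name}: {status}")


if __name__ == "__main__":
    # Hypothetical path: point this at the folder where the failed job wrote its files.
    print_report(report_leftovers("path/to/results_dir"))
```

A missing or empty file would suggest the job stopped before or while writing it, which may help separate a disk problem from a memory one.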

eead-csic-compbio avatar Oct 18 '22 06:10 eead-csic-compbio

Thanks for your quick answer.

I also suspected it was a problem due to the volume of data. I have tried restarting the analysis several times, but it keeps giving me the same error. I guess the algorithm must reach some bottleneck, as you suggest. Unfortunately, I don't have another computer with more capacity.

I have drastically reduced the number of genomes I use as input. I hope this will avoid crashing the analysis. I will also try to repeat the same run by adding the -s option. Perhaps with this option activated, the program can complete the analysis.

Anyways, thanks for your help.

Best regards.

Jorge

Pirxtrurl avatar Oct 18 '22 12:10 Pirxtrurl

If you don't mind sharing your input .gbk files, I can try to run it here and see how that goes. It is still possible that some code needs fixing, Bruno

eead-csic-compbio avatar Oct 19 '22 09:10 eead-csic-compbio

Hello again,

So far, I have tried to reduce the size of my input by about half, and it works correctly. The problem is related to the volume of data to process.

What I am doing now is gradually increasing the number of genomes in my input, taking advantage of the fact that your program allows adding new genomes to calculations already done.
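The incremental approach described here could be organised along these lines. This is only a sketch, assuming that new genomes are added by copying their files into the existing input folder before re-running the program; the function names, batch size and paths are hypothetical, and the re-run itself is left out.

```python
from pathlib import Path
import shutil


def growing_batches(genome_files, step):
    """Yield successive lists of new genome files, `step` files at a time."""
    if step < 1:
        raise ValueError("step must be at least 1")
    files = sorted(genome_files)
    for start in range(0, len(files), step):
        yield files[start:start + step]


def add_batch(batch, input_dir):
    """Copy one batch of genome files into the (existing or new) input folder."""
    input_dir = Path(input_dir)
    input_dir.mkdir(parents=True, exist_ok=True)
    for src in batch:
        shutil.copy2(src, input_dir / Path(src).name)
```

After each call to `add_batch`, the analysis would be re-run on the same input folder, so the first batch that fails gives a rough upper bound on the number of genomes the machine can handle.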

Thanks for your offer. If you don't mind, I will send you a private mail.

Best regards,

Jorge

Pirxtrurl avatar Oct 20 '22 10:10 Pirxtrurl

Sure

eead-csic-compbio avatar Oct 20 '22 10:10 eead-csic-compbio

Available RAM for this job was 32GB

eead-csic-compbio avatar Oct 24 '22 06:10 eead-csic-compbio