PHANOTATE
PHANOTATE copied to clipboard
Continuously increasing RAM demand
Hi Katelyn,
thanks for PHANOTATE, great tool!
I'm using it for larger sets of viral contigs, and noticed that it seems to not free up RAM between contigs. On my current set with 100+ sequence, towards the end, it needs >5GB RAM. Obviously, I can split the input file to get around that, but I'd also suspect, it wouldn't be too difficult to clean the RAM after a contig has been processed. Might make sense as an improvement for future versions.
Cheers Thomas
Hm, that is strange. In the current version I reset everything between contigs (which I probably shouldn't do since contigs in the same file should probably be treated as the same genome by default; and to treat them as separate genomes use a --meta flag).
I wonder if pythons automatic garbage collector isn't working correctly between contigs to clear old memory. Or perhaps I have a memory leak somewhere.
Thanks for info, I will try to track down the cause.
If you have the time, can you check whether the issue happens with my newer code? I have been developing a 2+ version. The changes in 2.0 alpha are mostly only code optimizations (it should run in half the time), but I did have to make some changes to the way ORFs and gaps/overlaps are scored, and I haven't dialed them in yet, so output might be slightly different. To run the develop branch you can manually install via:
git clone https://github.com/deprekate/PHANOTATE.git
cd PHANOTATE
git checkout develop
python3 setup.py install
phanotate.py tests/NC_001416.1.fasta
Interesting thought regarding --meta
flag. Makes sense to treat contigs from the same genome together - especially shorter ones could profit from any "training" performed also on larger ones. Not sure though if people usually split their phage genomes (or even bacterial bins) into separate files after binning. But if properly documented, the definitely could.
I tried to run the new code. It worked on the test data, but then failed on may contigs with the following error:
phanotate.py -f tabular pt-loci-r1.fna > pt-loci-r1-phanotate-v2.0.tsv
Traceback (most recent call last):
File "/home/thackl/software//PHANOTATE/PHANOTATE/phanotate.py", line 47, in <module>
contig_features.add_feature( trna )
File "/home/thackl/.local/lib/python3.8/site-packages/phanotate-2.0-py3.8-linux-x86_64.egg/phanotate/features.py", line 100, in add_feature
self.feature_at[ feature.as_edge()[:2] ] = feature
AttributeError: 'tRNA' object has no attribute 'as_edge'
Yeah, the problem with making it default is that most people won't read the fulls docs, and so will run MAG files through as the same genome, which will skew the results.
The tRNA error should be fixed, I forgot to push a change I made to the code.
Yeah, users ;). You could enforce a choice with a required mode argument, something like phanotate meta [opts] infile
/ phanotate single [opts] infile
, with the former going contig-wise and the letter expecting a single genome with 1+ contigs...
Caught another error...
phanotate.py -f tabular pt-loci-r1.fna > pt-loci-r1-phanotate-v2.0.tsv
Traceback (most recent call last):
File "/home/thackl/software//PHANOTATE/phanotate.py", line 84, in <module>
shortest_path = fz.get_path(source= contig_features.source_node(), target= contig_features.target_node())[1:-1]
ValueError: Graph contains negative weight cycle
Ah, that error is caused by the weights not being dialed in yet. I will see if I can get the weights sorted out, and/or track down the cause of the RAM usage. Thanks so for you help and feedback.
Sure. Happy to help. Debugging my own code as we speak. Let me know if you want me to give it another try
The negative weight cycle is caused when the cost to go backwards via an overlap is not enough in relation to a really good gene (in this case most likely a tRNA), so during the path solving step, the Bellman-Ford will go through that gene, then do a backwards overlap, then back through the same gene, infinite times.
I reduced the tRNA from -20 to -1, and pushed the change to github, which should prevent the infinite loop for now
Hmm, I still get the negative weight cycle error. I'm currently trying to find out, which sequence exactly is causing the error. If I find it and it would help, I can send it to you.
OK, now it is getting bizarre. When I split the file into one file per seq, and run phanotate on each - everything runs through fine. But if I run phanotate on the file with all sequences, I get the error after a few sequences... Any ideas?
If it would help, I can share the data, assuming you would handle it confidentially
Ah, one of the improvements I made with version 2 is moving the ORF to ORF connections creation step to Cpython. It looks like I forgot to empty the data structure between contigs. I will have to get that fixed.
Hi, I am also running the dev branch because the master doesn't work on some of my phage sequences. I also get the error: ValueError: Graph contains negative weight cycle Is there anything that can be done?
The "negative weight cycle" error is the reason why I have not pushed this version 2 to the main. I mentioned in this version I had to simplify the weighting of the ORFs but kept the tRNAs the same weight (-20). If I drop the weight to -1 it should run fine, but I have to tune it properly, which I havent done. One reason is that I dont have a contig that generates the error.
Are you able to share the contig that causes the error?
Unfortunately I can't share the contigs. I am pretty sure there are public sequences that would have the error, I just don't have the capacity to analyze them until I hit the error.
Hi, I am also running the dev branch because the master doesn't work on some of my phage sequences. I also get the error: ValueError: Graph contains negative weight cycle Is there anything that can be done?
I am revisiting this issue, can you clarify if you got the 'negative cycle' error with the develop or main branch? If it was only the dev branch, what was the error that caused the main branch not to work on some phages?
I also set the weight of tRNAs to 0, so neither beneficial or penalized, with the idea that I will add ALL of them back in post processing. Can you test out the develop branch on the contig that gave you the error?
git clone https://github.com/deprekate/PHANOTATE.git
git checkout develop
pip install ../PHANOTATE