DeepVirFinder
Refactoring of main loop
Based on @papanikos's earlier fork, this pull request goes further and refactors the main loop.
Sequences can now be filtered by both minimum and maximum length, as I found that very long sequences seemed to cause DVF to halt indefinitely. The main loop now uses Biopython rather than the ad-hoc FASTA parsing (though this could also be replaced), which makes the batch-processing logic much simpler. Further, the batch size is now tunable by the end user.
Output file handling has been refactored for simplicity, but the functionality is otherwise unchanged.
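As a rough illustration of the length filtering and batching described above, here is a minimal sketch. It is not the code from this PR: the function name `filtered_batches` and its parameters are hypothetical, and it accepts any iterable of objects that support `len()`, which includes the `SeqRecord` objects yielded by `Bio.SeqIO.parse(path, "fasta")`.

```python
from itertools import islice


def filtered_batches(records, min_len=1, max_len=None, batch_size=100):
    """Yield lists of up to `batch_size` records whose length is in bounds.

    Hypothetical helper for illustration only. `records` is any iterable of
    objects supporting len(), e.g. plain strings or Biopython SeqRecords.
    """
    # Lazily drop records that are too short or too long.
    kept = (
        r for r in records
        if len(r) >= min_len and (max_len is None or len(r) <= max_len)
    )
    # Pull fixed-size chunks from the filtered stream until it is exhausted.
    while True:
        batch = list(islice(kept, batch_size))
        if not batch:
            return
        yield batch
```

With Biopython installed, the input could be `SeqIO.parse("contigs.fa", "fasta")` (the file name is a placeholder). Because the filter is a generator, only one batch is held in memory at a time.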
Reverse complementing of sequences is now done in a way that is very fast and analogous to many existing tools and APIs.
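One common fast approach to reverse complementing in Python is a translation table followed by a slice reversal. The sketch below assumes that technique. It is not taken from this PR, and the names `_COMPLEMENT` and `reverse_complement` are hypothetical.

```python
# Translation table mapping each base to its complement (both cases).
# Characters not in the table, such as other IUPAC codes, pass through as-is.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (illustrative sketch)."""
    # Complement every base in one pass, then reverse with a slice.
    return seq.translate(_COMPLEMENT)[::-1]
```

Both `str.translate` and slicing run in C, so this avoids a Python-level loop over bases.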
[ed] minor typo corrections
Hi @jessieren and @chaodengusc
I am mentioning you explicitly, hoping you get a personal notification.
Any chance you might review these changes?
- My original effort was a packaging exercise to make the installation more straightforward, along with some basic refactoring.
- @cerebis took it a step further, as described in this PR.
- We can also pin `h5py` as proposed here.