JohnVollmers
A positive list containing the exact version of the reference DB used, along with all entries that were previously assessed as "not contaminations" based on that database, should be included...
- [x] add function to extract nucleotide references from blast-dbs
- [ ] ~~add function to extract protein references from diamond dbs~~ use blast-DBs for diamond also
- [ ]...
Maybe base nucleotide comparisons on MinHash instead of blast searches? Maybe use sourmash for this? [https://sourmash.readthedocs.io/en/latest/](https://sourmash.readthedocs.io/en/latest/)
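To illustrate what a MinHash-based comparison would compute, here is a minimal pure-Python sketch of a bottom-s MinHash over canonical k-mers. It does not use the sourmash API. The function names, `k=21`, the sketch size of 500, and the use of SHA-1 as the hash are all assumptions chosen for illustration, not settings taken from this project.

```python
import hashlib

# Translation table for reverse-complementing nucleotide k-mers.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def kmers(seq, k=21):
    """Yield all overlapping k-mers of a nucleotide sequence (uppercased)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def minhash_sketch(seq, k=21, num_hashes=500):
    """Bottom-s sketch: the num_hashes smallest distinct k-mer hash values."""
    hashes = {
        int.from_bytes(hashlib.sha1(canonical(km).encode()).digest()[:8], "big")
        for km in kmers(seq, k)
    }
    return set(sorted(hashes)[:num_hashes])


def jaccard_estimate(sketch_a, sketch_b, num_hashes=500):
    """Estimate the Jaccard similarity of two k-mer sets from their sketches."""
    merged = sorted(sketch_a | sketch_b)[:num_hashes]
    if not merged:
        return 0.0
    shared = sum(1 for h in merged if h in sketch_a and h in sketch_b)
    return shared / len(merged)
```

Two contigs would be compared by sketching each once and calling `jaccard_estimate` on the sketches; the sketches are small and can be stored, so the reference sequences would not need to be searched again for every query.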
Maybe use a 98% amino acid identity cut-off? Proteins that are unique to one species in a genus would still be attributed to that individual species (but only one copy would be...
Sometimes, when the GTDB database is updated, the representative genome of a species changes. If a process was started with one database, and then finished with...
Implement lock files to prevent two users writing to the same results-folder at the same time.
- [ ] lock file for database creation
- [ ] lock file for...
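One common way to get this behaviour is to create the lock file atomically with `O_CREAT | O_EXCL`, so that a second process fails immediately instead of writing into the same folder. The following is a minimal sketch under that assumption; `folder_lock`, `LockError`, and the `.lock` file name are hypothetical names for illustration, and stale locks left behind by a crashed process are not handled.

```python
import os
from contextlib import contextmanager


class LockError(RuntimeError):
    """Raised when the target folder is already locked."""


@contextmanager
def folder_lock(folder, name=".lock"):
    """Hold an exclusive lock file inside `folder` for the duration of the block."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, name)
    try:
        # O_EXCL makes creation fail if the file already exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise LockError(f"{folder} is locked ({path} exists)") from None
    try:
        # Record the owning process id to help with manual cleanup.
        with os.fdopen(fd, "w") as handle:
            handle.write(f"pid={os.getpid()}\n")
        yield path
    finally:
        os.remove(path)
```

A caller would wrap any step that writes to the results folder, or that builds a database, in `with folder_lock(results_folder): ...` and report the `LockError` to the user if the folder is already in use.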
ORF calling on eukaryotes works drastically differently than for bacteria. Instead of using eukaryotic proteins as references, rather run prodigal with prokaryotic and metagenomic settings on reference eukaryotic genomes (not...
Divide protein blast databases into smaller subsets (similar to nucleotide dbs). Possibilities:
1. separate by component sub-db (gtdb or refseq_eukaryote/virus)
2. separate into roughly equal numbers of proteins
- [...
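The second possibility (roughly equal numbers of proteins per subset) can be sketched as a round-robin distribution of FASTA records. This is only an illustration: `read_fasta` and `split_records` are hypothetical helper names, and writing each subset to disk and building a database from it is left out.

```python
def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_records(records, num_subsets):
    """Distribute records round-robin so subset sizes differ by at most one."""
    subsets = [[] for _ in range(num_subsets)]
    for index, record in enumerate(records):
        subsets[index % num_subsets].append(record)
    return subsets
```

Round-robin assignment works on a stream, so the total number of proteins does not need to be known in advance. It balances record counts only, not total sequence length.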
Add workflows to:
- [ ] classify all contigs of a large shotgun metagenome (blast contigs in portions so progress can be saved in between)
- [ ] reuse shotgun...
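The idea of processing contigs in portions and saving progress in between could look roughly like the sketch below. All names here (`process_in_portions`, `classify_batch`, the JSON checkpoint file) are assumptions for illustration; `classify_batch` stands in for whatever search step is actually run and must return JSON-serializable results.

```python
import json
import os


def batches(items, size):
    """Yield (batch_index, sublist) pairs of at most `size` items each."""
    for start in range(0, len(items), size):
        yield start // size, items[start:start + size]


def process_in_portions(contig_ids, size, checkpoint_path, classify_batch):
    """Run classify_batch on each portion, skipping portions already saved."""
    done = {}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as handle:
            done = json.load(handle)
    for index, portion in batches(contig_ids, size):
        key = str(index)  # JSON object keys are always strings
        if key in done:
            continue
        done[key] = classify_batch(portion)
        # Write to a temporary file first, then swap it in, so an
        # interrupted run never leaves a half-written checkpoint.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as handle:
            json.dump(done, handle)
        os.replace(tmp_path, checkpoint_path)
    return done
```

Because portions are identified by their index, a saved checkpoint is only valid if the contig order and the portion size are the same when the run is resumed.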