Add option to ignore updating NCBI taxonomy database
This situation arises in an HPC environment where multiple concurrent jobs call ETE3 independently, and either the NCBI database has been updated or a momentary connection interruption prevents connection to the SQLite NCBI taxonomy database. All of the processes then try to update the NCBI taxonomy database simultaneously, which causes them all to fail. The problem is in the file ncbiquery.py:
    self.db = None
    self._connect()
    if not is_taxadb_up_to_date(self.dbfile):
        print('NCBI database format is outdated. Upgrading', file=sys.stderr)
        self.update_taxonomy_database(taxdump_file)
It would be great to have an option to skip updating the NCBI taxonomy database, and/or to have the updating process create a lock file so that multiple processes cannot attempt the update simultaneously.
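To illustrate the lock file idea, here is a minimal sketch, not ETE3 code. It assumes a POSIX system where `fcntl.flock` works on the filesystem holding the database (advisory locks can be unreliable on some network filesystems common in HPC). The names `update_lock`, `update_once`, `is_up_to_date` and `do_update` are hypothetical; the two callables stand in for the version check and the upgrade step quoted above.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def update_lock(lock_path):
    """Hold an exclusive advisory lock on lock_path for the duration of the block."""
    # Open (creating if needed) the lock file; its contents are irrelevant.
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        # Blocks until no other process holds the lock.
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


def update_once(dbfile, is_up_to_date, do_update):
    """Run do_update() only if the database is still outdated once the lock is held.

    is_up_to_date and do_update are hypothetical callables, not ETE3 functions.
    Returns True if this process performed the update, False otherwise.
    """
    if is_up_to_date(dbfile):
        return False
    with update_lock(dbfile + ".lock"):
        # Re-check after acquiring the lock: another process may have
        # finished the upgrade while this one was waiting.
        if is_up_to_date(dbfile):
            return False
        do_update()
        return True
```

The second check inside the lock is what keeps waiting processes from repeating the upgrade: they block until the first process finishes, see an up-to-date database, and return without touching it.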
Is it possible to make the database version check optional? Could an extra parameter be added to the NCBITaxa class (e.g. ignore_db_ver_check)?
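A rough sketch of what such a parameter might look like, using a stand-in class rather than the real NCBITaxa. Everything here is an assumption for illustration: `TaxaStore`, `check_version` and `upgrade` are hypothetical, and only the parameter name `ignore_db_ver_check` comes from the suggestion above.

```python
import sys


class TaxaStore:
    """Hypothetical stand-in for illustration, not the real NCBITaxa class."""

    def __init__(self, dbfile, check_version, upgrade, ignore_db_ver_check=False):
        self.dbfile = dbfile
        self.upgraded = False
        if ignore_db_ver_check:
            # The caller takes responsibility for the database being current,
            # so neither the version check nor the upgrade runs.
            return
        if not check_version(dbfile):
            print('NCBI database format is outdated. Upgrading', file=sys.stderr)
            upgrade()
            self.upgraded = True
```

With the flag defaulting to `False`, existing callers would keep the current behaviour, while batch jobs could opt out and rely on a database prepared ahead of time.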