
Add option to ignore updating NCBI taxonomy database

Open • jrober84 opened this issue 5 years ago • 1 comment

This situation arises in an HPC environment where multiple concurrent jobs call ETE3 independently. If the NCBI database has been updated, or a momentary interruption prevents connection to the SQLite NCBI taxonomy database, all of the processes try to update the NCBI taxonomy database simultaneously, which causes them all to start failing. The problem is in the file ncbiquery.py:

    self.db = None
    self._connect()

    if not is_taxadb_up_to_date(self.dbfile):
        print('NCBI database format is outdated. Upgrading', file=sys.stderr)
        self.update_taxonomy_database(taxdump_file)

It would be great to have an option to skip updating the NCBI taxonomy database, and/or to have the process create a lock file while it updates the database, so that multiple processes can't attempt the update simultaneously.
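The lock-file idea could look roughly like the sketch below. It is not ETE3 code: `update_lock` and `update_once` are hypothetical helpers, and `is_up_to_date` / `do_update` are placeholders for whatever check and update routine the caller supplies. It assumes the lock file lives next to the database file and that exclusive file creation is atomic on the file system in use.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def update_lock(lock_path, timeout=600.0, poll=1.0):
    """Hold an exclusive lock file while the body runs (hypothetical helper)."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails if the file already exists,
            # so only one process can create the lock at a time.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError("could not acquire " + lock_path)
            time.sleep(poll)
    try:
        # Record the owner's PID to help diagnose stale locks.
        os.write(fd, str(os.getpid()).encode())
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)


def update_once(dbfile, is_up_to_date, do_update, **lock_kwargs):
    """Run do_update() under the lock; return True if this process updated."""
    if is_up_to_date(dbfile):
        return False
    with update_lock(dbfile + ".lock", **lock_kwargs):
        # Another process may have finished the update while we waited.
        if is_up_to_date(dbfile):
            return False
        do_update()
        return True
```

The second check inside the lock is what keeps waiting processes from repeating the update once the first one has finished. A process that crashes while holding the lock leaves a stale lock file behind, so a real implementation would probably need some cleanup or expiry policy.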

jrober84 • May 20 '20 14:05

Is it possible to make the database version check optional? Could an extra parameter be added to the NCBITaxa class (e.g. ignore_db_ver_check)?
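As a rough illustration of that proposal, the sketch below uses a stand-in class rather than the real NCBITaxa. The `ignore_db_ver_check` name comes from the suggestion above; `TaxaDBSketch`, `is_up_to_date`, and `updater` are made up for the example and are not part of ETE3.

```python
import sys


class TaxaDBSketch:
    """Stand-in for NCBITaxa, showing only the proposed flag (not ETE3 code)."""

    def __init__(self, dbfile, is_up_to_date, updater,
                 ignore_db_ver_check=False):
        self.dbfile = dbfile
        self.updated = False
        # Skip the version check entirely when the caller asks for it,
        # e.g. in batch jobs that must never trigger an update.
        if ignore_db_ver_check:
            return
        if not is_up_to_date(self.dbfile):
            print('NCBI database format is outdated. Upgrading',
                  file=sys.stderr)
            updater(self.dbfile)
            self.updated = True
```

With the flag set, concurrent jobs would open the existing database as-is, and the update could be run once, separately, before the jobs are submitted.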

kbessonov1984 • Dec 10 '20 15:12