phylophlan
Add option to generate a nucl database from the Uniprot core proteins
Instead of downloading the protein sequences directly from Uniprot, this adds the possibility to retrieve the corresponding nucleotide sequences from ENA via metadata stored in XML format.
It iterates over the same input files that are used to retrieve the amino acid sequences from Uniprot. However, instead of downloading the FastA file directly, it downloads the XML file from the Uniprot server. The XML file is parsed using an XML schema provided on the Uniprot website; the ENA accession IDs of the nucleotide sequences are then extracted and the corresponding FastA sequences are downloaded.
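The extraction step described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: it uses the standard-library xml.etree.ElementTree instead of xmlschema, it assumes that the nucleotide cross-references appear as dbReference elements with type="EMBL", and the sample entry (accession P00000 and the AB00000x IDs) is made up. Downloading the FastA records from ENA is left out.

```python
import xml.etree.ElementTree as ET

# Made-up, heavily shortened entry imitating the layout of a Uniprot XML record.
SAMPLE_XML = """
<uniprot xmlns="http://uniprot.org/uniprot">
  <entry>
    <accession>P00000</accession>
    <dbReference type="EMBL" id="AB000001">
      <property type="protein sequence ID" value="BAA00001.1"/>
    </dbReference>
    <dbReference type="PDB" id="1ABC"/>
    <dbReference type="EMBL" id="AB000002">
      <property type="protein sequence ID" value="BAA00002.1"/>
    </dbReference>
  </entry>
</uniprot>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts in front of tags."""
    return tag.rsplit("}", 1)[-1]


def extract_embl_ids(xml_text):
    """Return the ids of all dbReference elements of type 'EMBL' (assumed layout)."""
    root = ET.fromstring(xml_text)
    ids = []
    for elem in root.iter():
        if local_name(elem.tag) == "dbReference" and elem.get("type") == "EMBL":
            ids.append(elem.get("id"))
    return ids


if __name__ == "__main__":
    print(extract_embl_ids(SAMPLE_XML))
```

Matching on the local tag name keeps the sketch independent of the exact namespace URI declared in the XML file.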
Thanks Alex for this PR.
I tried running the new version of phylophlan_setup_database.py after adding the xmlschema package (version 1.10.0 from conda-forge) to my conda env. However, I'm getting the following error:
Traceback (most recent call last):
File "./phylophlan_setup_database.py", line 25, in <module>
import xmlschema
File "/shares/CIBIO-Storage/CM/cmstore/tools/anaconda3/envs/phylophlan-3.0/lib/python3.6/site-packages/xmlschema/__init__.py", line 14, in <module>
from .resources import normalize_url, normalize_locations, fetch_resource, \
File "/shares/CIBIO-Storage/CM/cmstore/tools/anaconda3/envs/phylophlan-3.0/lib/python3.6/site-packages/xmlschema/resources.py", line 23, in <module>
from elementpath import iter_select, XPathContext, XPath2Parser
File "/shares/CIBIO-Storage/CM/cmstore/tools/anaconda3/envs/phylophlan-3.0/lib/python3.6/site-packages/elementpath/__init__.py", line 18, in <module>
from .exceptions import ElementPathError, MissingContextError, \
File "/shares/CIBIO-Storage/CM/cmstore/tools/anaconda3/envs/phylophlan-3.0/lib/python3.6/site-packages/elementpath/exceptions.py", line 12, in <module>
from .tdop import Token
File "/shares/CIBIO-Storage/CM/cmstore/tools/anaconda3/envs/phylophlan-3.0/lib/python3.6/site-packages/elementpath/tdop.py", line 405, in <module>
class Parser(Generic[TK_co], metaclass=ParserMeta):
TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases
and I'm not 100% sure how to fix it. Do you have any idea?
Which exact version of Python are you using on your system, Francesco? I get different results for different versions of Python 3.6, but of course not the same one as you.
I have 3.6.15 from conda-forge (hb7a2778_0_cpython).
OK, when I create a fresh Python 3.6.15 conda environment and install xmlschema, I can import it without any issues. I only get an error with Python 3.6.0 itself. I will dig a bit further over the next few days to find out what's going on there.
Hi @fasnicar,
I am very sorry for the long hiatus. It got lost in my long list of to-dos.
I pulled all the recent changes that you added to v3.0.3 into this PR. I installed the latest version of PhyloPhlAn v3.0.3 via conda/mamba into a new environment using the following command: mamba create -n phylophlan_uniprot_test -c bioconda phylophlan=3.0.3
Afterwards, I installed the changes of this PR using pip3: pip3 install -U git+https://github.com/alexhbnr/phylophlan@uniprot_nuclseq
The pip command installed the Python packages xmlschema v2.2.2 and elementpath v4.0.1. When I ran phylophlan_setup_database -h, I didn't get any error message. However, conda/mamba automatically pulled Python version 3.11, and not v3.6, for which you saw the error.
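Since the behaviour seems to depend on the exact combination of interpreter and package versions, a small script like the one below could help to compare environments. It is only a sketch: the helper names installed_version and describe_environment are hypothetical, and the fallback to pkg_resources for old interpreters assumes that setuptools is installed.

```python
import platform

try:
    # Available in the standard library from Python 3.8 onwards.
    from importlib.metadata import version, PackageNotFoundError
except ImportError:
    # Older interpreters such as 3.6: fall back to pkg_resources below.
    version = None


def installed_version(name):
    """Return the installed version of a package, or None if it is missing."""
    if version is not None:
        try:
            return version(name)
        except PackageNotFoundError:
            return None
    import pkg_resources  # assumption: setuptools is present

    try:
        return pkg_resources.get_distribution(name).version
    except pkg_resources.DistributionNotFound:
        return None


def describe_environment(packages=("xmlschema", "elementpath")):
    """Collect the Python version and the versions of the given packages."""
    report = {"python": platform.python_version()}
    for name in packages:
        report[name] = installed_version(name)
    return report


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print("{}: {}".format(key, value))
```

Running it in both environments would show directly whether the Python, xmlschema, and elementpath versions differ.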
Would you have time to check this PR once more on your system?