metabolisHMM
The directory of curated metabolic markers could not be found.
Hi, I am running summarize-metabolism as follows:
summarize-metabolism --input aquifer-genomes/ --output summary --metadata groups.csv
but I am getting the following error:
#############################################
metabolisHMM v1.4.0
The directory of curated metabolic markers could not be found.
Please either download the markers from https://github.com/elizabethmcd/metabolisHMM/releases/download/v2.0/metabolisHMM_v2.0_markers.tgz and decompress the tarball, or move the directory to where you are running the workflow from.
However, the models exist in curated_markers/metabolic_markers/*hmm
Also, where is the make-heatmap.R script?
Sorry, this is an error on my part. I recently changed the structure of the curated markers folder and how to check if the database was downloaded. I will try to make this fix within the next couple of days and push a new version.
Additionally, you shouldn't need the make-heatmap.R script, as all plotting is now done within Python. Is there some part of the tutorial or help menu that still mentions it? This is also my fault; I only recently moved all plotting into Python.
Thank you for testing!
I was asking about make-heatmap.R because I am getting the following error when running search-custom-markers:
search-custom-markers --input aquifer-genomes/ --output outdir --markers_dir curated_markers/metabolic_markers/ --markers_list curated_markers/list_metabolic-markers --metadata groups.csv --aggregate ON
#############################################
metabolisHMM v1.4.0
Reformatting fasta files...
Running HMM searches using custom marker set...
Parsing all results...
/home/user/py3-venv/lib/python3.7/site-packages/metabolisHMM-2.0-py3.7.egg/EGG-INFO/scripts/search-custom-markers:204: DeprecationWarning: 'U' mode is deprecated
with open(result, "rU") as input:
Plotting results...
Traceback (most recent call last):
File "/home/user/py3-venv/bin/search-custom-markers", line 4, in <module>
__import__('pkg_resources').run_script('metabolisHMM==2.0', 'search-custom-markers')
File "/home/user/py3-venv/lib/python3.7/site-packages/pkg_resources/__init__.py", line 661, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/home/user/py3-venv/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1441, in run_script
exec(code, namespace, namespace)
File "/home/user/py3-venv/lib/python3.7/site-packages/metabolisHMM-2.0-py3.7.egg/EGG-INFO/scripts/search-custom-markers", line 307, in <module>
plot=sns.heatmap(agg, cmap="viridis",xticklabels=xticks, square=True, linewidths=1, linecolor='black', cbar=True, cbar_kws={"shrink": .50})
File "/home/user/py3-venv/lib/python3.7/site-packages/seaborn/matrix.py", line 517, in heatmap
yticklabels, mask)
File "/home/user/py3-venv/lib/python3.7/site-packages/seaborn/matrix.py", line 167, in __init__
cmap, center, robust)
File "/home/user/py3-venv/lib/python3.7/site-packages/seaborn/matrix.py", line 206, in _determine_cmap_params
vmin = np.percentile(calc_data, 2) if robust else calc_data.min()
File "/home/user/py3-venv/lib/python3.7/site-packages/numpy/core/_methods.py", line 32, in _amin
return umr_minimum(a, axis, None, out, keepdims, initial)
ValueError: zero-size array to reduction operation minimum which has no identity
This isn't an R-related error; it's an error from plotting with Python, which suggests there is an issue with either the number of HMMs run or your metadata file. What do your markers_list and metadata files look like?
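For context, the ValueError at the end of the traceback is the message numpy raises when a minimum is requested from an array with no elements, which suggests the table passed to sns.heatmap was empty. A minimal sketch reproducing just that message follows. It uses only numpy, not metabolisHMM or seaborn; the empty array is a stand-in for whatever agg contained, which this thread does not show, and safe_min is a hypothetical helper.

```python
import numpy as np

# Stand-in for an empty table reaching the plotting step (an assumption;
# the actual contents of `agg` are not shown in the thread).
empty = np.empty((0, 0))


def safe_min(values):
    """Return (minimum, None), or (None, message) if numpy refuses the reduction."""
    try:
        return values.min(), None
    except ValueError as err:
        return None, str(err)


result, message = safe_min(empty)
print(result, message)
```

If the same message appears for your data, checking the shape of the table just before plotting would narrow down where it becomes empty.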
This is my groups.csv
GCA_001766875.1_ASM176687v1,groupA
GCA_001766905.1_ASM176690v1,groupA
GCA_001766965.1_ASM176696v1,groupB
GCA_001766985.1_ASM176698v1,groupB
GCA_001767145.1_ASM176714v1,groupB
And this is my list_metabolic-markers list:
acetate_citrate_lyase_aclA.hmm
acetate_citrate_lyase_aclB.hmm
aprA_TIGR02061.hmm
carbon_monoxide_dehydrogenase_coxL_TIGR02416.hmm
carbon_monoxide_dehydrogenase_coxM.hmm
carbon_monoxide_dehydrogenase_coxS.hmm
ccoN_TIGR00780.hmm
ccoO_TIGR00781.hmm
ccoP_TIGR00782.hmm
codh_catalytic_TIGR01702.hmm
codhC_TIGR00316.hmm
codhD_TIGR00381.hmm
coxA_TIGR02891.hmm
coxB_TIGR02866.hmm
cydA_PF01654.hmm
cydB_TIGR00203.hmm
cyoA_TIGR01433.hmm
cyoD_TIGR02847.hmm
cyoE_TIGR01473.hmm
dsrA_TIGR02064.hmm
dsrB_TIGR02066.hmm
dsrD_PF08679.hmm
fae_TIGR03126.hmm
fccB_PF09242.hmm
fdhA_TIGR01591.hmm
fdhB_TIGR01582.hmm
fdhC_TIGR01583.hmm
fdh_thiol_id_TIGR02819.hmm
FeFeHydrogenase_TIGR02512.hmm
FeFeHydrogenase_TIGR04105.hmm
fmtf_TIGR03119.hmm
hydrazine_oxidase_hzoA.hmm
hydrazine_synthase_hzsA.hmm
Hydrogenase_Group_1.hmm
Hydrogenase_Group_2a.hmm
Hydrogenase_Group_2b.hmm
Hydrogenase_Group_3a.hmm
Hydrogenase_Group_3b.hmm
Hydrogenase_Group_3c.hmm
Hydrogenase_Group_3d.hmm
Hydrogenase_Group_4.hmm
madA_TIGR02659.hmm
madB_TIGR02658.hmm
mtmc_TIGR03120.hmm
napA_TIGR01706.hmm
napB_PF03892.hmm
narG_TIGR01580.hmm
narH_TIGR01660.hmm
ndma_methanol_dehydrogenase_TIGR04266.hmm
nifD_TIGR01282.hmm
nifH_TIGR01287.hmm
nifK_TIGR01286.hmm
nirB_TIGR02374.hmm
nirD_TIGR02378.hmm
nirK_TIGR02376.hmm
nitric_oxide_reductase_norB.hmm
nitric_oxide_reductase_norC.hmm
nitrite_oxidoreductase_nxrA.hmm
nitrite_oxidoreductase_nxrB.hmm
nitrite_reductase_nirS.hmm
nosD_TIGR04247.hmm
nosZ_TIGR04246.hmm
nrfA_TIGR03152.hmm
nrfH_TIGR03153.hmm
qoxA_TIGR01432.hmm
rubisco_form_I.hmm
rubisco_form_II.hmm
rubisco_form_III.hmm
rubisco_form_II_III.hmm
rubisco_form_IV.hmm
sat_TIGR00339.hmm
sfh_TIGR02821.hmm
sgdh_TIGR02818.hmm
smdh_TIGR03451.hmm
soxB_TIGR04486.hmm
soxC_TIGR04555.hmm
soxY_TIGR04488.hmm
sulfide_quinone_oxidoreductase_sqr.hmm
sulfur_dioxygenase_sdo.hmm
thiosulfate_reductase_phsA.hmm
It's just a list of all the HMMs that are in curated_markers/metabolic_markers/
Yes, these look fine. Are there any results in your outdir folder, such as the CSV of the HMM stats?
Here are the first two lines of the files in outdir/results
==> cleaned-matrix.csv <==
genome,acetate_citrate_lyase_aclA,acetate_citrate_lyase_aclB,aprA_TIGR02061,carbon_monoxide_dehydrogenase_coxL_TIGR02416,carbon_monoxide_dehydrogenase_coxM,carbon_monoxide_dehydrogenase_coxS,ccoN_TIGR00780,ccoO_TIGR00781,ccoP_TIGR00782,codh_catalytic_TIGR01702,codhC_TIGR00316,codhD_TIGR00381,coxA_TIGR02891,coxB_TIGR02866,cydA_PF01654,cydB_TIGR00203,cyoA_TIGR01433,cyoD_TIGR02847,cyoE_TIGR01473,dsrA_TIGR02064,dsrB_TIGR02066,dsrD_PF08679,fae_TIGR03126,fccB_PF09242,fdhA_TIGR01591,fdhB_TIGR01582,fdhC_TIGR01583,fdh_thiol_id_TIGR02819,FeFeHydrogenase_TIGR02512,FeFeHydrogenase_TIGR04105,fmtf_TIGR03119,hydrazine_oxidase_hzoA,hydrazine_synthase_hzsA,Hydrogenase_Group_1,Hydrogenase_Group_2a,Hydrogenase_Group_2b,Hydrogenase_Group_3a,Hydrogenase_Group_3b,Hydrogenase_Group_3c,Hydrogenase_Group_3d,Hydrogenase_Group_4,madA_TIGR02659,madB_TIGR02658,mtmc_TIGR03120,napA_TIGR01706,napB_PF03892,narG_TIGR01580,narH_TIGR01660,ndma_methanol_dehydrogenase_TIGR04266,nifD_TIGR01282,nifH_TIGR01287,nifK_TIGR01286,nirB_TIGR02374,nirD_TIGR02378,nirK_TIGR02376,nitric_oxide_reductase_norB,nitric_oxide_reductase_norC,nitrite_oxidoreductase_nxrA,nitrite_oxidoreductase_nxrB,nitrite_reductase_nirS,nosD_TIGR04247,nosZ_TIGR04246,nrfA_TIGR03152,nrfH_TIGR03153,qoxA_TIGR01432,rubisco_form_I,rubisco_form_II,rubisco_form_III,rubisco_form_II_III,rubisco_form_IV,sat_TIGR00339,sfh_TIGR02821,sgdh_TIGR02818,smdh_TIGR03451,soxB_TIGR04486,soxC_TIGR04555,soxY_TIGR04488,sulfide_quinone_oxidoreductase_sqr,sulfur_dioxygenase_sdo,thiosulfate_reductase_phsA
GCA_001766905.1_ASM176690v1_genomic,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
==> custom-markers-results.csv <==
,acetate_citrate_lyase_aclA,acetate_citrate_lyase_aclB,aprA_TIGR02061,carbon_monoxide_dehydrogenase_coxL_TIGR02416,carbon_monoxide_dehydrogenase_coxM,carbon_monoxide_dehydrogenase_coxS,ccoN_TIGR00780,ccoO_TIGR00781,ccoP_TIGR00782,codh_catalytic_TIGR01702,codhC_TIGR00316,codhD_TIGR00381,coxA_TIGR02891,coxB_TIGR02866,cydA_PF01654,cydB_TIGR00203,cyoA_TIGR01433,cyoD_TIGR02847,cyoE_TIGR01473,dsrA_TIGR02064,dsrB_TIGR02066,dsrD_PF08679,fae_TIGR03126,fccB_PF09242,fdhA_TIGR01591,fdhB_TIGR01582,fdhC_TIGR01583,fdh_thiol_id_TIGR02819,FeFeHydrogenase_TIGR02512,FeFeHydrogenase_TIGR04105,fmtf_TIGR03119,hydrazine_oxidase_hzoA,hydrazine_synthase_hzsA,Hydrogenase_Group_1,Hydrogenase_Group_2a,Hydrogenase_Group_2b,Hydrogenase_Group_3a,Hydrogenase_Group_3b,Hydrogenase_Group_3c,Hydrogenase_Group_3d,Hydrogenase_Group_4,madA_TIGR02659,madB_TIGR02658,mtmc_TIGR03120,napA_TIGR01706,napB_PF03892,narG_TIGR01580,narH_TIGR01660,ndma_methanol_dehydrogenase_TIGR04266,nifD_TIGR01282,nifH_TIGR01287,nifK_TIGR01286,nirB_TIGR02374,nirD_TIGR02378,nirK_TIGR02376,nitric_oxide_reductase_norB,nitric_oxide_reductase_norC,nitrite_oxidoreductase_nxrA,nitrite_oxidoreductase_nxrB,nitrite_reductase_nirS,nosD_TIGR04247,nosZ_TIGR04246,nrfA_TIGR03152,nrfH_TIGR03153,qoxA_TIGR01432,rubisco_form_I,rubisco_form_II,rubisco_form_III,rubisco_form_II_III,rubisco_form_IV,sat_TIGR00339,sfh_TIGR02821,sgdh_TIGR02818,smdh_TIGR03451,soxB_TIGR04486,soxC_TIGR04555,soxY_TIGR04488,sulfide_quinone_oxidoreductase_sqr,sulfur_dioxygenase_sdo,thiosulfate_reductase_phsA
GCA_001766905.1_ASM176690v1_genomic,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
OK, so it's running the HMMs correctly, at least for the search-custom-markers workflow. A lot of those errors above look like seaborn, numpy, or matplotlib package errors. Can you make sure all of those are installed, and tell me which versions you have?
seaborn==0.9.0 numpy==1.18.0 numpydoc==0.9.1 matplotlib==3.1.2
I just pushed a new version to PyPI, and you can upgrade with python3 -m pip install metabolisHMM --upgrade. I still have a feeling this might be something weird with seaborn; however, do you also have pandas installed, and what version?
I ran it with the upgraded version, but I am getting the same error:
#############################################
metabolisHMM v2.1
Reformatting fasta files...
Running HMM searches using custom marker set...
Parsing all results...
/home/user/py3-venv/bin/search-custom-markers:204: DeprecationWarning: 'U' mode is deprecated
with open(result, "rU") as input:
Plotting results...
Traceback (most recent call last):
File "/home/user/py3-venv/bin/search-custom-markers", line 307, in <module>
plot=sns.heatmap(agg, cmap="viridis",xticklabels=xticks, square=True, linewidths=1, linecolor='black', cbar=True, cbar_kws={"shrink": .50})
File "/home/user/py3-venv/lib/python3.7/site-packages/seaborn/matrix.py", line 517, in heatmap
yticklabels, mask)
File "/home/user/py3-venv/lib/python3.7/site-packages/seaborn/matrix.py", line 167, in __init__
cmap, center, robust)
File "/home/user/py3-venv/lib/python3.7/site-packages/seaborn/matrix.py", line 206, in _determine_cmap_params
vmin = np.percentile(calc_data, 2) if robust else calc_data.min()
File "/home/user/py3-venv/lib/python3.7/site-packages/numpy/core/_methods.py", line 34, in _amin
return umr_minimum(a, axis, None, out, keepdims, initial, where)
ValueError: zero-size array to reduction operation minimum which has no identity
My pandas version is pandas==0.25.3
I am having the same issue as sarah872 (the initial "The directory of curated metabolic markers could not be found." issue). I installed via conda. Looking forward to the update and using this awesome workflow!
@morgvevans Can you install version 2.1 with python3 -m pip install metabolisHMM --upgrade? This issue has been fixed in the new version.
@sarah872 if you turn the aggregate option OFF, what happens?
Plotting works when aggregate is turned off, although the labels are a little shifted: custom-markers-results-heatmap.pdf
I got this working -- thanks so much for the assistance!
@sarah872 I will try to have a fix for the aggregating option soon. For the shifting of labels, I think the plotting functions are a little manual when trying to format label sizes. I can see if there is a fix for this as well, but the formatting may only work when you have smaller numbers of HMMs to run.
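One thing visible in the files posted above: the row names in outdir/results end in _genomic, while groups.csv lists the same genomes without that suffix. Whether this is what empties the aggregated table is an assumption; the thread never confirms the cause, and how metabolisHMM matches genome names to groups internally is not shown here. A small sketch for checking such mismatches, where unmatched_names is a hypothetical helper and not part of metabolisHMM:

```python
def unmatched_names(metadata_names, matrix_names):
    """Return names found only in the metadata, and names found only in the matrix."""
    meta, rows = set(metadata_names), set(matrix_names)
    return sorted(meta - rows), sorted(rows - meta)


# Names copied from the groups.csv and cleaned-matrix.csv excerpts posted above.
metadata_names = ["GCA_001766875.1_ASM176687v1", "GCA_001766905.1_ASM176690v1"]
matrix_names = ["GCA_001766905.1_ASM176690v1_genomic"]

only_in_metadata, only_in_matrix = unmatched_names(metadata_names, matrix_names)
print("only in metadata:", only_in_metadata)
print("only in matrix:", only_in_matrix)
```

If both lists are non-empty for names that should refer to the same genome, making the metadata names identical to the result row names would be one thing to try.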
@sarah872 I have been unable to reproduce your error yet. I ran this command:
search-custom-markers --input ../genomes/ --output TEST1 --metadata ../groups.csv --markers_dir ../test_markers/ --markers_list ../markers_list.txt --aggregate ON
Where my groups.csv file looks like:
bacteria00190,Actinobacteria
bacteria00193,Deltaproteobacteria
bacteria00203,Bacteroidetes
bacteria00229,Deltaproteobacteria
bacteria01060v2,Deltaproteobacteria
bacteria23257,Deltaproteobacteria
bacteria23258,Bacteroidetes
bacteria23259,Chloroflexi
bacteria23260,Deltaproteobacteria
bacteria23263,Deltaproteobacteria
bacteria23265,Deltaproteobacteria
bacteria23266,Firmicutes
bacteria23267,Deltaproteobacteria
bacteria23268,Deltaproteobacteria
bacteria23272,Deltaproteobacteria
bacteria23311,Deltaproteobacteria
bacteria23313,Other
bacteria23314,Other
bacteria23315,Other
bacteria23317,Other
bacteria30001,Actinobacteria
bacteria30002,Actinobacteria
bacteria30003,Bacteroidetes
bacteria30004,Bacteroidetes
bacteria30005,Bacteroidetes
bacteria30006,Bacteroidetes
bacteria30007,Chloroflexi
bacteria30008,Firmicutes
bacteria30010,Firmicutes
bacteria30011,Firmicutes
bacteria30012,Firmicutes
bacteria30013,Deltaproteobacteria
bacteria30014,Deltaproteobacteria
bacteria30015,Deltaproteobacteria
bacteria30016,Deltaproteobacteria
bacteria30017,Other
bacteria30018,Deltaproteobacteria
bacteria30019,Deltaproteobacteria
bacteria30020,Other
bacteria30021,Other
bacteria30023,PVC
bacteria30024,Deltaproteobacteria
bacteria30025,Deltaproteobacteria
bacteria30026,PVC
bacteria30027,PVC
bacteria30028,PVC
bacteria30029,PVC
bacteria30030,PVC
bacteria30031,PVC
bacteria30032,PVC
bacteria30033,PVC
bacteria30034,PVC
bacteria30035,PVC
bacteria30036,PVC
And my markers_list.txt file looks like:
napA_TIGR01706.hmm
napB_PF03892.hmm
narG_TIGR01580.hmm
narH_TIGR01660.hmm
ndma_methanol_dehydrogenase_TIGR04266.hmm
nifD_TIGR01282.hmm
nifH_TIGR01287.hmm
nifK_TIGR01286.hmm
nirB_TIGR02374.hmm
nirD_TIGR02378.hmm
nirK_TIGR02376.hmm
nitric_oxide_reductase_norB.hmm
nitric_oxide_reductase_norC.hmm
nitrite_oxidoreductase_nxrA.hmm
nitrite_oxidoreductase_nxrB.hmm
nitrite_reductase_nirS.hmm
nosD_TIGR04247.hmm
nosZ_TIGR04246.hmm
nrfA_TIGR03152.hmm
nrfH_TIGR03153.hmm
I did take a look at your package version numbers, and they could be causing the errors. I changed the installation requirements to pin specific versions of the required dependencies. If you run python3 -m pip uninstall metabolisHMM and then reinstall it, this could solve the versioning issues. If you do not want to affect your preexisting versions of other packages, install it in a separate environment.
Let me know if you have any questions or if this still doesn't work.
I first tried reinstalling, but I encountered the following error:
ValueError: numpy.ufunc has the wrong size, try recompiling. Expected 192, got 216
So I updated numpy with pip install numpy --upgrade, but then I got the plotting error again (as above). I therefore installed metabolisHMM in an environment, but again got the numpy error:
Traceback (most recent call last):
File "/scratch/metabolisHMM/installaion/bin/search-custom-markers", line 12, in <module>
import pandas as pd
File "/scratch/metabolisHMM/installaion/lib/python3.7/site-packages/pandas/__init__.py", line 26, in <module>
from pandas._libs import (hashtable as _hashtable,
File "/scratch/metabolisHMM/installaion/lib/python3.7/site-packages/pandas/_libs/__init__.py", line 4, in <module>
from .tslib import iNaT, NaT, Timestamp, Timedelta, OutOfBoundsDatetime
File "__init__.pxd", line 872, in init pandas._libs.tslib
ValueError: numpy.ufunc has the wrong size, try recompiling. Expected 192, got 216
There seems to be an error with the installation of numpy??
Also, could you provide all example files you used here?
Can you provide the versions you had previously of the following packages:
- pandas
- biopython
- numpy
- matplotlib
- seaborn
I'll try to make an environment with your versions of the packages and see if I can reproduce the error that way.
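A minimal sketch for collecting those versions in one go, assuming each package exposes a __version__ attribute (installed_versions and PACKAGES are hypothetical names, not part of metabolisHMM). Biopython is imported as Bio. Import failures, including binary mismatches like the numpy.ufunc error above, are reported instead of stopping the script.

```python
import importlib

# Import names for the packages listed above; Biopython is imported as "Bio".
PACKAGES = {
    "pandas": "pandas",
    "biopython": "Bio",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
}


def installed_versions(packages):
    """Map each package name to its __version__, or to a note if the import fails."""
    versions = {}
    for name, module_name in packages.items():
        try:
            module = importlib.import_module(module_name)
        except Exception as err:  # missing package, or a failure raised during import
            versions[name] = f"import failed: {err}"
            continue
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name}=={version}")
```

Running this with the same Python interpreter that runs the workflow matters, since a virtual environment may hold different versions than the system installation.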
The only other files I provided in the example above are protein fasta files of the genomes. The comment above contains the complete contents of my groups.csv metadata file and the markers list, which are the markers from the curated markers dataset.
Hi @sarah872 and @morgvevans. I'm planning some bug fixes for next week and I wanted to see if the above issue was still a problem? I'll also be trying to make a conda release and therefore consolidating some workflows to make things simpler. Any suggestions about things to improve are welcome! Thanks.
Hi guys, I am also getting a similar error with my MAG dataset. Any idea how to solve it? I have attached my software and dependency versions along with the error message.
Many thanks in advance
Venkat
Hi @srisvs33 - it looks like a few of your packages, namely seaborn and matplotlib, are a few versions newer than the versions that work with the workflows. When running this command, do you still get results? If you turn the aggregate option OFF as suggested above, does the workflow still work? If so, this might be a numpy package version problem that I will need to resolve.