bakta
ValueError: Need a Nucleotide or Protein alphabet
Hi, I am trying to use the desktop version of bakta, which works fine to some extent (I get the GFF and all the prediction part works fine). However, I get the following error: "ValueError: Need a Nucleotide or Protein alphabet", which is connected with Biopython, as far as I could find out. The issue is described here: https://biopython.org/wiki/Alphabet. I hoped that I could introduce some changes to the InsdcIO.py file to overcome the problem, but it is too hard for me. Could you help me solve the problem and make the desktop version fully functional?
Hi,
thanks for reaching out with this. That's interesting! Could you maybe provide the log file of a verbose run (bakta --verbose ...) and maybe also a debugging file triggering a reproducible error? And which version of Bakta (bakta --version) do you use?
Hi,
Thanks for answering.
Please see the attached files - I copied the error message from the terminal (bakta_error.txt), but I cannot attach the full log file (it is too big), so I removed the data analysis part (log_file_start&end_part.txt). I use bakta 1.4.2, which I installed with conda. The error message appears each time I run bakta. I hope it is enough for you.
All the best, Magda
— Reply to this email directly, view it on GitHub https://github.com/oschwengers/bakta/issues/116#issuecomment-1202312104, or unsubscribe https://github.com/notifications/unsubscribe-auth/AIUGBC6A5O2HRL5VUUCVOFDVXD22VANCNFSM543ECT5Q . You are receiving this because you authored the thread.Message ID: @.***>
-- Best regards,
M. Guzowska
dr Magdalena Guzowska Katedra Nauk Fizjologicznych, WMW, SGGW ul. Nowoursynowska 159, bud 24, pok. 143 02-776 Warszawa
export annotation results to: [/home/...]
human readable TSV...
GFF3...
INSDC GenBank & EMBL...
Traceback (most recent call last):
File "~/miniconda3/bin/bakta", line 10, in
16:21:38.690 - INFO - UTILS - version=1.4.2
16:21:38.690 - INFO - UTILS - developer: Oliver Schwengers, github.com/oschwengers
16:21:38.690 - INFO - UTILS - command: bakta --db ./db --prefix Salmonella_12S -v --output ../bakta --genus Salmonella --compliant 12S_SEE_cns.fasta
16:21:38.690 - INFO - UTILS - local time: 2022-08-04 16:21:38
16:21:38.695 - INFO - UTILS - machine: type=x86_64, cores=20
16:21:38.696 - INFO - UTILS - system: type=Linux, release=5.14.0-1046-oem
16:21:38.696 - INFO - UTILS - python: version=3.9.13, implementation=CPython
16:21:38.696 - INFO - CONFIG - threads=20
16:21:38.696 - INFO - CONFIG - verbose=True
16:21:38.696 - DEBUG - CONFIG - test parameter db: db_tmp=./db
16:21:38.696 - INFO - CONFIG - database: type=parameter, path=./db
16:21:38.697 - INFO - CONFIG - tmp-path=/tmp/tmp9gsvak_1
16:21:38.697 - INFO - CONFIG - genome-path=
16:21:38.697 - INFO - CONFIG - min_contig_length=1
16:21:38.697 - INFO - CONFIG - prefix=Salmonella_12S
16:21:38.697 - INFO - CONFIG - output-path=
16:21:38.697 - INFO - CONFIG - genus=Salmonella
16:21:38.697 - INFO - CONFIG - species=None
16:21:38.697 - INFO - CONFIG - strain=None
16:21:38.697 - INFO - CONFIG - plasmid=None
16:21:38.697 - INFO - CONFIG - complete=False
16:21:38.697 - INFO - CONFIG - prodigal_tf=None
16:21:38.697 - INFO - CONFIG - translation_table=11
16:21:38.697 - INFO - CONFIG - gram=?
16:21:38.697 - INFO - CONFIG - compliant=True
16:21:38.697 - INFO - CONFIG - compliant mode! min_contig_length=200
16:21:38.697 - INFO - CONFIG - locus=None
16:21:38.697 - INFO - CONFIG - locus-tag=None
16:21:38.697 - INFO - CONFIG - keep_contig_headers=False
16:21:38.697 - INFO - CONFIG - replicon-table=None
16:21:38.697 - INFO - CONFIG - skip-tRNA=False
16:21:38.697 - INFO - CONFIG - skip-tmRNA=False
16:21:38.697 - INFO - CONFIG - skip-rRNA=False
16:21:38.697 - INFO - CONFIG - skip-ncRNA=False
16:21:38.697 - INFO - CONFIG - skip-ncRNA-region=False
16:21:38.697 - INFO - CONFIG - skip-CRISPR=False
16:21:38.697 - INFO - CONFIG - skip-CDS=False
16:21:38.697 - INFO - CONFIG - skip-sORF=False
16:21:38.698 - INFO - CONFIG - skip-gap=False
16:21:38.698 - INFO - CONFIG - skip-ori=False
16:21:38.698 - INFO - DB - detected: major=3, minor=1, date=2022-02-03
16:21:38.861 - INFO - UTILS - dependency: tool=tRNAscan-SE, version=v2.0.9
16:21:38.869 - INFO - UTILS - dependency: tool=aragorn, version=v1.2.41
16:21:38.885 - INFO - UTILS - dependency: tool=cmscan, version=v1.1.4
16:21:38.895 - INFO - UTILS - dependency: tool=pilercr, version=v1.6.0
16:21:38.917 - INFO - UTILS - dependency: tool=prodigal, version=v2.6.3
16:21:38.928 - INFO - UTILS - dependency: tool=amrfinder, version=v3.10.30
16:21:38.970 - INFO - UTILS - dependency: tool=hmmsearch, version=v3.3.2
16:21:38.994 - INFO - UTILS - dependency: tool=diamond, version=v2.0.15
16:21:39.095 - INFO - UTILS - dependency: tool=blastn, version=v2.13.0
16:21:39.239 - INFO - FASTA - imported: id=12S_SEE_cns, length=4857492, description=, genomic=True, dna=True
16:21:39.241 - INFO - FASTA - imported: id=12S_SEE_cns, length=91136, description=, genomic=True, dna=True
16:21:39.241 - INFO - MAIN - imported sequences=2
16:21:39.241 - INFO - UTILS - qc: revised sequence: id=contig_1, orig-id=12S_SEE_cns, type=contig, complete=False, topology=linear, name=, description='[organism=Salmonella] [gcode=11]', orig-description=''
16:21:39.241 - INFO - UTILS - qc: revised sequence: id=contig_2, orig-id=12S_SEE_cns, type=contig, complete=False, topology=linear, name=, description='[organism=Salmonella] [gcode=11]', orig-description=''
16:21:39.241 - INFO - FASTA - write genome sequences: path=/tmp/tmp9gsvak_1/contigs.fna, description=False, wrap=False
16:21:39.247 - DEBUG - MAIN - start tRNA prediction
.....
16:29:03.823 - DEBUG - ANNOTATION - filter features on contig: contig_2
16:29:03.823 - DEBUG - MAIN - start feature selection and creation of locus tags
16:29:03.834 - INFO - UTILS - generated sequence tag prefix: prefix=FOPLCOMPJC, length=10, MD5=F895C8693C5B657EAB3DE2A67E7D6D48
16:29:03.839 - INFO - MAIN - selected features=4920
16:29:03.849 - INFO - UTILS - generated sequence tag prefix: prefix=FOPLCO, length=6, MD5=F895C8693C5B657EAB3DE2A67E7D6D48
16:29:03.849 - INFO - MAIN - locus tag prefix=FOPLCO
16:29:03.854 - INFO - UTILS - genome-size=4948628
16:29:03.892 - INFO - UTILS - GC=0.526
16:29:03.892 - INFO - UTILS - N=0.102
16:29:03.892 - INFO - UTILS - N50=4857492
16:29:03.894 - INFO - UTILS - coding-ratio=1.109
16:29:03.901 - INFO - TSV - write tsv: path=~/Salmonella_12S.tsv
16:29:03.921 - INFO - GFF - write GFF3: path=~/Salmonella_12S.gff3
16:29:04.017 - DEBUG - INSDC - prepare: genbank=~/Salmonella_12S.gbff, embl=[/home/...]/Salmonella_12S.embl
16:29:04.149 - INFO - INSDC - write GenBank: path=~/Salmonella_12S.gbff
16:29:04.158 - INFO - MAIN - removed tmp dir: /tmp/tmp9gsvak_1
Thanks! This is indeed due to BioPython, and I guess it relates to the recent deprecation and removal of the Bio.Alphabet module within BioPython: https://github.com/biopython/biopython/issues/3156
Actually, conda should take care that the installed BioPython version is >=1.78. But could you please double-check it to be sure (of course within your activated conda environment)?
$ python3
import Bio
print(Bio.__version__)
Indeed, the line causing the error, line 631 of Bio/SeqIO/InsdcIO.py, occurs in BioPython 1.74:
https://github.com/biopython/biopython/blob/8677e94d3d4ca7374e12d66a05e0dd3732168cce/Bio/SeqIO/InsdcIO.py#L631
Could you therefore please update BioPython to 1.78 or 1.79 and try again?
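For anyone who prefers a scripted check over the interactive one, here is a minimal sketch that compares the installed Biopython version with the 1.78 lower bound mentioned in this thread. The helper names `parse_version` and `is_new_enough` are made up for illustration; only `importlib.metadata` from the standard library is used, and the script has to be run inside the activated conda environment to be meaningful.

```python
from importlib import metadata

# Lower bound quoted in this thread for Bakta 1.4.2.
MINIMUM = (1, 78)


def parse_version(text):
    """Turn a version string such as '1.79' into a tuple of ints, e.g. (1, 79)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            # Stop at the first non-numeric component (e.g. 'dev0').
            break
        parts.append(int(digits))
    return tuple(parts)


def is_new_enough(version_text, minimum=MINIMUM):
    """Return True if the given version is at least the minimum."""
    return parse_version(version_text) >= minimum


if __name__ == "__main__":
    try:
        installed = metadata.version("biopython")
    except metadata.PackageNotFoundError:
        print("biopython is not installed in this environment")
    else:
        verdict = "OK" if is_new_enough(installed) else "too old for Bakta 1.4.2"
        print(f"biopython {installed}: {verdict}")
```

The comparison uses plain integer tuples, so it is only a rough check and does not handle every version format that pip accepts.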
@kitka2000, gentle ping. Did this solve the issue?
Hi, I sent you the message below a couple of days ago, but it seems that you didn't get it. I think I should reinstall bakta in a separate environment. Do you think it could help?
On Tue, 9 Aug 2022 at 10:30, Magdalena Guzowska @.***> wrote:
Hi, sorry for the late response.
Somehow I have a real mess in my installations. When I check for the Biopython versions it states that I have 1.73. So I tried to update with pip all the packages which are outdated, and I got plenty of errors:
$ pip list --outdated
Package              Version Latest Type
alive-progress       1.6.2   2.4.1  wheel
biopython            1.73    1.79   wheel
ncbi-genome-download 0.2.8   0.3.1  wheel
numpy                1.22.4  1.23.1 wheel

$ pip3 list --outdated --format=freeze | grep -v '^-e' | cut -d = -f 1 | xargs -n1 pip3 install -U

Error list:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
bakta 1.4.2 requires alive-progress==1.6.2, but you have alive-progress 2.4.1 which is incompatible.
bakta 1.4.2 requires biopython>=1.78, but you have biopython 1.73 which is incompatible.

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
seqsero2 1.2.1 requires biopython==1.73, but you have biopython 1.79 which is incompatible.
bakta 1.4.2 requires alive-progress==1.6.2, but you have alive-progress 2.4.1 which is incompatible.

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
bakta 1.4.2 requires alive-progress==1.6.2, but you have alive-progress 2.4.1 which is incompatible.
All the best, Magda
Oh, indeed I somehow overlooked your message, my fault - sorry! These version conflicts often happen within overcrowded environments where several tools require different versions of the same 3rd-party dependency. In this case, a fresh Conda environment is the easiest way to go.
So yes, please just try to install Bakta in a fresh environment. This should help to solve this issue. Best regards!
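To illustrate why no single Biopython version can serve both tools in one environment, here is a small sketch using the two requirements reported by pip in this thread (seqsero2 1.2.1 needs biopython==1.73, bakta 1.4.2 needs biopython>=1.78). The function names and the candidate list are assumptions made for the example; this is not how pip or Conda resolve dependencies internally.

```python
def as_tuple(version):
    """Convert a dotted version string such as '1.78' into (1, 78)."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version, op, bound):
    """Check one requirement; only the two operators seen in the thread are handled."""
    v, b = as_tuple(version), as_tuple(bound)
    if op == "==":
        return v == b
    if op == ">=":
        return v >= b
    raise ValueError(f"unsupported operator: {op}")


# Requirements on biopython as quoted in the pip error messages above.
REQUIREMENTS = {
    "seqsero2 1.2.1": ("==", "1.73"),
    "bakta 1.4.2": (">=", "1.78"),
}

# Biopython versions mentioned in this thread.
CANDIDATES = ["1.73", "1.74", "1.78", "1.79"]


def compatible_versions(requirements, candidates):
    """Return the candidates that satisfy every requirement at once."""
    return [
        version
        for version in candidates
        if all(satisfies(version, op, bound) for op, bound in requirements.values())
    ]


if __name__ == "__main__":
    print(compatible_versions(REQUIREMENTS, CANDIDATES))
```

With both tools in the same environment the list is empty, whereas with Bakta alone 1.78 and 1.79 remain. That is the practical argument for giving Bakta its own environment.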
I will do it:) thank you once again:)
All the best, Magda:)
Hi @kitka2000 , did a fresh Conda environment help to solve this issue?
Hi,
Yes, it works perfectly well :) Thank you once more for helping me :) I think I'm a slightly more conscious Conda user now, thanks to you :) All the best, Magda
You're very welcome - glad to hear! All the best, Oliver
Hi Magda,
since Bakta requires 1.6.2 of alive-progress, you cannot update it to 2.4.1 w/o breaking Bakta's dependencies. Therefore, you should either keep 1.6.2 or install Bakta in a new environment.
BTW, it's a bit tricky to update the Python dependencies within a Conda env by using Pip. Technically, this is absolutely OK - but by doing so you bypass Conda as the initial package manager. I'd recommend either installing Bakta in a new env or updating via Conda/Pip only those packages that can be updated w/o breaking any dependencies.
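One way to see whether a pip upgrade inside a Conda env has left broken requirements is pip's own `check` subcommand. The sketch below wraps it in a hypothetical helper, `run_pip_check`, which is not part of Bakta or pip; it assumes pip is available for the interpreter that runs the script.

```python
import subprocess
import sys


def run_pip_check():
    """Run `pip check` with the current interpreter and return (ok, report)."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "check"],
        capture_output=True,
        text=True,
    )
    # pip exits with a non-zero status when it finds broken requirements.
    report = (result.stdout + result.stderr).strip()
    return result.returncode == 0, report


if __name__ == "__main__":
    ok, report = run_pip_check()
    print("no broken requirements" if ok else "conflicts found")
    print(report)
```

Run it with the Python of the activated environment so that it inspects the packages Bakta actually sees.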
Hi Oliver,
Thanks for the tip!
Best, Magda