
ValueError: Need a Nucleotide or Protein alphabet

Open kitka2000 opened this issue 1 year ago • 8 comments

Hi, I'm trying to use Bakta desktop, which works fine to some extent (I get the GFF file and the whole prediction part works), but I get the following error: "ValueError: Need a Nucleotide or Protein alphabet". As far as I could find out, it is connected with Biopython; the issue is described here: https://biopython.org/wiki/Alphabet. I hoped I could make some changes to the InsdcIO.py file to overcome the problem, but it is too hard for me. Could you help me solve the problem and make the desktop version fully functional?

kitka2000 • Jul 27 '22 21:07

Hi, thanks for reaching out with this. That's interesting! Could you maybe provide the log file of a verbose run (bakta --verbose ...) and maybe also an input file that reproducibly triggers the error? And which version of Bakta (bakta --version) do you use?

oschwengers • Aug 02 '22 10:08

Hi,

Thanks for answering.

Please see the attached files. I copied the error message from the terminal (bakta_error.txt), but I cannot attach the full log file because it is too big, so I removed the data analysis part (log_file_start&end_part.txt). I use Bakta 1.4.2, which I installed with conda. The error message appears every time I run Bakta. I hope this is enough for you.

All the best, Magda


-- Regards,

M. Guzowska

Dr Magdalena Guzowska, Department of Physiological Sciences, WMW, SGGW, ul. Nowoursynowska 159, building 24, room 143, 02-776 Warszawa

export annotation results to: [/home/...]
human readable TSV...
GFF3...
INSDC GenBank & EMBL...
Traceback (most recent call last):
  File "~/miniconda3/bin/bakta", line 10, in
    sys.exit(main())
  File "~/miniconda3/lib/python3.9/site-packages/bakta/main.py", line 499, in main
    insdc.write_insdc(genome, features, genbank_path, embl_path)
  File "~/miniconda3/lib/python3.9/site-packages/bakta/io/insdc.py", line 260, in write_insdc
    SeqIO.write(contig_list, fh, format='genbank')
  File "~/.local/lib/python3.9/site-packages/Bio/SeqIO/__init__.py", line 533, in write
    count = writer_class(fp).write_file(sequences)
  File "~/.local/lib/python3.9/site-packages/Bio/SeqIO/Interfaces.py", line 237, in write_file
    count = self.write_records(records)
  File "~/.local/lib/python3.9/site-packages/Bio/SeqIO/Interfaces.py", line 222, in write_records
    self.write_record(record)
  File "~/.local/lib/python3.9/site-packages/Bio/SeqIO/InsdcIO.py", line 830, in write_record
    self._write_the_first_line(record)
  File "~/.local/lib/python3.9/site-packages/Bio/SeqIO/InsdcIO.py", line 631, in _write_the_first_line
    raise ValueError("Need a Nucleotide or Protein alphabet")
ValueError: Need a Nucleotide or Protein alphabet

16:21:38.690 - INFO - UTILS - version=1.4.2
16:21:38.690 - INFO - UTILS - developer: Oliver Schwengers, github.com/oschwengers
16:21:38.690 - INFO - UTILS - command: bakta --db ./db --prefix Salmonella_12S -v --output ../bakta --genus Salmonella --compliant 12S_SEE_cns.fasta
16:21:38.690 - INFO - UTILS - local time: 2022-08-04 16:21:38
16:21:38.695 - INFO - UTILS - machine: type=x86_64, cores=20
16:21:38.696 - INFO - UTILS - system: type=Linux, release=5.14.0-1046-oem
16:21:38.696 - INFO - UTILS - python: version=3.9.13, implementation=CPython
16:21:38.696 - INFO - CONFIG - threads=20
16:21:38.696 - INFO - CONFIG - verbose=True
16:21:38.696 - DEBUG - CONFIG - test parameter db: db_tmp=./db
16:21:38.696 - INFO - CONFIG - database: type=parameter, path=./db
16:21:38.697 - INFO - CONFIG - tmp-path=/tmp/tmp9gsvak_1
16:21:38.697 - INFO - CONFIG - genome-path=
16:21:38.697 - INFO - CONFIG - min_contig_length=1
16:21:38.697 - INFO - CONFIG - prefix=Salmonella_12S
16:21:38.697 - INFO - CONFIG - output-path=
16:21:38.697 - INFO - CONFIG - genus=Salmonella
16:21:38.697 - INFO - CONFIG - species=None
16:21:38.697 - INFO - CONFIG - strain=None
16:21:38.697 - INFO - CONFIG - plasmid=None
16:21:38.697 - INFO - CONFIG - complete=False
16:21:38.697 - INFO - CONFIG - prodigal_tf=None
16:21:38.697 - INFO - CONFIG - translation_table=11
16:21:38.697 - INFO - CONFIG - gram=?
16:21:38.697 - INFO - CONFIG - compliant=True
16:21:38.697 - INFO - CONFIG - compliant mode! min_contig_length=200
16:21:38.697 - INFO - CONFIG - locus=None
16:21:38.697 - INFO - CONFIG - locus-tag=None
16:21:38.697 - INFO - CONFIG - keep_contig_headers=False
16:21:38.697 - INFO - CONFIG - replicon-table=None
16:21:38.697 - INFO - CONFIG - skip-tRNA=False
16:21:38.697 - INFO - CONFIG - skip-tmRNA=False
16:21:38.697 - INFO - CONFIG - skip-rRNA=False
16:21:38.697 - INFO - CONFIG - skip-ncRNA=False
16:21:38.697 - INFO - CONFIG - skip-ncRNA-region=False
16:21:38.697 - INFO - CONFIG - skip-CRISPR=False
16:21:38.697 - INFO - CONFIG - skip-CDS=False
16:21:38.697 - INFO - CONFIG - skip-sORF=False
16:21:38.698 - INFO - CONFIG - skip-gap=False
16:21:38.698 - INFO - CONFIG - skip-ori=False
16:21:38.698 - INFO - DB - detected: major=3, minor=1, date=2022-02-03
16:21:38.861 - INFO - UTILS - dependency: tool=tRNAscan-SE, version=v2.0.9
16:21:38.869 - INFO - UTILS - dependency: tool=aragorn, version=v1.2.41
16:21:38.885 - INFO - UTILS - dependency: tool=cmscan, version=v1.1.4
16:21:38.895 - INFO - UTILS - dependency: tool=pilercr, version=v1.6.0
16:21:38.917 - INFO - UTILS - dependency: tool=prodigal, version=v2.6.3
16:21:38.928 - INFO - UTILS - dependency: tool=amrfinder, version=v3.10.30
16:21:38.970 - INFO - UTILS - dependency: tool=hmmsearch, version=v3.3.2
16:21:38.994 - INFO - UTILS - dependency: tool=diamond, version=v2.0.15
16:21:39.095 - INFO - UTILS - dependency: tool=blastn, version=v2.13.0
16:21:39.239 - INFO - FASTA - imported: id=12S_SEE_cns, length=4857492, description=, genomic=True, dna=True
16:21:39.241 - INFO - FASTA - imported: id=12S_SEE_cns, length=91136, description=, genomic=True, dna=True
16:21:39.241 - INFO - MAIN - imported sequences=2
16:21:39.241 - INFO - UTILS - qc: revised sequence: id=contig_1, orig-id=12S_SEE_cns, type=contig, complete=False, topology=linear, name=, description='[organism=Salmonella] [gcode=11]', orig-description=''
16:21:39.241 - INFO - UTILS - qc: revised sequence: id=contig_2, orig-id=12S_SEE_cns, type=contig, complete=False, topology=linear, name=, description='[organism=Salmonella] [gcode=11]', orig-description=''
16:21:39.241 - INFO - FASTA - write genome sequences: path=/tmp/tmp9gsvak_1/contigs.fna, description=False, wrap=False
16:21:39.247 - DEBUG - MAIN - start tRNA prediction
.....

16:29:03.823 - DEBUG - ANNOTATION - filter features on contig: contig_2
16:29:03.823 - DEBUG - MAIN - start feature selection and creation of locus tags
16:29:03.834 - INFO - UTILS - generated sequence tag prefix: prefix=FOPLCOMPJC, length=10, MD5=F895C8693C5B657EAB3DE2A67E7D6D48
16:29:03.839 - INFO - MAIN - selected features=4920
16:29:03.849 - INFO - UTILS - generated sequence tag prefix: prefix=FOPLCO, length=6, MD5=F895C8693C5B657EAB3DE2A67E7D6D48
16:29:03.849 - INFO - MAIN - locus tag prefix=FOPLCO
16:29:03.854 - INFO - UTILS - genome-size=4948628
16:29:03.892 - INFO - UTILS - GC=0.526
16:29:03.892 - INFO - UTILS - N=0.102
16:29:03.892 - INFO - UTILS - N50=4857492
16:29:03.894 - INFO - UTILS - coding-ratio=1.109
16:29:03.901 - INFO - TSV - write tsv: path=~/Salmonella_12S.tsv
16:29:03.921 - INFO - GFF - write GFF3: path=~/Salmonella_12S.gff3
16:29:04.017 - DEBUG - INSDC - prepare: genbank=~/Salmonella_12S.gbff, embl=[/home/...]/Salmonella_12S.embl
16:29:04.149 - INFO - INSDC - write GenBank: path=~/Salmonella_12S.gbff
16:29:04.158 - INFO - MAIN - removed tmp dir: /tmp/tmp9gsvak_1

kitka2000 • Aug 04 '22 20:08

Thanks! This is indeed due to BioPython and I guess it relates to a recent deprecation of the Bio.Alphabet module within BioPython: https://github.com/biopython/biopython/issues/3156

Actually, conda should make sure that the installed BioPython version is >=1.78. But could you please double-check this to be sure (within your activated conda environment, of course)?

$ python3
>>> import Bio
>>> print(Bio.__version__)
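A slightly more thorough check, sketched with the standard library only (Python 3.8+ for importlib.metadata; the helper names `parse_version` and `check_package` are hypothetical): it reports the installed Biopython version, whether it meets Bakta's >=1.78 requirement, and which file `Bio` is actually imported from. The last point may matter here, since the traceback shows `Bio` being loaded from `~/.local/lib/python3.9/site-packages`, the per-user site directory that Python searches before the environment's own site-packages, while Bakta itself runs from `~/miniconda3`.

```python
import importlib.metadata
import importlib.util
import re
import site


def parse_version(text):
    """Hypothetical helper: turn a release string such as '1.79' into (1, 79).

    Only leading numeric parts are kept, so '1.80.dev0' becomes (1, 80).
    This is a rough sketch, not a full PEP 440 parser.
    """
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):
            break  # stop at suffixes like '79rc1' or '79+local'
    return tuple(parts)


def check_package(dist_name="biopython", module_name="Bio", minimum="1.78"):
    """Hypothetical helper: report version, minimum check and import location.

    Returns None if the distribution is not installed for this interpreter.
    """
    try:
        installed = importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        return None
    # find_spec locates the top-level package without importing it
    spec = importlib.util.find_spec(module_name)
    origin = spec.origin if spec is not None else None
    user_site = site.getusersitepackages()
    return {
        "version": installed,
        "meets_minimum": parse_version(installed) >= parse_version(minimum),
        "origin": origin,
        # True if the copy that gets imported lives in the user site (~/.local/...)
        "from_user_site": origin is not None and origin.startswith(user_site),
    }


if __name__ == "__main__":
    print(check_package())
```

Run it with the same `python3` that Bakta uses (inside the activated environment). If `from_user_site` is True, a pip-installed copy outside the environment may be shadowing the one Conda installed.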

oschwengers • Aug 05 '22 07:08

Indeed, the offending line 631 of Bio/SeqIO/InsdcIO.py is present in BioPython 1.74: https://github.com/biopython/biopython/blob/8677e94d3d4ca7374e12d66a05e0dd3732168cce/Bio/SeqIO/InsdcIO.py#L631

Could you therefore please update BioPython to 1.78 or 1.79 and try again?

oschwengers • Aug 05 '22 08:08

@kitka2000, gentle ping: did this solve the issue?

oschwengers • Aug 15 '22 08:08

Hi, I sent you the message below a couple of days ago, but it seems that you didn't get it. I think I should reinstall Bakta in a separate environment. Do you think that could help?

On Tue, 9.08.2022 at 10:30, Magdalena Guzowska wrote:

Hi, sorry for the late response.

Somehow my installations are a real mess. When I check the Biopython version, it says I have 1.73. So I tried to update all outdated packages with pip, and I got plenty of errors:

$ pip list --outdated
Package              Version Latest Type
alive-progress       1.6.2   2.4.1  wheel
biopython            1.73    1.79   wheel
ncbi-genome-download 0.2.8   0.3.1  wheel
numpy                1.22.4  1.23.1 wheel

$ pip3 list --outdated --format=freeze | grep -v '^-e' | cut -d = -f 1 | xargs -n1 pip3 install -U

Error list:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
bakta 1.4.2 requires alive-progress==1.6.2, but you have alive-progress 2.4.1 which is incompatible.
bakta 1.4.2 requires biopython>=1.78, but you have biopython 1.73 which is incompatible.

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
seqsero2 1.2.1 requires biopython==1.73, but you have biopython 1.79 which is incompatible.
bakta 1.4.2 requires alive-progress==1.6.2, but you have alive-progress 2.4.1 which is incompatible.

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
bakta 1.4.2 requires alive-progress==1.6.2, but you have alive-progress 2.4.1 which is incompatible.

Should I reinstall bakta in a different environment to avoid these errors?

All the best, Magda


kitka2000 • Aug 15 '22 08:08

Oh, indeed I somehow overlooked your message, my fault, sorry! These version conflicts often happen in overcrowded environments where several tools require different versions of the same third-party dependency. In this case, a fresh Conda environment is the easiest way to go.

So yes, please just try to install Bakta in a fresh environment. This should solve the issue. Best regards!
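The pip messages earlier in this thread show exactly this kind of clash: bakta 1.4.2 requires biopython>=1.78, while seqsero2 1.2.1 requires biopython==1.73. A minimal sketch (hypothetical helpers that handle only the `==` and `>=` operators appearing in those messages, not full PEP 440) of why no single Biopython version can satisfy both tools in one environment:

```python
def parse_version(text):
    """Hypothetical helper: '1.78' -> (1, 78); plain numeric releases only."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, spec):
    """Check a version against a single '==X' or '>=X' specifier.

    Only these two operators are handled because they are the only ones
    in the pip messages; this is not a general specifier implementation.
    """
    for op in ("==", ">="):
        if spec.startswith(op):
            target = parse_version(spec[len(op):].strip())
            current = parse_version(version)
            return current == target if op == "==" else current >= target
    raise ValueError(f"unsupported specifier: {spec!r}")


def common_versions(candidates, *specs):
    """Return the candidate versions that satisfy every specifier at once."""
    return [v for v in candidates if all(satisfies(v, s) for s in specs)]


if __name__ == "__main__":
    # Biopython requirements as reported by pip in this thread
    requirements = {"bakta 1.4.2": ">=1.78", "seqsero2 1.2.1": "==1.73"}
    # Biopython releases mentioned in this thread
    candidates = ["1.73", "1.74", "1.78", "1.79"]
    for owner, spec in requirements.items():
        print(owner, "accepts", common_versions(candidates, spec))
    print("both accept", common_versions(candidates, *requirements.values()))
```

Each tool on its own accepts some release, but the intersection is empty, so the only way to keep both working is to give each its own environment.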

oschwengers • Aug 15 '22 08:08

I will do it :) Thank you once again :)

All the best, Magda:)


kitka2000 • Aug 15 '22 08:08

Hi @kitka2000 , did a fresh Conda environment help to solve this issue?

oschwengers • Aug 25 '22 06:08

Hi,

Yes, it works perfectly well :) Thank you once more for helping me :) I think I'm a somewhat more conscious conda user now, thanks to you :) All the best, Magda


kitka2000 • Aug 25 '22 07:08

You're very welcome - glad to hear! All the best, Oliver

oschwengers • Aug 25 '22 07:08


Should I reinstall bakta in a different environment to avoid these errors?

All the best, Magda


kitka2000 • Oct 11 '22 08:10

Hi Magda, since Bakta requires exactly version 1.6.2 of alive-progress, you cannot update it to 2.4.1 without breaking Bakta's dependencies. Therefore, you should either keep 1.6.2 or install Bakta in a new environment.

BTW, updating the Python dependencies within a Conda env by using Pip is a bit tricky. Technically this is absolutely OK, but by doing so you bypass Conda, the package manager that installed them in the first place. I'd recommend either installing Bakta in a new env or updating via Conda/Pip only those packages that can be updated without breaking any dependencies.
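A rough sketch of how to tell which of the outdated packages can be updated without breaking a pin. The data is copied from the `pip list --outdated` output and the pip conflict messages earlier in this thread, and only those pins are checked; a real resolver, or `pip check` (which reports installed packages with unmet requirements), considers every installed package. The names `plan_upgrades`, `PINS` and `OUTDATED` are hypothetical.

```python
def parse_version(text):
    """Hypothetical helper: '2.4.1' -> (2, 4, 1); plain numeric releases only."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, op, target):
    """Check one version against a single '==' or '>=' constraint."""
    v, t = parse_version(version), parse_version(target)
    if op == "==":
        return v == t
    if op == ">=":
        return v >= t
    raise ValueError(f"unsupported operator: {op}")


# Pins reported by pip in this thread: (dependent package, dependency, operator, version)
PINS = [
    ("bakta 1.4.2", "alive-progress", "==", "1.6.2"),
    ("bakta 1.4.2", "biopython", ">=", "1.78"),
    ("seqsero2 1.2.1", "biopython", "==", "1.73"),
]

# `pip list --outdated` from this thread: name -> (installed, latest)
OUTDATED = {
    "alive-progress": ("1.6.2", "2.4.1"),
    "biopython": ("1.73", "1.79"),
    "ncbi-genome-download": ("0.2.8", "0.3.1"),
    "numpy": ("1.22.4", "1.23.1"),
}


def plan_upgrades(outdated, pins):
    """Split outdated packages into safe upgrades and blocked ones (with reasons)."""
    safe, blocked = [], {}
    for name, (_installed, latest) in outdated.items():
        broken = [
            f"{owner} requires {dep}{op}{ver}"
            for owner, dep, op, ver in pins
            if dep == name and not satisfies(latest, op, ver)
        ]
        if broken:
            blocked[name] = broken
        else:
            safe.append(name)
    return safe, blocked


if __name__ == "__main__":
    safe, blocked = plan_upgrades(OUTDATED, PINS)
    print("safe to upgrade:", safe)
    for name, reasons in blocked.items():
        print(f"keep {name}:", "; ".join(reasons))
```

In the mixed environment only ncbi-genome-download and numpy come out as safe. Dropping the seqsero2 pin (as in a dedicated Bakta environment) also frees biopython for an upgrade, while alive-progress stays blocked by Bakta's own ==1.6.2 pin.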

oschwengers • Oct 11 '22 10:10

Hi Oliver,

Thanks for the tip!

Best, Magda


kitka2000 • Oct 11 '22 10:10