
invalid escape sequence '\d' on get_uncomputed_taxid_per_busco

Open ailtonpcf opened this issue 7 months ago • 3 comments

Dear Dr. Lind,

I'm generating a custom EukDetect database and I'm stuck at get_uncomputed_taxid_per_busco.py. It fails with the following message:

"""
python /home/qi47rin/proj/00-git/EukDetect/build_db/get_uncomputed_taxid_per_busco.py --speciestax cache/45-create-eukdetect-db/genomes-table/species_taxid.tsv --fasta cache/45-create-eukdetect-db/genes-repeat-filtered/buscos_cdhit99_less10perc_repeats_masked.fna --collapsed_ids cache/45-create-eukdetect-db/busco-cdhit99-renamed/buscos_cdhit99_renamed_busco_seqid_sequential_correspondence.txt --taxdb cache/45-create-eukdetect-db/taxdump/taxa.sqlite > cache/45-create-eukdetect-db/busco-taxid/busco_taxid_link.txt

Activating conda environment: cache/00-conda-env/bdf327b44096dcc3f601392a860ec146_
/home/qi47rin/proj/00-git/EukDetect/build_db/get_uncomputed_taxid_per_busco.py:27: SyntaxWarning: invalid escape sequence '\d'
  sp = re.split('-\dat\d-', '-'.join(seq.id.split('-')[1:]))[0]
/home/qi47rin/proj/00-git/EukDetect/build_db/get_uncomputed_taxid_per_busco.py:46: SyntaxWarning: invalid escape sequence '\d'
  new = re.split('-\dat\d-', '-'.join(sp.split('-')[1:]))[0]
Traceback (most recent call last):
  File "/home/qi47rin/proj/00-git/EukDetect/build_db/get_uncomputed_taxid_per_busco.py", line 79, in <module>
    main(sys.argv)
  File "/home/qi47rin/proj/00-git/EukDetect/build_db/get_uncomputed_taxid_per_busco.py", line 67, in main
    tree = ncbi.get_topology(taxids)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/work/qi47rin/proj/02-compost-microbes/cache/00-conda-env/bdf327b44096dcc3f601392a860ec146_/lib/python3.12/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 463, in get_topology
    root = elem2node[1]
           ~~~~~~~~~^^^
KeyError: 1
"""
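For reference, the two SyntaxWarning lines in the log come from writing the regex `'-\dat\d-'` as an ordinary string literal. They are warnings only and are separate from the KeyError that ends the traceback. A minimal sketch of the same split with a raw-string pattern is below; the function name `species_from_header` and the example header are hypothetical and used only for illustration, not taken from the EukDetect code or data.

```python
import re

# Raw string: the backslashes are passed to the regex engine unchanged,
# so Python 3.12 no longer emits "invalid escape sequence '\d'".
BUSCO_SEPARATOR = re.compile(r'-\dat\d-')


def species_from_header(seq_id):
    """Same expression as line 27 of the script, with only the r'' prefix added.

    Drops the first '-'-separated field, then keeps everything before the
    first '-<digit>at<digit>-' separator.
    """
    remainder = '-'.join(seq_id.split('-')[1:])
    return BUSCO_SEPARATOR.split(remainder)[0]


# Hypothetical header, made up for this example only.
print(species_from_header('prefix-Some_species-1at2-rest'))
```

The pattern itself is unchanged, so this should only silence the warning; it is not expected to affect the `KeyError: 1` raised inside `get_topology`.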

I have attached the files I generated, except for the taxdump, which is too large to upload. Do you know what might be happening?

Another question: in the help text within the script, when you say "Tab delimited file of species name (as encoded in busco header) and taxonomy ID", do you mean the headers in the FASTA file?
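While waiting for an answer, a quick sanity check of the `--speciestax` file may help rule out formatting problems. The sketch below assumes the layout suggested by the help text quoted above, that is, one row per species with the name first and an integer taxonomy ID second, separated by a tab. The function name `check_species_taxid_rows` and the sample rows are hypothetical; this is not part of EukDetect and is not claimed to explain the KeyError.

```python
import csv
import io


def check_species_taxid_rows(handle):
    """Return (line_number, row, reason) for rows that are not 'name<TAB>integer'."""
    problems = []
    for lineno, row in enumerate(csv.reader(handle, delimiter='\t'), start=1):
        if not row:
            continue  # ignore blank lines
        if len(row) != 2:
            problems.append((lineno, row, 'expected 2 tab-separated fields'))
        elif not row[1].strip().isdigit():
            problems.append((lineno, row, 'taxonomy ID is not an integer'))
    return problems


# Made-up sample: row 2 uses a space instead of a tab, row 3 has a non-numeric ID.
sample = io.StringIO(
    'Some_species\t12345\n'
    'Other_species 678\n'
    'Third_species\tabc\n'
)
for lineno, row, reason in check_species_taxid_rows(sample):
    print(f'line {lineno}: {reason}: {row}')
```

To run it on the real file, pass an open file handle, for example `open('species_taxid.tsv', newline='')`, in place of the `io.StringIO` sample.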

Best regards,
Ailton

Attachment: euk-db-asp3.zip

ailtonpcf, Jul 01 '24 11:07