de_novo_workflow fails to parse gtdb classify output

Open adityabandla opened this issue 2 years ago • 1 comments

Hello,

I am trying to build a tree integrating my genomes with the GTDB refs. After the tree is built, the workflow fails with the error:

ERROR: Not a tab-separated line: user_genome	classification	fastani_reference	fastani_reference_radius	fastani_taxonomy	fastani_ani	fastani_af	closest_placement_reference	closest_placement_radius	closest_placement_taxonomy	closest_placement_ani	closest_placement_af	pplacer_taxonomy	classification_method	note	other_related_references(genome_id,species_name,radius,ANI,AF)	msa_percent	translation_table	red_value	warnings

I passed the file produced by gtdb classify to the --gtdbtk_classification_file flag.

adityabandla avatar Sep 01 '22 07:09 adityabandla

Hello, thanks for submitting this issue. This will be fixed in the next release.

As a workaround, you can manually specify the taxonomy using the --custom_taxonomy_file instead.

aaronmussig avatar Sep 30 '22 01:09 aaronmussig

Hi @aaronmussig

> As a workaround, you can manually specify the taxonomy using the --custom_taxonomy_file instead.

By this, do you mean cutting columns 1 and 2 (user_genome and classification) from the classify_wf output file and passing those via --custom_taxonomy_file?

More generally, do these options have the same purpose, i.e. to decorate the tree?
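
For reference, the column extraction described above could be sketched as follows. This is only an illustration of the idea, not a confirmed fix: the helper name `extract_taxonomy` is made up, and the thread does not confirm which exact format --custom_taxonomy_file expects. In particular, dropping the header row is an assumption, so it is left as a switch.

```python
import sys


def extract_taxonomy(summary_path, out_path, skip_header=True):
    """Copy the first two tab-separated columns (user_genome, classification)
    of a classify summary file into a new two-column TSV.

    Assumption: the custom taxonomy file is a plain two-column TSV without
    a header row. Set skip_header=False if the header turns out to be needed.
    Returns the number of rows written.
    """
    written = 0
    with open(summary_path) as src, open(out_path, "w") as dst:
        for line_no, line in enumerate(src):
            if skip_header and line_no == 0:
                continue  # drop the "user_genome<TAB>classification..." header
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2:
                continue  # ignore blank or malformed lines
            dst.write(f"{fields[0]}\t{fields[1]}\n")
            written += 1
    return written


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python extract_taxonomy.py <summary.tsv> <custom_taxonomy.tsv>
    count = extract_taxonomy(sys.argv[1], sys.argv[2])
    print(f"wrote {count} rows to {sys.argv[2]}")
```

The resulting file would then be the one passed to --custom_taxonomy_file, assuming that flag accepts a genome ID and a taxonomy string per line.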

zwets avatar Dec 06 '22 21:12 zwets

Hi @zwets, I am facing a similar issue. Were you able to solve it by cutting the first two columns and passing them via --custom_taxonomy_file?

arijitnus avatar Feb 03 '23 03:02 arijitnus

@adityabandla

I am facing the same issue. Could you let me know whether you managed to solve this problem?

arijitnus avatar Feb 03 '23 03:02 arijitnus