Error in list method

Open Maryam-Haghani opened this issue 1 year ago • 8 comments

Hi,

I am using the list command to get the subtrees for three taxa, [11050, 11103, 11108], and the result differs depending on the order of the taxids. When I call taxonkit list --ids ... with taxids that lie on the same path of the tree (the same lineage), ordered from top to bottom, it does not return the result for the descendants. When I reverse the order, from bottom to top, the result is different and it somehow returns the correct result.

As an example, taxonkit list --ids 11050,11103,11108 --show-name --indent "" returns only the result for 11050, which is an ancestor of the others, but taxonkit list --ids 11108,11103,11050 --show-name --indent "" gives the subtrees for 11050 and 11103, but not for 11108.

Could you please help me with this problem?

Maryam-Haghani avatar Oct 14 '22 00:10 Maryam-Haghani

Thanks for reporting this.

It was designed to avoid repeatedly outputting (already outputted) subtrees.

  1. Marking taxa as visited: https://github.com/shenwei356/taxonkit/blob/master/taxonkit/cmd/list.go#L294
  2. Skip marked taxon nodes: https://github.com/shenwei356/taxonkit/blob/master/taxonkit/cmd/list.go#L244-L246
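
To make the effect of a shared "visited" marker concrete, here is a minimal sketch on a toy three-node lineage. It is not the taxonkit source: the tree `CHILDREN`, the function `list_subtrees`, and the `skip_visited` switch are all hypothetical names invented for this illustration, and the output is not meant to reproduce taxonkit's exact output. It only shows how skipping already printed nodes makes the result depend on query order.

```python
# Toy illustration only: NOT the actual taxonkit implementation.
# CHILDREN, list_subtrees and skip_visited are hypothetical names.
CHILDREN = {
    "A": ["B"],  # A is the ancestor
    "B": ["C"],  # B is a child of A
    "C": [],     # C is a leaf
}


def list_subtrees(query_ids, skip_visited=True):
    """Return one list of emitted nodes per query id.

    A single visited set is shared across all queries, so a node that
    was emitted for an earlier query is skipped for later ones.
    """
    visited = set()
    results = []

    def walk(node, out):
        if skip_visited and node in visited:
            return  # already emitted as part of an earlier subtree
        visited.add(node)
        out.append(node)
        for child in CHILDREN[node]:
            walk(child, out)

    for query in query_ids:
        out = []
        walk(query, out)
        results.append(out)
    return results


top_down = list_subtrees(["A", "B", "C"])
bottom_up = list_subtrees(["C", "B", "A"])
independent = list_subtrees(["A", "B", "C"], skip_visited=False)

print(top_down)     # [['A', 'B', 'C'], [], []]
print(bottom_up)    # [['C'], ['B'], ['A']]
print(independent)  # [['A', 'B', 'C'], ['B', 'C'], ['C']]
```

With the shared marker, the ancestor-first order swallows the descendants' subtrees, while the descendant-first order truncates the ancestors' subtrees. Only the run without skipping returns each subtree in full.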

I can't remember why I did this.

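Use the patch for now:

* [taxonkit_linux_amd64.tar.gz](https://github.com/shenwei356/taxonkit/files/9781696/taxonkit_linux_amd64.tar.gz)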

shenwei356 avatar Oct 14 '22 01:10 shenwei356

OK, I think it was a bug. Thanks for reporting this.

By the way, taxid:11108 has no children.

shenwei356 avatar Oct 14 '22 01:10 shenwei356

> OK, I think it was a bug. Thanks for reporting this.
>
> By the way, taxid:11108 has no children.

Yeah 11108 has no children, but considering the result formatting for the list method, it should return just itself as the result.

Maryam-Haghani avatar Oct 14 '22 02:10 Maryam-Haghani

> Thanks for reporting this.
>
> It was designed to avoid repeatedly outputting (already outputted) subtrees.
>
> 1. Marking taxa as visited: https://github.com/shenwei356/taxonkit/blob/master/taxonkit/cmd/list.go#L294
> 2. Skip marked taxon nodes: https://github.com/shenwei356/taxonkit/blob/master/taxonkit/cmd/list.go#L244-L246
>
> I can't remember why I did this.
>
> Use the patch for now:
>
> * [taxonkit_linux_amd64.tar.gz](https://github.com/shenwei356/taxonkit/files/9781696/taxonkit_linux_amd64.tar.gz)

Thank you for the quick response. So what should I do if I want to get the subtrees independently? I am actually using the Python library pytaxonkit, and because of this bug in taxonkit, tree.traverse raises an error for these taxa. The code is:

    result = pytaxonkit.list([11050, 11103, 11108])
    for taxon, tree in result:
        subtaxa = [t for t in tree.traverse]
        print(f'Top level result: {taxon.name} ({taxon.taxid}); {len(subtaxa)} related taxa')

Maryam-Haghani avatar Oct 14 '22 02:10 Maryam-Haghani

> Yeah 11108 has no children, but considering the result formatting for the list method, it should return just itself as the result.

Yes, it did.

$ taxonkit list --ids 11108 -n -r 
11108 [no rank] Hepatitis C virus (isolate H)

> So what should I do if I want to have the sub-trees independently?

Call the method separately for each taxid.
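
As a rough sketch of this workaround, the snippet below runs one `taxonkit list` call per taxid, so no state is shared between queries. The helper names (`build_command`, `run_taxonkit`, `list_independently`) are made up for this example. The only flags used are the ones already quoted in this thread (`--ids`, `-n`, `-r`), and it assumes the `taxonkit` binary and its taxonomy data are installed.

```python
import subprocess


def build_command(taxid):
    """Build the argument list for a single-taxid query.

    Only flags that appear in this thread are used.
    """
    return ["taxonkit", "list", "--ids", str(taxid), "-n", "-r"]


def run_taxonkit(cmd):
    """Run the command and return its standard output as text.

    Assumes taxonkit is on PATH with its taxonomy data in place.
    """
    done = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return done.stdout


def list_independently(taxids, runner=run_taxonkit):
    """Map each taxid to the raw output of its own `taxonkit list` call."""
    return {taxid: runner(build_command(taxid)) for taxid in taxids}
```

Calling `list_independently([11050, 11103, 11108])` would then give three separate outputs, one per taxid. The `runner` parameter is only there so the function can be exercised without the real binary.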

shenwei356 avatar Oct 14 '22 02:10 shenwei356

> > Yeah 11108 has no children, but considering the result formatting for the list method, it should return just itself as the result.
>
> Yes, it did.
>
> $ taxonkit list --ids 11108 -n -r
> 11108 [no rank] Hepatitis C virus (isolate H)

Sure it does, but not in taxonkit list --ids 11108,11103,11050.

> > So what should I do if I want to have the sub-trees independently?
>
> Call the method separately for each taxid.

It seems that is the only way. Thank you.

Maryam-Haghani avatar Oct 14 '22 03:10 Maryam-Haghani

> Yeah 11108 has no children, but considering the result formatting for the list method, it should return just itself as the result.

I've fixed it, did you try the new binary?

$ taxonkit list --ids 11108,11103,11050 -r -n | head -n 5
11108 [no rank] Hepatitis C virus (isolate H)

11103 [species] Hepacivirus C
  33745 [genotype] Hepatitis C virus genotype 4
    31653 [no rank] Hepatitis C virus subtype 4a

You can replace the old binary (which taxonkit) with the new one, to see if the method in Pytaxonkit works as expected.
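
For pytaxonkit users who would rather find the binary from Python than from the shell, the standard-library call `shutil.which` performs the same PATH lookup as `which taxonkit`. The wrapper name `locate_binary` is just an illustrative helper, not part of taxonkit or pytaxonkit.

```python
import shutil


def locate_binary(name="taxonkit"):
    """Return the full path of the first `name` executable on PATH, or None."""
    return shutil.which(name)


path = locate_binary()
print(path if path else "taxonkit not found on PATH")
```

The printed path is the file to overwrite with the patched binary.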

shenwei356 avatar Oct 14 '22 03:10 shenwei356

> > Yeah 11108 has no children, but considering the result formatting for the list method, it should return just itself as the result.
>
> I've fixed it, did you try the new binary?
>
> $ taxonkit list --ids 11108,11103,11050 -r -n | head -n 5
> 11108 [no rank] Hepatitis C virus (isolate H)
>
> 11103 [species] Hepacivirus C
>   33745 [genotype] Hepatitis C virus genotype 4
>     31653 [no rank] Hepatitis C virus subtype 4a
>
> You can replace the old binary (which taxonkit) with the new one, to see if the method in Pytaxonkit works as expected.

Yes, thank you. But it still does not show the already visited nodes, which is why I have to call the list method individually for each taxid to get the subtrees independently. Am I right? I wish it had a flag to control whether already visited nodes are ignored, so that the method could be called once for all the ids.

Maryam-Haghani avatar Oct 14 '22 18:10 Maryam-Haghani

After checking PyTaxonKit code, I think it does not perform any filtering or skipping. @standage, am I right?

Replacing the old binary (run which taxonkit to see where it is) with the new one should fix the bug.

shenwei356 avatar Oct 15 '22 04:10 shenwei356

> After checking PyTaxonKit code, I think it does not perform any filtering or skipping. @standage, am I right?

Thanks for your patience with my delayed response, @shenwei356. You are correct, PyTaxonKit doesn't perform any filtering on taxonkit list results. Let me know if you find any bugs with the Python bindings.

standage avatar Oct 17 '22 14:10 standage