
Not all parameters of the taxonomy API work

Open alvanuffelen opened this issue 7 months ago • 3 comments

https://api.ncbi.nlm.nih.gov/datasets/v2alpha/taxonomy/taxon/9606?returned_content=COMPLETE and https://api.ncbi.nlm.nih.gov/datasets/v2alpha/taxonomy/taxon/9606?returned_content=TAXIDS

both return the same output. Shouldn't there be a difference between the two? The parameters under filtered_subtree also seem to have little effect.
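
To reproduce the comparison programmatically, here is a minimal sketch using only the Python standard library. It assumes the endpoint returns JSON for a plain GET request; `taxonomy_url`, `fetch_json`, and `responses_identical` are hypothetical helper names, not part of any NCBI client.

```python
import json
import urllib.parse
import urllib.request

# Base path copied from the URLs in this issue.
BASE = "https://api.ncbi.nlm.nih.gov/datasets/v2alpha/taxonomy/taxon"


def taxonomy_url(taxons, suffix="", **params):
    """Build a URL such as .../taxon/9606?returned_content=TAXIDS (hypothetical helper).

    taxons: iterable of tax IDs, joined with commas (e.g. 9606,207598).
    suffix: optional sub-path, e.g. "filtered_subtree".
    params: query parameters, passed through as given.
    """
    url = f"{BASE}/" + ",".join(str(t) for t in taxons)
    if suffix:
        url += f"/{suffix}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def fetch_json(url, timeout=30):
    """GET the URL and parse the JSON body (requires network access)."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def responses_identical(url_a, url_b):
    """Return True if both URLs yield exactly the same parsed JSON."""
    return fetch_json(url_a) == fetch_json(url_b)
```

For example, `responses_identical(taxonomy_url([9606], returned_content="COMPLETE"), taxonomy_url([9606], returned_content="TAXIDS"))` returning True would reproduce the behaviour described above; if the parameter is honoured, it should return False.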

alvanuffelen, Dec 06 '23 09:12

Hi @alvanuffelen

You are correct; there should be a difference between them. This issue has been replicated, and a fix should be pushed out in the next few days.

Would you be able to expand a bit on the filtered_subtree issue that you've run into?

John

syntheticgio, Dec 06 '23 17:12

Hi @syntheticgio

Thanks for the quick fix!

Regarding filtered_subtree, not all of the parameters and output are clear to me:

  1. The schema states that the property root_nodes: [integer] should be returned. However, it's missing from my output: https://api.ncbi.nlm.nih.gov/datasets/v2alpha/taxonomy/taxon/9606/filtered_subtree

  2. The value of children_status is not consistent:

    https://api.ncbi.nlm.nih.gov/datasets/v2alpha/taxonomy/taxon/9606/filtered_subtree --> 9606 has no children_status.

    https://api.ncbi.nlm.nih.gov/datasets/v2alpha/taxonomy/taxon/9606,207598/filtered_subtree --> 9606 has HAS_MORE_CHILDREN as its children_status, even though its children are listed.

    I assume the children are listed because they are immediate children of 9606, but the API also outputs HAS_MORE_CHILDREN because they are not immediate children of 207598. Is this intended behavior?

  3. The summary of filtered_subtree states:

    [...] get a filtered taxonomic subtree that includes the full parent lineage [...]

    However, no parents seem to be included in the response: https://api.ncbi.nlm.nih.gov/datasets/v2alpha/taxonomy/taxon/9606/filtered_subtree

  4. The rank_limits parameter is supposed to limit the output to the provided rank. However, in my case it just adds the specified rank of the taxon to the output (under edge 1 in my example): https://api.ncbi.nlm.nih.gov/datasets/v2alpha/taxonomy/taxon/9606/filtered_subtree?rank_limits=KINGDOM

  5. The specified_limit parameter doesn't seem to do anything. All three of the following links give the same output:

    https://api.ncbi.nlm.nih.gov/datasets/v2alpha/taxonomy/taxon/9606/filtered_subtree?rank_limits=KINGDOM&specified_limit=true

    https://api.ncbi.nlm.nih.gov/datasets/v2alpha/taxonomy/taxon/9606/filtered_subtree?rank_limits=KINGDOM&specified_limit=false

    https://api.ncbi.nlm.nih.gov/datasets/v2alpha/taxonomy/taxon/9606/filtered_subtree?rank_limits=KINGDOM

    What should specified_limit do?
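
Several of the points above (1, 2, 3 and 5) can be checked on parsed responses with a small sketch like the one below. It assumes, based only on the fields named in this issue, that a filtered_subtree response is a JSON object with an optional `root_nodes` list and an `edges` object keyed by tax ID, whose values may carry `children_status`. That `edges` layout and the helper names `summarize_subtree` and `all_identical` are assumptions for illustration, not confirmed schema.

```python
def summarize_subtree(response):
    """Summarize a parsed filtered_subtree JSON response (hypothetical helper).

    ASSUMPTION: `response` is a dict with an optional "root_nodes" list and
    an "edges" dict keyed by tax ID strings; each edge may include a
    "children_status" field. Adjust the keys if the real schema differs.
    """
    edges = response.get("edges", {})
    return {
        # Point 1: the schema documents root_nodes, so report whether it is present.
        "has_root_nodes": "root_nodes" in response,
        # Point 2: children_status per node; None when the field is absent.
        "children_status": {
            taxid: edge.get("children_status") for taxid, edge in edges.items()
        },
        # Point 3: which tax IDs appear as edge keys (are parents included?).
        "nodes": sorted(edges),
    }


def all_identical(*responses):
    """Point 5: True if every parsed response is exactly equal to the first."""
    return all(r == responses[0] for r in responses[1:])
```

Given three parsed responses for the specified_limit=true, specified_limit=false, and unset variants, `all_identical(r_true, r_false, r_unset)` returning True would match what point 5 describes.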

I'm uncertain whether all my points represent genuine issues or whether I simply haven't fully grasped the concept of filtered_subtree.

alvanuffelen, Dec 07 '23 08:12

Thanks for the detailed information, @alvanuffelen; I will create a ticket for this to be investigated internally. I believe these are legitimate issues. I also wasn't able to make specified_limit work as expected, and the output I'm seeing with rank_limits wasn't what I would have expected, so at the very least the documentation may need to be clearer about what should be returned.

To try to answer your question directly: specified_limit should limit the results to only the requested taxa, but I wasn't around during the actual implementation, so I'll try to find out whether that is a poor interpretation. In any event, from what you described (and what I also saw), it does not seem to be properly implemented.

I'll leave the issue open until this second issue is resolved.

syntheticgio, Dec 07 '23 17:12