taxize

`downstream` misses descendants of ambiguous clades

Open • arendsee opened this issue 6 years ago • 9 comments

This call

```r
taxize::downstream('Brassicaceae', db='ncbi', downto='genus')
```

loses all descendants of the clade 'Brassicaceae incertae sedis'. I believe this clade is filtered out as ambiguous when `children` is called on 'Brassicaceae'.

In this particular case, I believe the species in 'Brassicaceae incertae sedis' are thought to belong to Brassicaceae, but their exact position within the family is unknown. So I think they should be included as downstream of 'Brassicaceae'.

We can pass the `ambiguous=TRUE` argument to `ncbi_downstream`, which, according to the documentation, should in turn pass it on to `ncbi_children`:

```r
taxize::downstream('Brassicaceae', db='ncbi', downto='genus', ambiguous=TRUE)
```

However, the results are identical, so I think the argument is not being passed through.
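A minimal sketch of how an argument like this can be silently dropped in R: a wrapper that accepts `...` but never forwards it ignores `ambiguous=TRUE` without raising an error. The functions `children_stub`, `downstream_dropping`, and `downstream_forwarding` are hypothetical stand-ins, not taxize's actual internals, and the stub treats only names containing "incertae sedis" as ambiguous.

```r
# Hypothetical stand-in for ncbi_children(): drops names flagged as
# ambiguous (here, only "incertae sedis") unless ambiguous = TRUE.
children_stub <- function(x, ambiguous = FALSE) {
  kids <- c("Arabidopsis", "Brassica", "Brassicaceae incertae sedis")
  if (!ambiguous) kids <- kids[!grepl("incertae sedis", kids)]
  kids
}

# Accepts ... but never passes it on, so ambiguous = TRUE is silently ignored.
downstream_dropping <- function(x, ...) {
  children_stub(x)
}

# Forwards ... so ambiguous = TRUE reaches children_stub().
downstream_forwarding <- function(x, ...) {
  children_stub(x, ...)
}

length(downstream_dropping("Brassicaceae", ambiguous = TRUE))    # 2
length(downstream_forwarding("Brassicaceae", ambiguous = TRUE))  # 3
```

Because unused `...` arguments are never evaluated, the dropping version returns the filtered result with no warning, which matches the "same results" symptom.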

Also, a user might reasonably want to keep ambiguous nodes but filter out ambiguous species. I would suggest adding two new arguments to `ncbi_downstream`: `ambiguous_nodes=TRUE` and `ambiguous_species=FALSE`. I am not certain, though, whether keeping ambiguous nodes is the right default.
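A rough sketch of how the two proposed switches could be applied to a table of child taxa. `filter_children` is a hypothetical helper, not part of taxize; the column names `childtaxa_name` and `childtaxa_rank` are assumed to mirror what `ncbi_children` returns, and the regex is an illustrative assumption rather than taxize's own list of ambiguous terms.

```r
# Hypothetical helper, not part of taxize. The regex is an illustrative
# assumption, not the pattern taxize uses internally.
ambiguous_regex <- "incertae sedis|unclassified|environmental|uncultured|sp\\."

filter_children <- function(df, ambiguous_nodes = TRUE, ambiguous_species = FALSE) {
  is_ambig   <- grepl(ambiguous_regex, df$childtaxa_name)
  is_species <- df$childtaxa_rank == "species"
  # Drop ambiguous non-species nodes only if ambiguous_nodes is FALSE,
  # and ambiguous species only if ambiguous_species is FALSE.
  drop <- (is_ambig & !is_species & !ambiguous_nodes) |
          (is_ambig & is_species & !ambiguous_species)
  df[!drop, , drop = FALSE]
}

kids <- data.frame(
  childtaxa_name = c("Brassica", "Brassicaceae incertae sedis",
                     "Brassica sp.", "Brassica rapa"),
  childtaxa_rank = c("genus", "no rank", "species", "species"),
  stringsAsFactors = FALSE
)

filter_children(kids)$childtaxa_name
# "Brassica" "Brassicaceae incertae sedis" "Brassica rapa"
```

With these defaults the ambiguous clade survives (so its descendants could still be reached), while a name like "Brassica sp." is dropped.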

arendsee, Dec 14 '17

I think I have fixed the argument-passing issue. But there is still the question of whether ambiguous nodes should get special handling.

arendsee, Dec 14 '17

thanks, will take a look tomorrow to familiarize myself with it; can't form an educated opinion right now (boarding 🛫 soon)

sckott, Dec 14 '17

No worries, have a nice flight!

arendsee, Dec 14 '17

the merge closed this, but you did say it only partially addresses the issue; i assume you want this to remain open, yes?

sckott, Dec 15 '17

Yeah, there is just the matter of whether we want to be able to handle ambiguous nodes and ambiguous species differently.

arendsee, Dec 15 '17

yeah, will reopen

sckott, Dec 15 '17

@arendsee any further thoughts on

> whether we want to be able to handle ambiguous nodes and ambiguous species differently.

sckott, Feb 02 '18

@sckott Nothing particularly new. taxizedb can distinguish between ambiguous nodes and ambiguous species, e.g.:

```r
taxizedb::downstream(3700, downto='genus', ambiguous_nodes=FALSE, ambiguous_species=TRUE)
```

Adding the same arguments to taxize might be good, although I am not sure whether breaking the existing API is worth the extra control.

arendsee, Feb 03 '18

bumping this to next milestone

sckott, Mar 20 '18