treeclimbR

Feature proposal: NAs

Open andrewjmc opened this issue 2 years ago • 3 comments

Hello,

Thanks again for such a useful tool. From reading the code, my understanding is that any node where 50% or more of its descendants have NA p-values cannot be considered during candidate generation. It might be useful for some users to be able to override this.
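The rule as described above, alongside the proposed override, can be sketched in a few lines of Python. Everything here is hypothetical and is not treeclimbR's code or API (treeclimbR is an R package). The tree is a plain dict mapping each internal node to its children, an NA p-value is `None`, and the function names and the `ignore_na` flag are invented for illustration. The sketch also counts only descendant *leaves*, which is an assumption about which descendants the real rule inspects.

```python
from typing import Dict, List, Optional

# Hypothetical representation: internal node -> list of children; leaves have no entry.
Tree = Dict[str, List[str]]


def descendant_leaves(tree: Tree, node: str) -> List[str]:
    """Return all leaves below `node` (a leaf returns itself)."""
    children = tree.get(node, [])
    if not children:
        return [node]
    leaves: List[str] = []
    for child in children:
        leaves.extend(descendant_leaves(tree, child))
    return leaves


def is_candidate(
    tree: Tree,
    node: str,
    pvals: Dict[str, Optional[float]],
    ignore_na: bool = False,
) -> bool:
    """Decide whether `node` may take part in candidate generation.

    Default: the rule as read from the code in this issue; the node is rejected
    when 50% or more of its descendant leaves have an NA (None) p-value.
    ignore_na=True: the proposed (hypothetical) override; NA leaves are ignored
    and the node only needs at least one analysed leaf.
    """
    leaves = descendant_leaves(tree, node)
    n_na = sum(1 for leaf in leaves if pvals.get(leaf) is None)
    if ignore_na:
        return n_na < len(leaves)
    return n_na / len(leaves) < 0.5
```

For a node with three leaves of which two are NA (67%), the default rule excludes it, while `ignore_na=True` keeps it eligible because one leaf was analysed.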

In my case I have species counts from the GTDB taxonomy, and because I use a highly sensitive classification tool (Kraken), a very large proportion of species are detected at least once across a large dataset. I originally aggregated all species that never exceeded 0.1% relative abundance (RA) into a single "rare species" OTU. However, I realised this prevented these features' RAs from being propagated up the hierarchy.
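Lumping rare species into one OTU loses information because, at higher taxonomic levels, a node's RA is the sum of its descendant species' RAs. Species that are individually below 0.1% can therefore still add up to a genus above it. The following is a minimal Python sketch for a single sample. It assumes a hypothetical dict-of-children tree, and `propagate_abundance` is a name invented for this example.

```python
from typing import Dict, List

# Hypothetical representation: internal node -> list of children; leaves have no entry.
Tree = Dict[str, List[str]]


def propagate_abundance(tree: Tree, leaf_ra: Dict[str, float], root: str) -> Dict[str, float]:
    """Relative abundance for every node in one sample: a leaf keeps its own RA,
    an internal node gets the sum of its children's RAs (computed recursively)."""
    totals: Dict[str, float] = {}

    def visit(node: str) -> float:
        children = tree.get(node, [])
        if children:
            value = sum(visit(child) for child in children)
        else:
            value = leaf_ra.get(node, 0.0)  # leaves missing from leaf_ra count as 0
        totals[node] = value
        return value

    visit(root)
    return totals
```

For example, a genus whose three species sit at 0.08%, 0.09% and 0.07% comes out at 0.24%. If those species had been moved into a separate "rare species" OTU, the genus would receive none of that abundance.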

I have therefore kept them in, but any feature (at any taxonomic level) that never exceeds 0.1% RA, or has <10% prevalence, does not get a p-value (I don't want to generate p-values for features that are so unlikely to be informative). It would have been useful to be able to force treeclimbR to ignore the NAs, treating those nodes as if they did not exist.

I have found a workable solution, since I needed to remove the unanalysed leaves and nodes for graphical presentation anyway. I have painstakingly coded the removal of all these leaves and nodes, so the problem is solved for me, but the feature could be useful in a future release.
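The following Python sketch shows one way to do the pruning described above, using the same hypothetical dict-of-children representation. It is not the author's actual code. Untested leaves are dropped. An untested internal node is removed and its surviving children are attached directly to its parent, so the result behaves as if those nodes did not exist. `prune_untested` is an invented name.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: internal node -> list of children; leaves have no entry.
Tree = Dict[str, List[str]]


def prune_untested(
    tree: Tree, pvals: Dict[str, Optional[float]], root: str
) -> Tuple[Tree, List[str]]:
    """Remove every node whose p-value is NA (None).

    Untested leaves are dropped; an untested internal node is removed and its
    surviving children are attached to its parent instead. Returns the pruned
    tree and its top-level node(s), which are several if the root was untested.
    """
    pruned: Tree = {}

    def visit(node: str) -> List[str]:
        """Return the node(s) that replace `node` in its parent's child list."""
        kept: List[str] = []
        for child in tree.get(node, []):
            kept.extend(visit(child))
        if pvals.get(node) is None:
            return kept  # node disappears; its surviving subtree moves up
        if kept:
            pruned[node] = kept
        return [node]

    roots = visit(root)
    return pruned, roots
```

Splicing children upward keeps every analysed node, but it can flatten the taxonomy (for example, a species may end up attached directly to a family). A tested internal node whose descendants were all untested becomes a leaf. If ranks must be preserved for plotting, the alternative is to keep untested internal nodes as pure connectors.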

Best wishes,

Andrew

andrewjmc, Sep 21 '21 14:09