woltka

Which parameters does the gotu command actually use?

Open antgonza opened this issue 5 years ago • 2 comments

Going over the code, it is super confusing to work out exactly which parameters or values the gotu command is using. It looks like gotu calls the classify command without any parameters (everything is None), which then calls workflow, and then there are a few calls to other functions, but with everything None it is not clear what actually happens.

Another way to think about this question is: what's the difference between classify and gotu?

antgonza · Feb 23 '20 17:02

@antgonza Thanks for the insightful comments! gotu is a minimal subset of classify, i.e., no classification: it just assigns queries to subjects but not to higher classification units. So it does not need most of the parameters of the classify command. In this program, gotu and classify share the same workflow to ensure comparability between results.

The functions being called will return None when parameters are None. For example, two main settings differentiate a gotu workflow from a taxonomic classification workflow: whether there is a classification system (loaded by --nodes or --lineages etc.), and whether the target rank (--rank) is none or a rank name (e.g., "species"). Despite these differences, the two workflows are identical in logic.

PS: in the program design, the entire classification system is tree-like, with tips as subject IDs. Therefore a classification system without higher hierarchies but only subjects === gOTU.
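
To make the idea above concrete, here is a minimal sketch of how a single shared workflow could reduce to gOTU counting when everything is None. This is not woltka's actual code: the function names (`load_tree`, `classify_subject`, `workflow`), their signatures, and the toy data are all hypothetical and only illustrate the design.

```python
from collections import Counter

# Hypothetical helpers for illustration only; not woltka's real API.

def load_tree(nodes=None):
    """Return a child -> parent map, or None if no classification system is given."""
    if nodes is None:
        return None
    return dict(nodes)

def classify_subject(subject, tree=None, rank=None, ranks=None):
    """Walk up from a subject to the target rank.

    With no tree or no rank, the subject itself is the unit (the gOTU case).
    """
    if tree is None or rank is None:
        return subject
    ranks = ranks or {}
    node = subject
    while node is not None:
        if ranks.get(node) == rank:
            return node
        node = tree.get(node)
    return None  # no ancestor at the requested rank

def workflow(alignments, nodes=None, ranks=None, rank=None):
    """Shared workflow: count queries per classification unit."""
    tree = load_tree(nodes)
    counts = Counter()
    for _query, subject in alignments:
        unit = classify_subject(subject, tree, rank, ranks)
        if unit is not None:
            counts[unit] += 1
    return dict(counts)

# Toy data (assumed): three queries aligned to two subject genomes.
alignments = [("r1", "G1"), ("r2", "G1"), ("r3", "G2")]
nodes = {"G1": "S1", "G2": "S1", "S1": None}
ranks = {"S1": "species"}

gotu_table = workflow(alignments)                             # everything None
species_table = workflow(alignments, nodes, ranks, "species")  # with a tree and a rank
```

In this sketch the gotu-style call goes through exactly the same code path as the classify-style call; the None defaults simply make each step a no-op, so queries are counted per subject rather than per higher-rank unit.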

qiyunzhu · Feb 23 '20 18:02

Thank you for the explanation; it is perhaps worth adding this information to the documentation and clearly listing all the parameters used by each command.

antgonza · Feb 24 '20 13:02