lefser icon indicating copy to clipboard operation
lefser copied to clipboard

Clarify what `blockCol` is

Open lwaldron opened this issue 1 year ago • 2 comments

@asyakhl what is the historical reason for the naming of the lefser() blockCol argument? This name, and the help for the argument:

character(1) Optional column name in colData(relab) indicating the blocks, usually a factor with two levels (e.g., c("adult", "senior"); default NULL).

implies that it is a blocking variable (ie https://en.wikipedia.org/wiki/Blocking_(statistics) ). However, in LEfSe, the two grouping variables define main groups and subgroups for pairwise comparisons, not a blocking variable. Also, why would subgoups usually have two levels?

I think this causes confusion by users trying to use it to define blocking variables (ie, see `blockCol = "patient" in #47). I'm hesitant to rename the variable and break existing code, but we should make it much clearer in documentation how this argument is used and that it doesn't refer to a blocking variable in the statistical sense.

lwaldron avatar Jul 02 '24 09:07 lwaldron

By the way, I noted that the original Python LEfSe functions and manuscript refer to "class" and "subclass", not "group" and "block" (e.g. https://huttenhower.sph.harvard.edu/lefse/ for galaxy module and https://github.com/SegataLab/lefse/blob/master/lefse/lefse_run.py for Python program). This may be worth going through a deprecation cycle for to avoid confusion.

lwaldron avatar Jul 02 '24 10:07 lwaldron

@lwaldron I don't remember the historical reason for this kind of naming. We should deprecate groupCol and blockCol and change it to class and subclass to avoid confusion with blocking variable. There can be more than two levels of subgroups! @shbrief I started PR to clarify man for blockCol.

asyakhl avatar Aug 04 '24 02:08 asyakhl

5d4e2e205f43f9bad9732436aa3b6020c69ed34f

shbrief avatar Oct 05 '24 21:10 shbrief