tableone
The order of categorical variables
Hello.
I’d like to suggest allowing categorical variables to be ordered in TableOne.
For example:
Suppose I have a variable that can take the values “<10”, “10-20”, “>20”.
I’d like to see it in TableOne in exactly the order above.
Instead, it seems to be sorted alphabetically, as “10-20”, “<10”, “>20”.
It would be useful to see the correct order.
Also, if a categorical variable has no defined order, it should be ordered by the percentage of each category, don't you agree?
Thanks for your attention.
Best regards, Lunna
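The behaviour described above can be reproduced with pandas alone. The following is a minimal sketch, not tableone code: the variable names are made up for illustration, and it only shows why plain strings sort "10-20" first and how an ordered Categorical keeps the intended order.

```python
import pandas as pd

# Illustrative data; the names here are assumptions, not part of tableone.
levels = ["<10", "10-20", ">20"]  # the intended, meaningful order
values = pd.Series([">20", "<10", "10-20", "<10"])

# Plain strings sort by character code, so "1" comes before "<" and ">".
string_order = sorted(values.unique())

# An ordered Categorical carries the intended order along with the data.
as_cat = values.astype(pd.CategoricalDtype(categories=levels, ordered=True))
categorical_sorted = as_cat.sort_values().tolist()

print(string_order)
print(categorical_sorted)
```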
Thanks for picking this up. Version 0.7.5 now respects the order of categorical variables. For example:
import pandas as pd
from tableone import TableOne
day_cat = pd.Categorical(["mon", "wed", "tue", "thu"],
                         categories=["wed", "thu", "mon", "tue"], ordered=True)
alph_cat = pd.Categorical(["a", "b", "c", "a"],
                          categories=["b", "c", "d", "a"], ordered=False)
mon_cat = pd.Categorical(["jan", "feb", "mar", "apr"],
                         categories=["feb", "jan", "mar", "apr"], ordered=True)
data = pd.DataFrame({"A": ["a", "b", "c", "a"]})
data["day"] = day_cat
data["alph"] = alph_cat
data["month"] = mon_cat
data
Input DataFrame.
Note that the order specified in the DataFrame for day is ["wed", "thu", "mon", "tue"] and the order for month is ["feb", "jan", "mar", "apr"].
# the categorical order reflects the order in the DataFrame
t1 = TableOne(data, label_suffix=False)
t1
Table 1 uses the order specified in the DataFrame.
The order of day and month is retained in Table 1:
The order argument overrides the natural order of the DataFrame
The order in the DataFrame may not be what we want. We can either modify the order in the DataFrame directly, or alternatively we can use the order argument to fix it. If the order argument is provided, it overrides the order in the DataFrame.
new_order = {"month": ["jan"], "day": ["mon", "tue", "wed"]}
t2 = TableOne(data, order=new_order, label_suffix=False)
t2
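The other route mentioned above, modifying the order in the DataFrame directly, can be sketched with pandas alone. This is only an illustration, assuming the day column from the example; it uses the pandas `cat.reorder_categories` accessor and does not call tableone.

```python
import pandas as pd

day_cat = pd.Categorical(["mon", "wed", "tue", "thu"],
                         categories=["wed", "thu", "mon", "tue"], ordered=True)
data = pd.DataFrame({"day": day_cat})

# Redefine the level order; the values in the column are unchanged.
data["day"] = data["day"].cat.reorder_categories(
    ["mon", "tue", "wed", "thu"], ordered=True)

print(list(data["day"].cat.categories))
```

Since Table 1 follows the order held in the DataFrame, a table built from this frame should then pick up the new order without needing the order argument.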
@epimedplotly please test the sorting if you have the opportunity (the latest version can be pip/conda installed) and let us know if it works as expected.
Also, if a categorical variable has no defined order, it should be ordered by the percentage of each category, don't you agree?
Sounds reasonable to me. We haven't implemented this yet, but can look into it.
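Until something like this is built in, one possible workaround is to derive the order from the data and pass it through the order argument. The sketch below is an assumption-laden illustration: `order_by_frequency` is a hypothetical helper, not part of tableone, and the age_group data is made up.

```python
import pandas as pd

# Made-up example data with no ties between categories.
data = pd.DataFrame({
    "age_group": ["<10", "10-20", ">20", "10-20", ">20", ">20"],
})


def order_by_frequency(df, columns):
    """Hypothetical helper: map each column to its levels, most frequent first."""
    return {
        # value_counts sorts by descending frequency by default
        col: df[col].value_counts(normalize=True).index.tolist()
        for col in columns
    }


freq_order = order_by_frequency(data, ["age_group"])
print(freq_order)
```

The resulting dictionary has the same shape as new_order in the example above, so it could be passed as `TableOne(data, order=freq_order)`.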
Hello, @tompollard !
I finally had the opportunity to test the latest version of tableone.
It works just as I expected, thank you so much!
I reiterate that if a categorical variable has no defined order, it would be awesome if it could be ordered by the percentage of each category, but the order argument is already making my life easier.
Thanks!
thanks @epimedplotly, glad to hear this helps :)
I reiterate that if a categorical variable has no defined order, it would be awesome if it could be ordered by the percentage of each category, but the order argument is already making my life easier.
Point taken, and let's keep this issue open for now.
If you come across new bugs or have further suggestions, please feel free to raise more issues.
I reiterate that if a categorical variable has no defined order, it would be awesome if it could be ordered by the percentage of each category, but the order argument is already making my life easier.
I'd also be interested in this functionality. It looks like it's already implemented in the limit parameter? Would adding a new parameter be the way to do this? I'm willing to take a stab if someone can provide me with some design direction!
https://github.com/tompollard/tableone/blob/bfd6fbaa4ed3e9f59e1a75191c6296a2a80ccc64/tableone/tableone.py#L1501-L1516