tableone

The order of categorical variables

[Open] epimedplotly opened this issue 4 years ago • 5 comments

Hello.

I'd like to suggest allowing categorical variables to be ordered in TableOne.

For example:

Suppose I have a variable that can take the values "<10", "10-20", and ">20".

I'd like to see it in TableOne in exactly the order above.

Instead, it seems to use alphabetical order: "10-20", "<10", ">20".

It would be useful to see the correct order for variables like this.
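The alphabetical order described above is plain string sorting: in ASCII, digits sort before `<` and `>`, so "10-20" lands first. Here is a minimal pure-Python sketch of the problem and of sorting by an explicit list of levels instead. The `desired` list is illustrative and is not a tableone setting.

```python
# Levels of a banded variable as they might appear in the data (illustrative)
values = [">20", "<10", "10-20", "<10", ">20"]

# Plain string sorting compares character codes: "1" (49) < "<" (60) < ">" (62),
# so "10-20" sorts first even though it is the middle band
alphabetic = sorted(set(values))
print(alphabetic)  # ['10-20', '<10', '>20']

# Sorting by position in an explicit list of levels restores the intended order
desired = ["<10", "10-20", ">20"]
explicit = sorted(set(values), key=desired.index)
print(explicit)  # ['<10', '10-20', '>20']
```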

Also, if no order is defined for a categorical variable, the categories should be ordered by their percentages, don't you agree?
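Within a single variable, sorting by percentage gives the same result as sorting by count. That means the idea can be sketched with `collections.Counter`. This only illustrates the request and is not tableone behaviour. `order_by_frequency` and the column name `band` are hypothetical. The resulting dict has the same shape (variable name mapped to a list of levels) as the `order` argument discussed later in this thread.

```python
from collections import Counter


def order_by_frequency(values):
    """Return the distinct levels of `values`, most frequent first.

    Hypothetical helper. Missing values (None) are ignored. Ties keep the
    order in which levels first appear, because Counter preserves insertion
    order and most_common() sorts stably.
    """
    counts = Counter(v for v in values if v is not None)
    return [level for level, _ in counts.most_common()]


values = ["<10", ">20", "10-20", ">20", ">20", "10-20", None]
levels = order_by_frequency(values)
print(levels)  # ['>20', '10-20', '<10']

# Ranking by count matches ranking by percentage, because every level of one
# variable shares the same denominator
order = {"band": levels}  # "band" is an illustrative column name
```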

Thanks for your attention.

Best regards, Lunna

epimedplotly, Dec 05 '19 13:12

Thanks for picking this up. Version 0.7.5 now respects the order of categorical variables. For example:

```python
import pandas as pd
from tableone import TableOne

# ordered categorical whose categories are deliberately not alphabetical
day_cat = pd.Categorical(["mon", "wed", "tue", "thu"],
                         categories=["wed", "thu", "mon", "tue"], ordered=True)

# unordered categorical; "d" is a declared category with no observations
alph_cat = pd.Categorical(["a", "b", "c", "a"],
                          categories=["b", "c", "d", "a"], ordered=False)

mon_cat = pd.Categorical(["jan", "feb", "mar", "apr"],
                         categories=["feb", "jan", "mar", "apr"], ordered=True)

data = pd.DataFrame({"A": ["a", "b", "c", "a"]})
data["day"] = day_cat
data["alph"] = alph_cat
data["month"] = mon_cat
data  # display the DataFrame (e.g. in a Jupyter notebook)
```

Input DataFrame.

Note that the order specified in the DataFrame for day is ["wed", "thu", "mon", "tue"] and for month is ["feb", "jan", "mar", "apr"].

[Screenshot: the input DataFrame]


```python
# the categorical order reflects the order in the DataFrame
t1 = TableOne(data, label_suffix=False)
t1
```

Table 1 uses the order specified in the DataFrame.

The order of day and month is retained in Table 1:

[Screenshot: Table 1 listing day and month levels in the DataFrame's category order]
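To make the behaviour concrete, the sketch below uses plain Python to mimic a summary that follows a categorical's declared categories. Counts are reported in category order, not alphabetically and not in order of appearance. `summarise_in_category_order` is a hypothetical helper, and this is not how tableone is implemented.

```python
from collections import Counter


def summarise_in_category_order(values, categories):
    """Return (level, count, percent) rows in the declared category order.

    Hypothetical illustration only. Declared categories with no
    observations get a count of 0 because Counter returns 0 for missing keys.
    """
    counts = Counter(values)
    total = len(values)
    return [(c, counts[c], 100 * counts[c] / total) for c in categories]


# Same values and categories as the "day" column above
rows = summarise_in_category_order(["mon", "wed", "tue", "thu"],
                                   ["wed", "thu", "mon", "tue"])
for level, n, pct in rows:
    print(f"{level}: {n} ({pct:.1f}%)")
```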

The order argument overrides the natural order of the DataFrame

The order in the DataFrame may not be what we want. We can either modify the order in the DataFrame directly or use the order argument instead. If the order argument is provided, it overrides the order in the DataFrame.


```python
# explicit order per variable; overrides the category order stored in the DataFrame
new_order = {"month": ["jan"], "day": ["mon", "tue", "wed"]}

t2 = TableOne(data, order=new_order, label_suffix=False)
t2
```

[Screenshot: Table 1 using the order argument]
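`new_order` lists only some levels (`"month": ["jan"]`). One plausible reading, which is an assumption to check against the rendered table, is that listed levels come first and unlisted levels follow in their existing order. That merge rule can be sketched as follows. `merge_order` is a hypothetical name, not a tableone function.

```python
def merge_order(existing, preferred):
    """Put `preferred` levels first, then the remaining `existing` levels.

    Assumption: unlisted levels keep their existing relative order, and
    preferred levels that do not exist in the data are dropped.
    """
    head = [level for level in preferred if level in existing]
    tail = [level for level in existing if level not in head]
    return head + tail


# Categories of the "day" and "month" columns above, with new_order applied
day_order = merge_order(["wed", "thu", "mon", "tue"], ["mon", "tue", "wed"])
month_order = merge_order(["feb", "jan", "mar", "apr"], ["jan"])
print(day_order)    # ['mon', 'tue', 'wed', 'thu']
print(month_order)  # ['jan', 'feb', 'mar', 'apr']
```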

tompollard, May 07 '20 14:05

@epimedplotly please test the sorting if you have the opportunity (the latest version can be pip/conda installed) and let us know if it works as expected.

> Also, if that isn't an order for a categorical variable, it should be ordered by the percentual of each category, don't you agree?

Sounds reasonable to me. We haven't implemented this yet, but can look into it.

tompollard, May 07 '20 20:05

Hello, @tompollard !

I finally had the opportunity to test the latest version of tableone.

It works exactly as I expected, thank you so much!

I reiterate that if no order is defined for a categorical variable, it would be awesome if the categories could be ordered by their percentages, but the order argument is already making my life easier.

Thanks!

epimedplotly, Jul 10 '20 15:07

thanks @epimedplotly, glad to hear this helps :)

> I reiterate that if there isn't an order for a categorical variable it would be awesome if it could be ordered by the percentual of each category, but the order argument is already making my life easier.

Point taken, and let's keep this issue open for now.

If you find new bugs or have other suggestions, please feel free to raise more issues.

tompollard, Jul 10 '20 16:07

> I reiterate that if there isn't an order for a categorical variable it would be awesome if it could be ordered by the percentual of each category, but the order argument is already making my life easier.

I'd also be interested in this functionality. It looks like it's already implemented in the limit parameter? Would adding a new parameter be the way to do this? I'm willing to take a stab if someone can provide me with some design direction!

https://github.com/tompollard/tableone/blob/bfd6fbaa4ed3e9f59e1a75191c6296a2a80ccc64/tableone/tableone.py#L1501-L1516
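Here is one possible design, sketched in plain Python. The existing `order` argument stays the highest-priority rule, and a flag switches the fallback for unlisted levels from alphabetical to descending frequency. Everything here is hypothetical: `resolve_order` and `sort_by_frequency` are illustrative names, not part of tableone, and the sketch does not reproduce the linked `limit` code.

```python
from collections import Counter


def resolve_order(columns, order=None, sort_by_frequency=False):
    """Decide the display order of levels for each categorical column.

    columns: dict mapping column name -> list of observed values.
    order: explicit per-column level lists; listed levels always come first.
    sort_by_frequency: hypothetical flag; when True, unlisted levels are
    sorted by descending count (ties keep first-seen order), otherwise
    they are sorted alphabetically.
    """
    order = order or {}
    resolved = {}
    for name, values in columns.items():
        counts = Counter(v for v in values if v is not None)  # skip missing
        if sort_by_frequency:
            fallback = [lvl for lvl, _ in counts.most_common()]
        else:
            fallback = sorted(counts)  # alphabetical fallback
        listed = [lvl for lvl in order.get(name, []) if lvl in counts]
        resolved[name] = listed + [lvl for lvl in fallback if lvl not in listed]
    return resolved


columns = {"band": ["<10", ">20", ">20", "10-20", ">20", "10-20"],
           "sex": ["m", "f", "f", None]}
by_freq = resolve_order(columns, order={"sex": ["m"]}, sort_by_frequency=True)
default = resolve_order(columns, order={"sex": ["m"]})
print(by_freq)  # {'band': ['>20', '10-20', '<10'], 'sex': ['m', 'f']}
print(default)  # {'band': ['10-20', '<10', '>20'], 'sex': ['m', 'f']}
```

Because explicit `order` entries are applied before the fallback, switching the flag only affects levels the user has not listed. The sketch ignores pandas categorical dtypes. In a real implementation, their declared category order would presumably sit between `order` and the frequency fallback.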

vsocrates, Feb 15 '24 20:02