
Unable to treat variable as continuous measure

Open sgummidipundi opened this issue 3 years ago • 1 comments

Hello! Would just like to say fantastic package and great syntax for the function.

I seem to be having an issue with creating a table with continuous values. I'm sure I am probably doing something incorrectly on my end since it is basic functionality. When I try to do an easy example with a single continuous variable I get an output like below:

image

It is odd because it is clearly reading the variable as non-normal, as I specified (as indicated by the 'median [Q1, Q3]' label), but it seems to give only counts and frequencies, essentially treating it as categorical. I have also verified that the variable is of type float64. Are there any suggestions on how I can proceed and have it treated as a continuous measure?

Thanks in advance

sgummidipundi, Dec 27 '20 04:12

Hi @sgummidipundi, you've raised a good point, which is that there is no "continuous" argument. At the moment, tableone expects you to define the categorical variables using the "categorical" argument. Anything else is then treated as continuous. I can see how this is confusing, especially when (as in your case) there are no categorical variables.

If you don't specify which variables are categorical, then tableone attempts to guess (and, from your example, clearly doesn't do a great job!). To override the guess in your example, you would need to provide an empty categorical argument. I've tried to recreate the example below:

1. Generate sample data

# import packages
import pandas as pd
import tableone
# create sample dataframe
x = ([0.0] * 41639 + 
     [0.2] * 3 +
     [0.25] * 1 +
     [1] * 3 +
     [10] * 806 +
     [100] * 816 +
     [1000] * 1488 +
     [10000] * 57 +
     [100000] * 3 +
     [11000] * 2 +
     [117000] * 7 +
     [12] * 1 +
     [1200] * 267 +
     [12000] * 51)

data = pd.DataFrame(x, columns=["x"])

2. Create summary table, allowing tableone to guess the data type

Based on the large number of observations and the limited number of unique values, tableone (incorrectly!) guesses that x is categorical.

t1 = tableone.TableOne(data)
print(t1.tabulate(tablefmt = "github"))
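|          |          | Missing | Overall      |
|----------|----------|---------|--------------|
| n        |          |         | 45144        |
| x, n (%) | 0.0      | 0       | 41639 (92.2) |
|          | 0.2      |         | 3 (0.0)      |
|          | 0.25     |         | 1 (0.0)      |
|          | 1.0      |         | 3 (0.0)      |
|          | 10.0     |         | 806 (1.8)    |
|          | 100.0    |         | 816 (1.8)    |
|          | 1000.0   |         | 1488 (3.3)   |
|          | 10000.0  |         | 57 (0.1)     |
|          | 100000.0 |         | 3 (0.0)      |
|          | 11000.0  |         | 2 (0.0)      |
|          | 117000.0 |         | 7 (0.0)      |
|          | 12.0     |         | 1 (0.0)      |
|          | 1200.0   |         | 267 (0.6)    |
|          | 12000.0  |         | 51 (0.1)     |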
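
To see why a column like this can look categorical, here is a rough sketch of a uniqueness-ratio check. It is an illustration only, not tableone's actual logic: the function name `looks_categorical` and the 0.005 cutoff are hypothetical, chosen just to show how few distinct values x has relative to its number of observations.

```python
# Illustration only: a hypothetical uniqueness-ratio rule, NOT tableone's real heuristic.
# Value -> number of occurrences, copied from the sample data in step 1.
counts = {
    0.0: 41639, 0.2: 3, 0.25: 1, 1.0: 3, 10.0: 806, 100.0: 816,
    1000.0: 1488, 10000.0: 57, 100000.0: 3, 11000.0: 2,
    117000.0: 7, 12.0: 1, 1200.0: 267, 12000.0: 51,
}


def looks_categorical(n_unique, n_obs, max_ratio=0.005):
    """Flag a column whose share of distinct values is at most max_ratio.

    max_ratio is an assumed cutoff for this sketch.
    """
    return n_unique / n_obs <= max_ratio


n_obs = sum(counts.values())   # total number of observations
n_unique = len(counts)         # number of distinct values

print(n_unique, n_obs)                       # 14 45144
print(looks_categorical(n_unique, n_obs))    # True
```

With only 14 distinct values across 45144 rows, any rule of this general shape would flag x, which is why passing the categorical argument explicitly is the safer route.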

3. Create summary table with the categorical argument

t2 = tableone.TableOne(data, categorical=[])
print(t2.tabulate(tablefmt = "github"))
|              |    | Missing | Overall       |
|--------------|----|---------|---------------|
| n            |    |         | 45144         |
| x, mean (SD) |    | 0       | 93.5 (1764.8) |
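
As a sanity check on the continuous summary, the same statistics can be recomputed with only the Python standard library. This sketch does not call tableone. It rebuilds the sample from the value counts in step 1 and assumes the sample standard deviation (n - 1 denominator). The quartiles are included because the original question asked for a 'median [Q1, Q3]' summary.

```python
import statistics

# Value -> number of occurrences, copied from the sample data in step 1.
counts = {
    0.0: 41639, 0.2: 3, 0.25: 1, 1.0: 3, 10.0: 806, 100.0: 816,
    1000.0: 1488, 10000.0: 57, 100000.0: 3, 11000.0: 2,
    117000.0: 7, 12.0: 1, 1200.0: 267, 12000.0: 51,
}

# Expand the counts back into one flat list of observations.
x = [value for value, n in counts.items() for _ in range(n)]

mean = statistics.mean(x)
sd = statistics.stdev(x)                        # sample SD (assumed convention)
q1, median, q3 = statistics.quantiles(x, n=4)   # quartile cut points

print(f"mean (SD): {mean:.1f} ({sd:.1f})")                   # mean (SD): 93.5 (1764.8)
print(f"median [Q1,Q3]: {median:.1f} [{q1:.1f},{q3:.1f}]")   # median [Q1,Q3]: 0.0 [0.0,0.0]
```

The mean and SD match the table above. Because more than 92% of the observations are zero, the median and both quartiles are all 0.0, so a non-normal summary of this variable would be much less informative than the mean (SD).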

tompollard, Dec 30 '20 18:12