Select numerical columns

Open Gabriel-ROBIN opened this issue 2 years ago • 2 comments

Problem description

I wish I could select only numerical columns in polars.

The equivalent function in pandas is select_dtypes('number')

This is my current workaround (its drawbacks are that it is verbose and that it changes the order of the columns):

import polars as pl

df = pl.DataFrame({
    'A': pl.Series([1,2, -3], dtype=pl.Int32), 
    'B': pl.Series([0.2,0.5,1.5], dtype=pl.Float64), 
    'C': pl.Series([0.21564,0.51,1.55], dtype=pl.Float32), 
    'D': pl.Series([1,2,3], dtype=pl.UInt64), 
    'E': ['a', 'b', 'c']
})

NUMERIC_POLARS_DTYPES = [
    pl.Int8, pl.Int16, pl.Int32, pl.Int64, 
    pl.UInt8, pl.UInt16, pl.UInt32, pl.UInt64,
    pl.Float32, pl.Float64, 
]

df.select(pl.col(NUMERIC_POLARS_DTYPES)).columns
# ['A', 'D', 'C', 'B']

Gabriel-ROBIN, Nov 26 '22 15:11

Just a note that if you do:

import polars as pl

df = pl.DataFrame({
    'A': pl.Series([1,2, -3], dtype=pl.Int32), 
    'B': pl.Series([0.2,0.5,1.5], dtype=pl.Float64), 
    'C': pl.Series([0.21564,0.51,1.55], dtype=pl.Float32), 
    'D': pl.Series([1,2,3], dtype=pl.UInt64), 
    'E': ['a', 'b', 'c']
})

NUMERIC_POLARS_DTYPES = [
    pl.Int8, pl.Int16, pl.Int32, pl.Int64, 
    pl.UInt8, pl.UInt16, pl.UInt32, pl.UInt64,
    pl.Float32, pl.Float64, 
]

number_columns = pl.col(NUMERIC_POLARS_DTYPES)

you can then do:

df.select(number_columns)

I'm not 100% sure it could be made any cleaner without enforcing a canonical list of "number" data types.

mkleinbort-ic, Nov 30 '22 10:11

Just to drive that last point home: should bool be in that list of number types? Within my codebase, probably yes; within yours, maybe not.

mkleinbort-ic, Nov 30 '22 10:11