polars
Select numerical columns
Problem description
I wish I could select only numerical columns in polars.
The equivalent function in pandas is select_dtypes('number').
This is my workaround (its drawbacks are that it is verbose and that it changes the order of the columns):
import polars as pl
df = pl.DataFrame({
    'A': pl.Series([1, 2, -3], dtype=pl.Int32),
    'B': pl.Series([0.2, 0.5, 1.5], dtype=pl.Float64),
    'C': pl.Series([0.21564, 0.51, 1.55], dtype=pl.Float32),
    'D': pl.Series([1, 2, 3], dtype=pl.UInt64),
    'E': ['a', 'b', 'c'],
})
NUMERIC_POLARS_DTYPES = [
    pl.Int8, pl.Int16, pl.Int32, pl.Int64,
    pl.UInt8, pl.UInt16, pl.UInt32, pl.UInt64,
    pl.Float32, pl.Float64,
]
df.select(pl.col(NUMERIC_POLARS_DTYPES)).columns
# ['A', 'D', 'C', 'B']
Just a note that if you do:
import polars as pl
df = pl.DataFrame({
    'A': pl.Series([1, 2, -3], dtype=pl.Int32),
    'B': pl.Series([0.2, 0.5, 1.5], dtype=pl.Float64),
    'C': pl.Series([0.21564, 0.51, 1.55], dtype=pl.Float32),
    'D': pl.Series([1, 2, 3], dtype=pl.UInt64),
    'E': ['a', 'b', 'c'],
})
NUMERIC_POLARS_DTYPES = [
    pl.Int8, pl.Int16, pl.Int32, pl.Int64,
    pl.UInt8, pl.UInt16, pl.UInt32, pl.UInt64,
    pl.Float32, pl.Float64,
]
number_columns = pl.col(NUMERIC_POLARS_DTYPES)
you can then do:
df.select(number_columns)
I'm not 100% sure it could be made cleaner without enforcing a canonical list of "number" data types.
Just to drive that last point home: should bool be in that list of number types? Within my codebase, probably yes; within yours, maybe not.