polars
python: print nested datatypes.
I really like the printSchema() function in PySpark as it massively simplifies working with (large) tables:
Having a similar function in polars would make it more comfortable to use :)
Something like this?
from pprint import pprint
pprint(df.schema)
{'dropoff_datetime': <class 'polars.datatypes.Utf8'>,
'dropoff_latitude': <class 'polars.datatypes.Float64'>,
'dropoff_longitude': <class 'polars.datatypes.Float64'>,
'fare_amount': <class 'polars.datatypes.Float64'>,
'mta_tax': <class 'polars.datatypes.Float64'>,
'passenger_count': <class 'polars.datatypes.Int64'>,
'payment_type': <class 'polars.datatypes.Utf8'>,
'pickup_datetime': <class 'polars.datatypes.Utf8'>,
'pickup_latitude': <class 'polars.datatypes.Float64'>,
'pickup_longitude': <class 'polars.datatypes.Float64'>,
'rate_code': <class 'polars.datatypes.Int64'>,
'store_and_fwd_flag': <class 'polars.datatypes.Int64'>,
'surcharge': <class 'polars.datatypes.Float64'>,
'tip_amount': <class 'polars.datatypes.Float64'>,
'tolls_amount': <class 'polars.datatypes.Float64'>,
'total_amount': <class 'polars.datatypes.Float64'>,
'trip_distance': <class 'polars.datatypes.Float64'>,
'vendor_id': <class 'polars.datatypes.Utf8'>}
Yes, but the difference is that this one does not show nested structures:
Right, so we should improve the print of our nested structures!
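As a rough illustration of what a printSchema()-style view of nested dtypes could look like, here is a minimal Python sketch. It does not call polars: it assumes the schema has already been turned into a hypothetical plain-Python form (strings for leaf dtypes, dicts for structs, one-element lists for list dtypes), and `schema_tree` is an illustrative name, not an existing polars function.

```python
def schema_tree(schema: dict) -> str:
    """Render a nested schema as an indented tree, loosely modelled on PySpark's printSchema().

    Hypothetical input format (plain Python, not a polars API):
      - a str is a leaf dtype name, e.g. "Utf8"
      - a dict is a struct, mapping field names to dtypes
      - a one-element list [inner] is a list whose elements have dtype `inner`
    """
    lines = ["root"]

    def walk(name: str, dtype, depth: int) -> None:
        # Each nesting level adds one " |   " column in front of the " |-- " marker.
        prefix = " |   " * depth + " |-- "
        if isinstance(dtype, dict):
            lines.append(f"{prefix}{name}: struct")
            for child_name, child_dtype in dtype.items():
                walk(child_name, child_dtype, depth + 1)
        elif isinstance(dtype, list):
            (inner,) = dtype  # exactly one inner dtype is expected
            lines.append(f"{prefix}{name}: list")
            walk("element", inner, depth + 1)
        else:
            lines.append(f"{prefix}{name}: {dtype}")

    for name, dtype in schema.items():
        walk(name, dtype, 0)
    return "\n".join(lines)


# Hypothetical example schema with a struct column and a list column.
example = {
    "name": "Utf8",
    "car": {"color": "Utf8", "year": "Int64"},
    "tags": ["Utf8"],
}
print(schema_tree(example))
```

The sketch leaves out PySpark's `(nullable = ...)` suffix and only shows names and dtypes, with each struct field or list element indented one level below its parent.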
The doctest that I fixed recently included the struct field names in its old output, so it seems that, at least for structs, this was implemented at some point.
I will take a look at this and see how to implement printing of nested dtypes on the Rust side, as suggested.
Would this be a good starting point for the dtypes in general?
https://github.com/pola-rs/polars/blob/9344555b3133626b56a5bf6e207d347f9badbd8e/polars/polars-core/src/datatypes/mod.rs
I think this issue can be closed. When we run df.schema on recent polars, we get a description of the struct fields:
{'name': Utf8, 'car': Struct([Field('color', Utf8), Field('year', Int64)])}
(It was noticed by @phillyphil91 in https://github.com/pola-rs/polars/pull/4213#issuecomment-1367542011)
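The nested repr shown above is naturally produced by recursion: a struct prints its fields, and each field prints its own dtype, which may itself be nested. A minimal sketch of that idea follows; `Field`, `Struct`, `List`, `fmt_dtype` and `fmt_schema` are hypothetical stand-ins defined here, not polars' own classes or its actual repr code, and leaf dtypes are plain strings for simplicity.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Field:
    """Hypothetical struct field: a name plus a (possibly nested) dtype."""
    name: str
    dtype: "DType"


@dataclass(frozen=True)
class Struct:
    """Hypothetical struct dtype holding an ordered tuple of fields."""
    fields: Tuple[Field, ...]


@dataclass(frozen=True)
class List:
    """Hypothetical list dtype with a single inner dtype."""
    inner: "DType"


# Leaf dtypes are plain strings such as "Utf8" or "Int64" in this sketch.
DType = Union[str, Struct, List]


def fmt_dtype(dtype: DType) -> str:
    """Format a dtype, recursing into struct fields and list inner types."""
    if isinstance(dtype, Struct):
        inner = ", ".join(f"Field({f.name!r}, {fmt_dtype(f.dtype)})" for f in dtype.fields)
        return f"Struct([{inner}])"
    if isinstance(dtype, List):
        return f"List({fmt_dtype(dtype.inner)})"
    return dtype


def fmt_schema(schema: dict) -> str:
    """Format a column-name -> dtype mapping in the dict-like style shown above."""
    body = ", ".join(f"{name!r}: {fmt_dtype(dt)}" for name, dt in schema.items())
    return "{" + body + "}"


schema = {"name": "Utf8", "car": Struct((Field("color", "Utf8"), Field("year", "Int64")))}
print(fmt_schema(schema))
```

Because `fmt_dtype` calls itself for every field and inner type, arbitrarily deep nesting (structs of lists of structs, and so on) is handled without special cases.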