
python: print nested datatypes.

Open Hoeze opened this issue 2 years ago • 5 comments

I really like the printSchema() function in PySpark, as it massively simplifies working with (large) tables: (screenshot) Having this function in polars as well would make it more comfortable to use :)

Hoeze avatar Jul 07 '22 09:07 Hoeze

Something like this?:

from pprint import pprint

pprint(df.schema)
{'dropoff_datetime': <class 'polars.datatypes.Utf8'>,
 'dropoff_latitude': <class 'polars.datatypes.Float64'>,
 'dropoff_longitude': <class 'polars.datatypes.Float64'>,
 'fare_amount': <class 'polars.datatypes.Float64'>,
 'mta_tax': <class 'polars.datatypes.Float64'>,
 'passenger_count': <class 'polars.datatypes.Int64'>,
 'payment_type': <class 'polars.datatypes.Utf8'>,
 'pickup_datetime': <class 'polars.datatypes.Utf8'>,
 'pickup_latitude': <class 'polars.datatypes.Float64'>,
 'pickup_longitude': <class 'polars.datatypes.Float64'>,
 'rate_code': <class 'polars.datatypes.Int64'>,
 'store_and_fwd_flag': <class 'polars.datatypes.Int64'>,
 'surcharge': <class 'polars.datatypes.Float64'>,
 'tip_amount': <class 'polars.datatypes.Float64'>,
 'tolls_amount': <class 'polars.datatypes.Float64'>,
 'total_amount': <class 'polars.datatypes.Float64'>,
 'trip_distance': <class 'polars.datatypes.Float64'>,
 'vendor_id': <class 'polars.datatypes.Utf8'>}

ritchie46 avatar Jul 07 '22 09:07 ritchie46

Yes, but the difference is that pprint(df.schema) does not show nested structures: (screenshot)
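
To make the request concrete, here is a rough sketch of the kind of tree output I mean. It is not polars or PySpark API: format_schema is a hypothetical helper, and the schema is a stand-in built from plain dicts, lists and strings (a dict plays the role of a struct, a one-element list plays the role of a list dtype). The column names are made up for illustration.

```python
def format_schema(schema, depth=0):
    """Return printSchema()-style lines for a nested schema description.

    Assumption: `schema` is a plain dict mapping column names to either a
    dtype name (str), a nested dict (struct) or a one-element list (list dtype).
    These are stand-ins, not real polars dtypes.
    """
    lines = []
    prefix = " |   " * depth + " |-- "
    for name, dtype in schema.items():
        if isinstance(dtype, dict):
            # Struct: print the column, then recurse into its fields.
            lines.append(f"{prefix}{name}: struct")
            lines.extend(format_schema(dtype, depth + 1))
        elif isinstance(dtype, list):
            # List: describe its single element dtype one level deeper.
            lines.append(f"{prefix}{name}: list")
            lines.extend(format_schema({"element": dtype[0]}, depth + 1))
        else:
            # Leaf dtype, given here as a plain string.
            lines.append(f"{prefix}{name}: {dtype}")
    return lines


# Hypothetical example schema with one struct column and one list-of-struct column.
example = {
    "name": "Utf8",
    "car": {"color": "Utf8", "year": "Int64"},
    "trips": [{"distance": "Float64"}],
}

print("root")
print("\n".join(format_schema(example)))
```

Each nesting level gets one more " |   " of indentation, so struct fields stay visually grouped under their parent column instead of being squeezed onto a single line.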

Hoeze avatar Jul 07 '22 09:07 Hoeze

Right, so we should improve the print of our nested structures!

ritchie46 avatar Jul 07 '22 09:07 ritchie46

> Right, so we should improve the print of our nested structures!

The doctest that I fixed recently had the names of the struct fields in its old output. So it seems that, at least for structs, this was once implemented.

ghuls avatar Jul 07 '22 21:07 ghuls

I will take a look at this and see how to implement printing of the nested dtypes on the Rust side, as suggested.

Would this be a good starting point for the dtypes in general?

https://github.com/pola-rs/polars/blob/9344555b3133626b56a5bf6e207d347f9badbd8e/polars/polars-core/src/datatypes/mod.rs

phillyphil91 avatar Jul 14 '22 20:07 phillyphil91

I think this issue can be closed.


When we run df.schema on recent polars, we get a description of the struct fields:

{'name': Utf8, 'car': Struct([Field('color', Utf8), Field('year', Int64)])}

(It was noticed by @phillyphil91 in https://github.com/pola-rs/polars/pull/4213#issuecomment-1367542011)

mslapek avatar Feb 11 '23 18:02 mslapek