feat(python): Display struct field's name and dtype

Open phillyphil91 opened this issue 2 years ago • 8 comments

Related to #3925. I know there is a lot missing here, but I wanted to start with the Rust side first, as there is already a good idea of how to modify the schema method in Python: #3953

This displays each struct field's name and dtype in a simple way, but formatting will fail for structs that have further structs as fields. The display follows the pyspark style from the example in #3925.

I'm not sure this is the best way forward, as the result in Python would look as follows (see also the Python test checks): image

Instead of: image

Hence all doctest examples would have to be adjusted. I can do that, but I wanted to make sure this approach goes in the right direction first.

phillyphil91 avatar Aug 01 '22 18:08 phillyphil91

Codecov Report

Merging #4213 (cfd9409) into master (3e665fd) will decrease coverage by 14.64%. The diff coverage is 98.80%.

@@             Coverage Diff             @@
##           master    #4213       +/-   ##
===========================================
- Coverage   78.76%   64.11%   -14.65%     
===========================================
  Files         458      457        -1     
  Lines       75785    75616      -169     
===========================================
- Hits        59691    48484    -11207     
- Misses      16094    27132    +11038     
| Impacted Files | Coverage Δ |
|---|---|
| polars/polars-core/src/frame/mod.rs | 62.90% <25.00%> (-14.49%) :arrow_down: |
| py-polars/polars/io.py | 73.93% <87.50%> (+1.11%) :arrow_up: |
| ...olars/polars-core/src/chunked_array/ops/explode.rs | 59.88% <100.00%> (-31.72%) :arrow_down: |
| polars/polars-core/src/datatypes/mod.rs | 51.00% <100.00%> (-21.40%) :arrow_down: |
| polars/polars-core/src/frame/groupby/proxy.rs | 59.47% <100.00%> (-7.19%) :arrow_down: |
| polars/polars-core/src/utils/mod.rs | 82.70% <100.00%> (+21.27%) :arrow_up: |
| ...s-lazy/src/logical_plan/optimizer/type_coercion.rs | 80.15% <100.00%> (-2.01%) :arrow_down: |
| ...olars-lazy/src/physical_plan/expressions/window.rs | 78.57% <100.00%> (+4.32%) :arrow_up: |
| py-polars/polars/internals/io.py | 76.66% <100.00%> (ø) |
| polars/polars-io/src/tests.rs | 0.00% <0.00%> (-100.00%) :arrow_down: |

... and 225 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update fad1c77...cfd9409.

codecov-commenter avatar Aug 01 '22 18:08 codecov-commenter

Hey @ritchie46, any comments or suggestions on this? I think this is mostly a stylistic change, but it could drastically change what you see when you print the schema. I would be happy to change the output to whatever format is desired.

phillyphil91 avatar Sep 02 '22 12:09 phillyphil91

The full struct printing could be toggled by a pl.Config option, similar to pl.Config.set_tbl_hide_column_data_types() and the other table-related settings.
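A minimal sketch of what such a toggle could look like, assuming the flag is carried through an environment variable. The variable name `EXAMPLE_FMT_STRUCT_FIELDS` and both functions are hypothetical and are not part of the polars API; dtypes are represented as plain strings to keep the example dependency-free.

```python
from __future__ import annotations

import os

# Hypothetical environment variable name; not an actual polars setting.
STRUCT_FIELDS_ENV = "EXAMPLE_FMT_STRUCT_FIELDS"


def set_fmt_struct_fields(active: bool = True) -> None:
    """Turn the verbose struct display on or off (hypothetical option)."""
    os.environ[STRUCT_FIELDS_ENV] = "1" if active else "0"


def format_struct_dtype(fields: dict[str, str]) -> str:
    """Render a struct dtype, expanding its fields only when the flag is set."""
    if os.environ.get(STRUCT_FIELDS_ENV) == "1":
        inner = ", ".join(f"{name}: {dtype}" for name, dtype in fields.items())
        return f"struct[{inner}]"
    # Default: a compact form that only reports the number of fields.
    return f"struct[{len(fields)}]"
```

With the flag off, existing output stays unchanged, so doctests would only need updating where the verbose form is explicitly enabled.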

ghuls avatar Oct 21 '22 21:10 ghuls

Thanks for the reply, @ghuls. I will take a look at how to add this to pl.Config and re-open the PR once I have also rebased on the current state of polars.

phillyphil91 avatar Oct 23 '22 20:10 phillyphil91

Hey @ghuls, I have updated the code so that a config flag can be set on the Python side to enable the extended display of the struct. Is that what you were thinking? I have tested it and it works with the config flag. Without the flag, the dtype is displayed the old way. Should there be any tests specifically for that? As far as I can see, there are no other tests for displaying or printing the other dtypes.

A couple of notes: with the new display, it no longer gives the correct datatype (polars.datatypes.Utf8, for example) but rather:

image

Also, for nested structs, i.e. structs that contain structs, this display breaks: image

To be honest, I'm not really sure how to fix this, as I don't know how to keep track of the nesting level in the fmt function in order to increase the indentation.
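One common way to track the nesting level is to pass the current depth as an argument to a recursive helper and derive the indentation from it. The sketch below shows the idea in Python with a stand-in representation (a leaf dtype is a string, a struct is a dict of field name to dtype); `format_fields` is a hypothetical helper, not the PR's code. The same pattern could presumably be carried over to a helper called from the Rust fmt implementation.

```python
from __future__ import annotations


def format_fields(fields: dict, depth: int = 0, indent: str = " |   ") -> list[str]:
    """Return one line per field, indenting once per level of nesting.

    `fields` maps a field name to either a dtype name (str) or a nested
    dict standing in for a struct. This is not the polars DataType.
    """
    lines = []
    prefix = indent * depth  # indentation grows with the nesting depth
    for name, dtype in fields.items():
        if isinstance(dtype, dict):
            lines.append(f"{prefix} |-- {name}: struct")
            # Recurse with depth + 1 so nested fields are indented further.
            lines.extend(format_fields(dtype, depth + 1, indent))
        else:
            lines.append(f"{prefix} |-- {name}: {dtype}")
    return lines


schema = {"id": "Int64", "address": {"city": "Utf8", "geo": {"lat": "Float64"}}}
print("\n".join(format_fields(schema)))
```

Because the depth travels with the call rather than living in shared state, arbitrarily deep structs are handled without any extra bookkeeping.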

Let me know if you have any thoughts on this or can suggest something.

phillyphil91 avatar Oct 25 '22 20:10 phillyphil91

Hey @ghuls, any update or feedback on this? I'm not sure if the new formatting makes sense, but if we can agree on something then I could move forward with this.

phillyphil91 avatar Nov 05 '22 16:11 phillyphil91

The struct fields shouldn't be displayed in the schema, only in the printed dataframe. The schema is a Python dictionary; the new formatting breaks this (and would also break code that manipulates the schema).
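To illustrate the concern, here is a small sketch that keeps the schema as an ordinary dictionary of dtypes and moves the field display into a separate, display-only helper. The `Utf8` and `Struct` classes and `describe_struct` are stand-ins invented for this example, not the polars implementations.

```python
from __future__ import annotations


class Utf8:
    """Stand-in for a dtype class; not the polars implementation."""


class Struct:
    """Stand-in struct dtype that keeps its fields as ordinary data."""

    def __init__(self, fields: dict[str, type]) -> None:
        self.fields = fields


def describe_struct(dtype: Struct) -> str:
    """Display-only helper: the schema values themselves are left untouched."""
    inner = ", ".join(f"{name}: {t.__name__}" for name, t in dtype.fields.items())
    return f"struct[{inner}]"


schema = {"name": Utf8, "address": Struct({"city": Utf8})}

# Typical schema manipulation keeps working because the values are still dtypes.
string_columns = [col for col, dtype in schema.items() if dtype is Utf8]
renamed = {col.upper(): dtype for col, dtype in schema.items()}
```

Code that compares or rewrites schema entries never sees the formatted string, so only the printed output changes.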

ghuls avatar Nov 05 '22 21:11 ghuls

Is this still needed, @ghuls? I saw that df.schema basically displays the struct's fields.

phillyphil91 avatar Dec 29 '22 19:12 phillyphil91