polars
feat(python): Display struct field's name and dtype
Related to #3925. I know there is a lot missing here, but I wanted to start with the Rust side first, as there is already a good idea of how to modify the schema method in Python: #3953.
This displays each struct field's name and dtype in a simple way, but formatting will fail for structs that have further structs as fields. The display follows the pyspark style from the example in #3925.
I'm not sure this is the best way forward, as the result in Python would look as follows (also see the Python test checks):
Instead of:
Hence all doctest examples would have to be adjusted. I can do that, but I wanted to make sure this approach is going in the right direction.
Codecov Report
Merging #4213 (cfd9409) into master (3e665fd) will decrease coverage by 14.64%. The diff coverage is 98.80%.
@@ Coverage Diff @@
## master #4213 +/- ##
===========================================
- Coverage 78.76% 64.11% -14.65%
===========================================
Files 458 457 -1
Lines 75785 75616 -169
===========================================
- Hits 59691 48484 -11207
- Misses 16094 27132 +11038
Impacted Files | Coverage Δ | |
---|---|---|
polars/polars-core/src/frame/mod.rs | 62.90% <25.00%> (-14.49%) | :arrow_down: |
py-polars/polars/io.py | 73.93% <87.50%> (+1.11%) | :arrow_up: |
...olars/polars-core/src/chunked_array/ops/explode.rs | 59.88% <100.00%> (-31.72%) | :arrow_down: |
polars/polars-core/src/datatypes/mod.rs | 51.00% <100.00%> (-21.40%) | :arrow_down: |
polars/polars-core/src/frame/groupby/proxy.rs | 59.47% <100.00%> (-7.19%) | :arrow_down: |
polars/polars-core/src/utils/mod.rs | 82.70% <100.00%> (+21.27%) | :arrow_up: |
...s-lazy/src/logical_plan/optimizer/type_coercion.rs | 80.15% <100.00%> (-2.01%) | :arrow_down: |
...olars-lazy/src/physical_plan/expressions/window.rs | 78.57% <100.00%> (+4.32%) | :arrow_up: |
py-polars/polars/internals/io.py | 76.66% <100.00%> (ø) | |
polars/polars-io/src/tests.rs | 0.00% <0.00%> (-100.00%) | :arrow_down: |
... and 225 more | | |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Hey @ritchie46, any comments or suggestions on this? I think this might be mostly a stylistic change, but it could also drastically change what you see when you print the schema. I would be happy to change the output to a desired format.
The full struct printing could be toggled by a pl.Config option similar to pl.Config.set_tbl_hide_column_data_types() and the other table-related settings.
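The toggle idea can be sketched roughly as follows. This is only an illustration in plain Python: `Config.set_fmt_struct_fields`, `Field` and `format_struct` are hypothetical names, not existing polars APIs, and the compact `"Struct"` output stands in for whatever the old display actually is.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a struct field (name plus dtype name)."""

    name: str
    dtype: str


class Config:
    """Hypothetical config holder; the flag name is an assumption."""

    # False keeps the compact display, True expands the struct fields.
    fmt_struct_fields = False

    @classmethod
    def set_fmt_struct_fields(cls, active: bool = True) -> None:
        cls.fmt_struct_fields = active


def format_struct(fields: Sequence[Field]) -> str:
    """Render a struct dtype, expanding its fields only when the flag is set."""
    if not Config.fmt_struct_fields:
        return "Struct"
    inner = ", ".join(f"{f.name}: {f.dtype}" for f in fields)
    return f"Struct[{inner}]"
```

With the flag off nothing changes for existing users; the extended form is opt-in, which would keep current doctest output stable.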
Thanks for the reply, @ghuls. I will take a look at how to add this to pl.Config and re-open the PR once I have rebased on the current state of polars as well.
Hey @ghuls, I have updated the code so that a config flag can be set on the Python side to enable the extended display of the struct. Is that what you were thinking? I have tested it and it works with the config flag. Without it, the dtype is displayed the old way. Should there be any tests specifically for that? As far as I can see, there are no other tests for displaying/printing the other dtypes.
A couple of notes: with the new way of displaying it, it no longer gives the correct datatype (polars.datatypes.Utf8, for example) but rather:
Also, for nested structs, i.e. structs that contain structs, this display breaks:
To be honest, I'm not really sure how to fix this, as I don't know how to keep track of the nesting level in the fmt function in order to increase the indentation.
Let me know if you have any thoughts on this or can suggest something.
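One common way to track the nesting level is to recurse with an explicit depth argument. The sketch below shows the idea in plain Python; `Struct`, `Field`, `field_lines` and `render_struct` are hypothetical stand-ins rather than polars types, and the `|--` prefix is only assumed to resemble the pyspark-style layout mentioned above.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Struct:
    """Hypothetical struct dtype holding an ordered tuple of fields."""

    fields: Tuple["Field", ...]


@dataclass(frozen=True)
class Field:
    """Hypothetical field: a name and either a leaf dtype name or a nested Struct."""

    name: str
    dtype: Union[str, Struct]


def field_lines(struct: Struct, depth: int = 0) -> List[str]:
    """Return one line per field, indenting by four spaces per nesting level."""
    pad = "    " * depth
    lines = []
    for field in struct.fields:
        if isinstance(field.dtype, Struct):
            lines.append(f"{pad}|-- {field.name}: struct")
            # Recurse with depth + 1 so the nested fields are indented further.
            lines.extend(field_lines(field.dtype, depth + 1))
        else:
            lines.append(f"{pad}|-- {field.name}: {field.dtype}")
    return lines


def render_struct(struct: Struct) -> str:
    """Entry point: always starts at depth 0."""
    return "\n".join(field_lines(struct))
```

The same shape should carry over to the Rust side: the fmt implementation would call a helper that takes the depth as an extra argument, starting at 0, instead of trying to hold the depth as state inside fmt itself.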
Hey @ghuls, any update or feedback on this? I'm not sure if the new formatting makes sense, but if we can agree on something then I could move forward with this.
The struct fields shouldn't be displayed in the schema, only in the printed dataframe. The schema is a Python dictionary; the new formatting breaks this (and would also break code that manipulates the schema).
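A small sketch of why this matters, using hypothetical stand-in classes (`Utf8` and `Int64` here are plain Python classes defined for the example, not the polars dtypes): code that inspects the schema compares its values against dtype objects, so replacing those values with display strings silently changes the result.

```python
class Utf8:
    """Hypothetical stand-in for a string dtype class."""


class Int64:
    """Hypothetical stand-in for an integer dtype class."""


# The schema is an ordinary dict mapping column name -> dtype object.
schema = {"name": Utf8, "age": Int64}


def string_columns(schema: dict) -> list:
    """Typical schema-manipulating code: select columns by dtype identity."""
    return [col for col, dtype in schema.items() if dtype is Utf8]


# If formatting replaced the dtype objects with display strings,
# the identity check above would no longer match anything.
stringified = {col: dtype.__name__ for col, dtype in schema.items()}


def describe(schema: dict) -> str:
    """Formatting kept separate: builds a string without touching the dict."""
    return ", ".join(f"{col}: {dtype.__name__}" for col, dtype in schema.items())
```

Here `string_columns(schema)` finds the `name` column, while `string_columns(stringified)` finds nothing. Keeping the pretty output in a separate function such as `describe` leaves the dictionary usable by existing code.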
Is this still needed, @ghuls? I saw that df.schema basically displays the struct's fields.