Gert Hulselmans comments

Results: 446 comments by Gert Hulselmans

CSV: build categoricals directly

Still quite surprised it takes so long to create categoricals, basically longer than reading the whole file.

CSV: build categoricals directly

Global string cache is way faster now for the case above (after #4087):

```python
In [4]: %time df_pl = read_fragments_from_file(fragments_bed_filename, engine='polars')
CPU times: user 53.7 s, sys: 8.24 s, total:...
```

CSV: build categoricals directly

> Wow there is almost no overhead of the global string cache in your use case! :tada:

Thanks for the great improvement. It is getting to acceptable levels :-P. A...

CSV: build categoricals directly

Building categoricals in the CSV reader was implemented in: https://github.com/pola-rs/polars/pull/4933
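For readers new to the terms in this thread: a categorical column stores each distinct string once and keeps a small integer code per row, and a global string cache makes those codes agree across columns or files. Below is a minimal pure-Python sketch of that idea. It is not the Polars implementation; `StringCache`, `code_for` and `encode_column` are hypothetical names, and the chromosome labels are made-up sample data.

```python
from typing import Dict, List


class StringCache:
    """Hypothetical shared string-to-code mapping (illustration only)."""

    def __init__(self) -> None:
        self.codes: Dict[str, int] = {}
        self.categories: List[str] = []

    def code_for(self, value: str) -> int:
        # Assign the next free integer code the first time a string is seen.
        code = self.codes.get(value)
        if code is None:
            code = len(self.categories)
            self.codes[value] = code
            self.categories.append(value)
        return code


def encode_column(values: List[str], cache: StringCache) -> List[int]:
    """Encode values into integer codes as they are read, without a second pass."""
    return [cache.code_for(v) for v in values]


# Two columns encoded against the same cache share one set of codes.
cache = StringCache()
chrom_a = encode_column(["chr1", "chr2", "chr1"], cache)
chrom_b = encode_column(["chr2", "chr3"], cache)
```

Because both columns go through the same cache, "chr2" gets the same code in each, which is what makes categoricals from different sources comparable.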

python: print nested datatypes.

> Right, so we should improve the print of our nested structures!

The doctest that I fixed recently had the name of the fields for structs in the old output....

feat(python): Display struct field's name and dtype

The full struct printing could be toggled by a pl.Config option similar to: `pl.Config.set_tbl_hide_column_data_types()` and other table related settings.

feat(python): Display struct field's name and dtype

The struct fields shouldn't be displayed in the schema, only in the printed dataframe. The schema is a Python dictionary, and the new formatting breaks this (and would also break code...

Chinese problem

Your input file is not in UTF-8 encoding, but a different one. Opening your file with WPS and saving it again as a CSV file encodes the text properly as...
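One quick way to confirm this kind of diagnosis is to try decoding the raw bytes as UTF-8 and see where it fails. The sketch below uses only the Python standard library; `first_invalid_utf8_offset` is a hypothetical helper name, and the two sample byte strings are made up for illustration.

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None if valid."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte that could not be decoded.
        return err.start
    return None


# The same two characters encoded two ways (made-up sample data).
sample_utf8 = "中文".encode("utf-8")
sample_big5 = "中文".encode("big5")

utf8_result = first_invalid_utf8_offset(sample_utf8)
big5_result = first_invalid_utf8_offset(sample_big5)
```

For a real file, the bytes could be read with `open(path, 'rb').read()` and passed to the same helper. A `None` result only shows that the bytes are valid UTF-8, not that they were originally written in that encoding.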

Chinese problem

With the following iconv command, I managed to convert the CSV to UTF-8 format too.

```bash
iconv -f BIG5 -t UTF-8 test_original.csv > test_iconv.csv
```

It is possible that you...

Chinese problem

This will also work:

```python
In [67]: with open('test_original.csv', 'r', encoding='big5') as fh:
    ...:     df = pl.read_csv(fh.read().encode('utf-8'))
    ...:

In [68]: df
Out[68]:
shape: (5, 5)
┌────────┬────────┬────────┬────────┬────────┐
│ Value1 ┆ Value2...
```
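For files too large to hold in memory comfortably, the same conversion can be done in chunks, much like the iconv command does. This is a minimal standard-library sketch, assuming the source really is Big5; `transcode_file` and the file names in the demo are hypothetical, and the demo contents are made up.

```python
import os
import tempfile


def transcode_file(src_path, dst_path, src_encoding="big5", chunk_chars=1 << 16):
    """Rewrite src_path as UTF-8 at dst_path, reading a chunk of characters at a time."""
    # newline="" keeps line endings exactly as they are in the source file.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        while True:
            chunk = src.read(chunk_chars)
            if not chunk:
                break
            dst.write(chunk)


# Self-contained demo on a throwaway Big5 file.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "big5_input.csv")
    dst = os.path.join(tmp, "utf8_output.csv")
    with open(src, "wb") as fh:
        fh.write("Value1,Value2\n中文,1\n".encode("big5"))
    transcode_file(src, dst)
    with open(dst, "rb") as fh:
        converted = fh.read()
```

Reading in text mode lets Python's incremental decoder handle multi-byte characters that straddle a chunk boundary, so the chunk size affects only memory use, not correctness.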