Gert Hulselmans

Results 446 comments of Gert Hulselmans

Still quite surprised it takes so long to create categoricals, basically longer than reading the whole file.

Global string cache is way faster now for the case above (after #4087):

```python
In [4]: %time df_pl = read_fragments_from_file(fragments_bed_filename, engine='polars')
CPU times: user 53.7 s, sys: 8.24 s, total:...
```

> Wow there is almost no overhead of the global string cache in your use case! :tada:

Thanks for the great improvement. It is getting to acceptable levels :-P. A...

Building categoricals in CSV reader was implemented in: https://github.com/pola-rs/polars/pull/4933

> Right, so we should improve the print of our nested structures!

The doctest that I fixed recently had the name of the fields for structs in the old output....

The full struct printing could be toggled by a pl.Config option similar to: `pl.Config.set_tbl_hide_column_data_types()` and other table related settings.

The struct fields shouldn't be displayed in the schema, only in the printed dataframe. The schema is a Python dictionary, and the new formatting breaks this (and would also break code...

Your input file is not in UTF-8 encoding, but a different one. Opening your file with WPS and saving it again as a CSV file encodes the text properly as...

With the following iconv command, I managed to convert the CSV to UTF-8 format too.

```bash
iconv -f BIG5 -t UTF-8 test_original.csv > test_iconv.csv
```

It is possible that you...
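If iconv is not available, the same re-encoding can probably be done with the Python standard library alone. This is a minimal sketch, assuming the input really is Big5-encoded as in the iconv command above; the helper name `transcode_to_utf8` and its parameters are made up for illustration and are not part of polars.

```python
def transcode_to_utf8(src_path, dst_path, src_encoding="big5"):
    """Re-encode a text file as UTF-8 (hypothetical helper, not a polars API).

    The file is streamed line by line, so it does not need to fit in memory.
    newline="" disables newline translation, so the original line endings
    are written back unchanged.
    """
    with open(src_path, "r", encoding=src_encoding, newline="") as fin, \
         open(dst_path, "w", encoding="utf-8", newline="") as fout:
        for line in fin:
            fout.write(line)
```

Calling `transcode_to_utf8("test_original.csv", "test_utf8.csv")` should give a file that a UTF-8-only CSV reader can open. If the source encoding is wrong, Python raises a `UnicodeDecodeError` instead of silently writing garbage, which helps when the real encoding is unknown.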

This will also work:

```python
In [67]: with open('test_original.csv', 'r', encoding='big5') as fh:
    ...:     df = pl.read_csv(fh.read().encode('utf-8'))
    ...:

In [68]: df
Out[68]:
shape: (5, 5)
┌────────┬────────┬────────┬────────┬────────┐
│ Value1 ┆ Value2...
```