Ritchie Vink comments

Results: 1046 comments of Ritchie Vink


CSV: build categoricals directly

Could it be that pyarrow converts to categorical while reading, whereas we first read a string column and then convert it?

CSV: build categoricals directly

Yep, that makes more sense to me, as the local builders seem pretty optimized.

CSV: build categoricals directly

Yes, you need to hash those strings and store them in a hashmap. That's expensive.

```python
>>> %%time
>>> df["pickup_datetime"].cast(pl.Categorical)
CPU times: user 1.91 s, sys: 162 ms, total: 2.08...
```
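
To illustrate why the cast costs time, here is a minimal pure-Python sketch of dictionary encoding. It is not how polars implements categoricals; the function name `encode_categorical` and the sample values are hypothetical. It only shows that every input string has to be hashed and looked up once.

```python
def encode_categorical(values):
    """Map each distinct string to a small integer code (toy sketch)."""
    categories = {}  # string -> code; every lookup hashes the string
    codes = []
    for value in values:
        code = categories.get(value)
        if code is None:
            # First time this string is seen: assign the next free code.
            code = len(categories)
            categories[value] = code
        codes.append(code)
    # dicts keep insertion order, so position i holds the string for code i
    return codes, list(categories)


codes, categories = encode_categorical(["a", "b", "a", "c", "b"])
```

The cost grows with the number of rows, not only with the number of distinct values, because each row still needs one hash and one lookup.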

CSV: build categoricals directly

Was closed by the wrong PR.

> Global string cache is way faster now for the case above (after https://github.com/pola-rs/polars/pull/4087):

Wow there is almost no overhead of the global string...

Apply not being optimized when using LazyFrame [Python]

#3313 only fixed the first function. I still need to do the latter.

[Python] Expose all operators Expr implements as methods

This would create a redundancy and would create differences in how users would write polars queries, which I want to keep to a minimum. I think I will even follow...

[Python] Expose all operators Expr implements as methods

> I dont really think that writing queries different ways would be more of an issue than it already is, after all i can already use the dunders directly to...

[Python] Expose all operators Expr implements as methods

Given the many requests for this, I am willing to accept a PR that implements those on the expressions.
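
As a rough sketch of what such a PR could look like, the toy class below exposes one operator both as a dunder and as a named method. The class `Expr` here is a hypothetical stand-in, not the polars class, and the `text` attribute and `add` method are assumptions made for illustration only.

```python
class Expr:
    """Toy expression that records the operation as text (hypothetical)."""

    def __init__(self, text):
        self.text = text

    def __add__(self, other):
        # Operator form: a + b
        return Expr(f"({self.text} + {other.text})")

    def add(self, other):
        # Method form: a.add(b); delegates to the operator so both stay in sync
        return self + other


a, b = Expr("a"), Expr("b")
via_operator = (a + b).text
via_method = a.add(b).text
```

Delegating the method to the dunder keeps a single implementation, so the two spellings cannot drift apart.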

python: print nested datatypes.

Something like this?

```python
from pprint import pprint

pprint(df.schema)
```

```
{'dropoff_datetime': ,
 'dropoff_latitude': ,
 'dropoff_longitude': ,
 'fare_amount': ,
 'mta_tax': ,
 'passenger_count': ,
 'payment_type': ,
 'pickup_datetime': ,
 'pickup_latitude': ,
 'pickup_longitude':...
```

python: print nested datatypes.

Right, so we should improve the print of our nested structures!