Gert Hulselmans


Newer HTTP libraries hopefully also support files bigger than 2 GiB/4 GiB. See: https://github.com/AppImage/zsync2/issues/31

@probonopd libcurl is not the problem. It is the wrapper code around it that does not handle big files.

This might help a bit (no alphabetical ordering as far as I can see): https://pypi.org/project/flake8-class-attributes-order/

You can easily write it yourself as an expression:

```python
df.with_column(
    pl.when(pl.col("a").is_null())
    .then(pl.col("b"))
    .otherwise(pl.col("a"))
    .alias("combine_first")
)
```
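For illustration, a minimal sketch of that expression on a small DataFrame (assuming an older Polars API where `with_column` takes a single expression; the column names `a` and `b` match the snippet above):

```python
import polars as pl

df = pl.DataFrame({"a": [1, None, 3], "b": [10, 20, 30]})

# Take "a" where it is not null, otherwise fall back to "b".
out = df.with_column(
    pl.when(pl.col("a").is_null())
    .then(pl.col("b"))
    .otherwise(pl.col("a"))
    .alias("combine_first")
)
# The "combine_first" column now holds [1, 20, 3].
```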

Does the file have quoted fields and columns with a lot of text (with embedded newlines)? Could you print the dtypes for all columns?
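For reference, one way to print them (a sketch, assuming the DataFrame is named `df`):

```python
# Print every column name together with its dtype.
for name, dtype in zip(df.columns, df.dtypes):
    print(name, dtype)
```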

Is reading also slow with `use_pyarrow=True` (remove the `dtype` option)?
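A rough sketch of such a comparison (assuming `test_path` points at the CSV file discussed in this thread, and that the installed Polars version supports the `use_pyarrow` option of `pl.read_csv`):

```python
import time

import polars as pl

start = time.perf_counter()
df_native = pl.read_csv(test_path)  # native Polars CSV reader
print(f"native reader:  {time.perf_counter() - start:.2f}s")

start = time.perf_counter()
df_arrow = pl.read_csv(test_path, use_pyarrow=True)  # pyarrow-backed reader
print(f"pyarrow reader: {time.perf_counter() - start:.2f}s")
```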

I think it is due to the parsing of the time columns:

```python
In [108]: %%time
     ...: # test with standard csv reader
     ...: df1 = pl.read_csv(test_path, infer_schema_length=0, dtypes=fake_schema)
     ...:
CPU times: ...
```

First writing the CSV with Polars and then reading it back solves it:

```python
df1 = pl.read_csv(test_path, infer_schema_length=0, dtypes=fake_schema)

In [156]: df1.write_csv("~/test.written_by_polars.csv")

In [155]: %%time
     ...: df1_csv_from_polars = pl.read_csv("~/test.written_by_polars.csv", infer_schema_length=0, dtypes=fake_schema)
...
```

Not sure if it is useful to have to cover all cases, but multiple NA values for a specific column are not supported:

```
Dict[str, List[str]] -> A dictionary that...
```
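As an illustration of what per-column NA lists would enable, here is a hypothetical workaround: read the column as strings and null out several NA markers afterwards (the column name `some_column` and the marker list are made up for the example):

```python
import polars as pl

# Hypothetical: infer_schema_length=0 reads all columns as strings, so
# multiple NA markers in one column can be mapped to null afterwards.
df = pl.read_csv("data.csv", infer_schema_length=0)

na_markers = ["NA", "n/a", "-"]
df = df.with_column(
    pl.when(pl.col("some_column").is_in(na_markers))
    .then(None)
    .otherwise(pl.col("some_column"))
    .alias("some_column")
)
```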

I remember that numpy used to take around 1 second to import in the past (which is quite noticeable when you just want to display the help of a script)....
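A quick sketch for measuring this yourself (plain `time.perf_counter` around the import; CPython's `python -X importtime` flag gives a more detailed per-module breakdown):

```python
import time

start = time.perf_counter()
import numpy  # noqa: E402  (deliberately imported late so it can be timed)

print(f"importing numpy took {time.perf_counter() - start:.3f}s")
```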