Stijn de Gooijer comments


[Protocol] Make Column.get_buffers() docstring more explicit

> updating all libraries' `from_dataframe` function to handle both ways of specifying the buffers' dtypes?

Implementations of `from_dataframe` should just disregard the data buffer dtype entirely. `column.dtype` already tells you...
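A minimal sketch of what "disregard the data buffer dtype" could look like on the consumer side. It assumes the interchange protocol's `get_buffers()` shape, where the `"data"` entry is a `(buffer, dtype)` pair; the helper name `data_buffer_with_column_dtype` and the `_FakeColumn` stand-in (including its string dtype kinds) are hypothetical and only there to make the example runnable.

```python
from typing import Any, Tuple


def data_buffer_with_column_dtype(column: Any) -> Tuple[Any, Any]:
    """Return the data buffer paired with the column-level dtype.

    The dtype attached to the data buffer is deliberately ignored;
    the consumer decides how to read the bytes from column.dtype alone.
    """
    buffers = column.get_buffers()
    data_buffer, _ignored_buffer_dtype = buffers["data"]
    return data_buffer, column.dtype


class _FakeColumn:
    """Hypothetical stand-in for a protocol column, for illustration only."""

    # (kind, bit width, format string, endianness); the kind is a plain
    # string here rather than the protocol's enum, purely for the sketch.
    dtype = ("INT", 64, "l", "=")

    def get_buffers(self):
        # The buffer-level dtype differs from the column dtype on purpose.
        return {
            "data": (b"\x00" * 8, ("UINT", 8, "C", "=")),
            "validity": None,
            "offsets": None,
        }


buf, dtype = data_buffer_with_column_dtype(_FakeColumn())
```

With this approach, a producer can report either flavor of buffer dtype without affecting the consumer.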

[Protocol] Make Column.get_buffers() docstring more explicit

> > Implementations of `from_dataframe` should just disregard the data buffer dtype entirely. `column.dtype` already tells you what to expect in the data buffer (e.g. dtype `STRING` will mean an...

[Protocol] Make Column.get_buffers() docstring more explicit

> I think we should still take some care though

For sure! Let's first get the `from_dataframe` implementations fixed, then we can update the data buffer dtype whenever we feel...

[Protocol] Make Column.get_buffers() docstring more explicit

> Well, if you have a DATETIME column, for example, what is the implied dtype for the data buffer? It might be spelled out in the spec, but I'm certainly...

[Protocol] Make Column.get_buffers() docstring more explicit

> > Implementations of `from_dataframe` should just disregard the data buffer dtype entirely. `column.dtype` already tells you what to expect in the data buffer
> >
> > Thinking further on this...

Support of bit vs byte-packed boolean

When implementing this for Polars, I had some trouble wrapping my head around the `offset` on Columns and `bufsize` on Buffers, specifically when it comes to bitmasks. You can figure...
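To make the `offset` / `bufsize` interaction for bitmasks concrete, here is a rough sketch. It assumes the offset is counted in elements (so in bits for a bit-packed mask) and that bits are packed least-significant-bit first, as in Arrow; both are assumptions of this example, and the helper names `bitmask_bufsize` and `read_bit` are hypothetical.

```python
def bitmask_bufsize(offset: int, length: int) -> int:
    """Bytes needed to cover `length` bits starting `offset` bits into the buffer."""
    return (offset + length + 7) // 8


def read_bit(buffer: bytes, offset: int, i: int) -> bool:
    """Read logical element `i` from a bit-packed buffer (assumes LSB-first bits)."""
    pos = offset + i
    return bool((buffer[pos // 8] >> (pos % 8)) & 1)


# One byte, written most-significant bit first in the literal:
# bit positions 0..7 hold 0, 0, 1, 0, 1, 1, 0, 1.
mask = bytes([0b10110100])
offset, length = 2, 5

# Elements 0..4 of the column map to bit positions 2..6 of the buffer.
values = [read_bit(mask, offset, i) for i in range(length)]
```

Under these assumptions, a 5-element column at offset 2 fits in one byte, while the same column at offset 6 already needs two, which is why the buffer size cannot be derived from the column length alone.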

fix(python): Include pl. qualifier for inner dtypes in to_init_repr()

It looks good, but the Array repr has been updated. I'll send an update and this can be merged.

docs: Add an overview of available SQL functions

Was this closed by https://github.com/pola-rs/polars/pull/16268?

docs(python): Overview of available SQL functions

@r-brink Did you mean to keep this as a draft, or is it ready to be reviewed?

Polars to_numpy slower with chunked array than going via pandas

Do you see the same results if you run `df.to_numpy(use_pyarrow=False)`? PyArrow is still the default engine, and it rechunks when converting to PyArrow.