Orson Peters
@daviskirk Categoricals are currently the only thing that is simply broken on the new streaming engine (to my knowledge). Almost all unsupported things automatically fall back to the eager engine...
@gdementen Note that the new streaming engine as it is today can already run a lot of queries on datasets (much) larger than memory. The queries that aren't out of...
@vultix Maybe eventually, but not anytime soon.
@velochy They should be fixed right now - internally we added a workaround that forces all categoricals to go through the global key map. If you find any bugs please...
@velochy For the time being you have to use `collect(new_streaming=True)`, not `streaming=True`. Also, `explain` is not yet implemented for `new_streaming`; currently you have to set the environment variable `POLARS_VISUALIZE_PHYSICAL_PLAN="somefile.dot"` and...
@velochy These questions don't belong in this tracking issue, sorry. I'd suggest asking in the Discord server to see if anyone could look with you.
> Understood. Should I delete them to clean the thread?

No need, I marked them as off-topic.

> Will `unpivot` be supported as a streaming op?

`unpivot` is planned, yes,...
This proposal has a **huge** problem which `nan_counts` does not have: NaNs in the dataset can poison the statistics. If the dataset contains both a signed positive NaN and a...
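The poisoning effect can be sketched in plain Python. This is only an illustration under an assumption the excerpt does not spell out: that min/max statistics are computed under an IEEE 754 total order, where a NaN with the sign bit set sorts below every number and a NaN with the sign bit clear sorts above every number. The helper names (`total_order_key`, `float_from_bits`) are hypothetical and belong to no library.

```python
import math
import struct


def float_from_bits(bits: int) -> float:
    """Build a double from its raw 64-bit pattern (hypothetical helper)."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def total_order_key(x: float) -> int:
    """Integer key that sorts doubles in IEEE 754 total order (assumed ordering)."""
    (bits,) = struct.unpack("<q", struct.pack("<d", x))
    # For negative values, flip the low 63 bits so larger magnitudes sort lower.
    return bits ^ 0x7FFFFFFFFFFFFFFF if bits < 0 else bits


pos_nan = float_from_bits(0x7FF8000000000000)  # quiet NaN, sign bit clear
neg_nan = float_from_bits(0xFFF8000000000000)  # quiet NaN, sign bit set

values = [1.0, pos_nan, 3.0, neg_nan, 2.0]

# Statistics under total order: both bounds end up being NaNs.
lo = min(values, key=total_order_key)
hi = max(values, key=total_order_key)

# Statistics that exclude NaNs and count them separately.
clean = [v for v in values if not math.isnan(v)]
clean_min, clean_max = min(clean), max(clean)
nan_count = len(values) - len(clean)
```

With both signed NaNs present, `lo` and `hi` are NaNs at the two extremes of the order, so the range says nothing about where the real values lie and a predicate such as `x > 100` cannot be used to skip the data. The separate-count variant keeps the bounds at the real extremes of the data and records the NaNs on the side.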
There is another issue with this proposal in my opinion: it adds semantics to the sign bit of `NaN`s. This is incredibly dangerous, since not all data systems (e.g. Polars, but...
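As a small illustration of why the sign bit of a NaN is easy to lose track of, the sketch below uses only the Python standard library; it shows CPython behaviour, not the behaviour of any particular data system, and `nan_with_sign` is a hypothetical helper.

```python
import math
import struct


def nan_with_sign(negative: bool) -> float:
    """Return a quiet NaN with the requested sign bit (hypothetical helper)."""
    bits = 0xFFF8000000000000 if negative else 0x7FF8000000000000
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


pos_nan = nan_with_sign(False)
neg_nan = nan_with_sign(True)

# Ordinary float operations do not distinguish the two values.
both_nan = math.isnan(pos_nan) and math.isnan(neg_nan)
same_repr = repr(pos_nan) == repr(neg_nan)

# Only explicit inspection of the sign reveals a difference.
signs = (math.copysign(1.0, pos_nan), math.copysign(1.0, neg_nan))
```

Both values test as NaN and print identically, so code that never inspects the sign explicitly has no reason to preserve it.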
> Are we willing to special case float to make filtering in the presence of NaNs more efficient, or do we go with a more streamlined implementation without special fields...