yohplala issues

Results: 21 issues of yohplala

Faster row-wise filtering when filter is '<', '<=', '>=', '>' and 'in'?

When doing a row-wise filtering (not row-group-wise filtering) with filters '>', '>=', '=', '
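To make the distinction between row-group-wise and row-wise filtering concrete, here is a minimal pure-Python sketch. It is not fastparquet code: `OPS`, `group_may_match` and `filter_rows` are hypothetical names, row groups are modelled as plain dicts of lists, and the min/max statistics are computed on the fly, whereas a Parquet reader would normally take them from the file metadata.

```python
import operator

# Hypothetical mapping from filter operators to Python functions.
OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
    "in": lambda value, choices: value in choices,
}


def group_may_match(stats, op, target):
    """Row-group-wise check: could any value in [min, max] satisfy the filter?"""
    lo, hi = stats["min"], stats["max"]
    if op in ("<", "<="):
        return OPS[op](lo, target)
    if op in (">", ">="):
        return OPS[op](hi, target)
    if op == "==":
        return lo <= target <= hi
    if op == "in":
        return any(lo <= t <= hi for t in target)
    raise ValueError(f"unsupported operator: {op}")


def filter_rows(row_groups, column, op, target):
    """Skip whole row groups using statistics, then filter surviving rows one by one."""
    kept = []
    for group in row_groups:
        values = group[column]
        # Assumption: statistics are derived here; real files store them in metadata.
        stats = {"min": min(values), "max": max(values)}
        if not group_may_match(stats, op, target):
            continue  # row-group-wise filtering: the whole group is dropped
        # Row-wise filtering: each remaining value is tested individually.
        kept.extend(v for v in values if OPS[op](v, target))
    return kept


row_groups = [{"x": [1, 2, 3]}, {"x": [4, 5, 6]}, {"x": [7, 8, 9]}]
result = filter_rows(row_groups, "x", ">=", 5)
print(result)
```

In this sketch the first row group is discarded from its statistics alone, while the second one needs the row-wise pass because only part of it matches.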

Record row group size (in memory) and re-use if available in `iter_row_group`

Hi, do you think we could record the size of the row groups in the metadata when recording them? This data can be obtained when the dataframe is split into row_group...

ENH: Load the full data (all columns, categories, partitions) with 'read_row_group_file'

When using `read_row_group_file()`:
- it is compulsory to provide columns and categories;
- also, the way to specify loading all partitions, or only some of them, is...

int dtype in a categorical column is lost when used as partition

**What happened**: Test case broken with new version of fastparquet.

**Minimal Complete Verifiable Example**: Here is an example showing the trouble:

```python
import os
import pandas as pd
import fastparquet...
```
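As general background on why a dtype can be hard to preserve for partition columns, the sketch below illustrates hive-style partitioning, where the partition value is encoded as text in a directory name of the form `column=value`. This is only an illustration under that assumption, not fastparquet's implementation and not a diagnosis of this issue; `partition_path` and `parse_partition` are hypothetical helpers.

```python
def partition_path(column, value):
    """Encode a partition value in a hive-style directory name (hypothetical helper)."""
    return f"{column}={value}"


def parse_partition(segment):
    """Split a 'column=value' directory name; the value comes back as text."""
    column, _, raw = segment.partition("=")
    return column, raw


original = 3  # an integer partition value
segment = partition_path("a", original)
column, restored = parse_partition(segment)

# The directory name carries no type information, so a reader has to
# infer the dtype again or restore it from separate metadata.
print(segment, type(original).__name__, type(restored).__name__)
```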

Bug in case of a specific set of parameters in 'write': compression/partition_on with str values

Hi, the configuration that triggers this bug was difficult to formalize; it took me a while. I noticed that `filters` in `to_pandas` is ineffective in case the parquet...

Cannot infer type for <class 'NoneType'> with a DataFrame with PeriodIndex

Recording a DataFrame with a `pandas.DatetimeIndex` works.

```python
import pandas as pd
import fastparquet
import os

path = os.path.expanduser('~/Documents/code/draft/data/')
file = path + 'weather_data'
datetime_index = pd.date_range(start = pd.Timestamp('2020/01/02 01:00:00'),...
```

Edge case with row filtering and null values.

Bug when row-filtering with null values?

Fix categorical data handling with global dictionaries

Linting in fastparquet?