datatable issues

fread string with NAs generates extra distinct group

```sh
wget https://raw.githubusercontent.com/h2oai/db-benchmark/cf255c174647ac437aa7a85751f6e65732a3cb9a/_data/groupby-datagen.R
Rscript groupby-datagen.R 1e9 1e2 5 0
## activate your pydt env
source ~/git/db-benchmark/pydatatable/py-pydatatable/bin/activate
python
import datatable as dt
from datatable import f, count
x = dt.fread('G1_1e9_1e2_5_0.csv', na_strings=[''])...
```

jangorecki

bug

fread

sort na_position="remove" crashes

- Did you find a bug in datatable, or maybe the bug found you?
  I found a bug.
- How to reproduce the bug?
```sh
wget https://raw.githubusercontent.com/h2oai/db-benchmark/cf255c174647ac437aa7a85751f6e65732a3cb9a/_data/groupby-datagen.R
Rscript groupby-datagen.R 1e8...
```

jangorecki

segfault

sort

import/export pandas frame with NA-aware data types

2

`dt.Frame` raises an error when importing a pandas frame whose columns are of type `Int32`, a dtype chosen so that they can hold missing values.
```py
import pandas as pd
import...
```

jangorecki

new feature

force lazy evaluation

1

This is a follow-up to our Slack conversation. The feature request is for an API that forces materialization of computation results which might not have been materialized yet, simply because...

jangorecki

fread should distinguish between fill set explicitly or not set

7

Currently the default value for `fill` is `False`. However, when `sep = ' '` we change `fill` to `True`. This shouldn't happen if the user asked for `fill=False` explicitly....

st-pasha

improve

fread
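One common way to tell "not set" apart from an explicit `fill=False` is a sentinel default. The sketch below is only an illustration of that pattern: `read_options` and `_UNSET` are hypothetical names, not part of datatable's API, and the real `fread` may resolve this differently.

```python
# Hypothetical sentinel: a unique object that no caller can pass by accident.
_UNSET = object()


def read_options(sep=None, fill=_UNSET):
    """Resolve reader options (illustrative helper, not datatable's fread)."""
    if fill is _UNSET:
        # The user did not pass `fill`, so apply the sep-dependent default
        # described in the issue: True for a single-space separator.
        fill = (sep == " ")
    # An explicit fill=True or fill=False is kept exactly as given.
    return {"sep": sep, "fill": fill}


implicit = read_options(sep=" ")
explicit = read_options(sep=" ", fill=False)
comma = read_options(sep=",")
```

With this pattern the sep-dependent override applies only when the caller left `fill` out, so `explicit["fill"]` stays `False` even though `sep` is a space.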

Use protocol 5 for pickling

Python 3.8 added pickle protocol version 5, which allows avoiding excessive memory copies of serialized objects. We should make use of this feature for faster inter-process data exchange....

st-pasha

improve
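To make the protocol 5 request concrete, here is a minimal standard-library sketch of out-of-band buffers. It does not show how datatable would implement pickling for a Frame; `payload` and `frame_like` are hypothetical stand-ins for a large column buffer and its metadata.

```python
import pickle

# Hypothetical payload standing in for a large column buffer.
payload = bytearray(b"\x01" * 1_000_000)
frame_like = {"name": "col0", "data": pickle.PickleBuffer(payload)}

# In-band: the buffer contents are copied into the pickle stream itself.
inband = pickle.dumps(frame_like, protocol=5)

# Out-of-band: the stream keeps only a placeholder, and the buffer is handed
# to the callback so the transport layer can move it without an extra copy.
buffers = []
stream = pickle.dumps(frame_like, protocol=5, buffer_callback=buffers.append)

# The receiver supplies the same buffers, in the same order, when loading.
restored = pickle.loads(stream, buffers=buffers)
restored_view = memoryview(restored["data"])
```

Here `stream` stays small regardless of the payload size, while `inband` grows with it. The sender and receiver still have to agree on how the out-of-band buffers are transferred.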

Loss of column Name

8

- Did you find a bug in datatable, or maybe the bug found you?
  Loss of column names during some operations. What determines how a column name is changed? What...

samukweku

documentation

fread erroneously guesses sep=' '

8

https://archive.ics.uci.edu/ml/machine-learning-databases/badges/badges.data This seems like a reasonable data set that needs better whitespace detection. Similar to the datetime case, the field here is first name, initial, last name, and dt gets confused when the name format changes slightly. As you...

pseudotensor

bug

improve

fread

low priority

xread: an "extract+read"

1

This is a proposal for implementing a new function `xread()`, which would be conceptually similar to `fread()`, but much lazier. In particular, `xread()` would parse only the first `n_sample_lines=100` lines...

st-pasha

fread

design-doc

cust-goldmansachs

Rolling aggregate support based on windows within a DT

5

I'd like to see the ability to get different rolling aggregations of my dataset based on order and grouping columns. Pandas has robust support for this type of operation. https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.rolling.html....

atroiano

new feature

cust-goldmansachs
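As an illustration of what the rolling-aggregate request describes, the following pure-Python sketch computes a rolling mean per group after ordering by a column. The function name and its parameters are hypothetical and not a proposed datatable API. Unlike the pandas default, it averages over partial windows at the start of each group instead of returning missing values.

```python
from collections import defaultdict, deque


def rolling_mean_by_group(rows, group_key, order_key, value_key, window):
    """Rolling mean of `value_key` within each group, ordered by `order_key`."""
    # One bounded queue per group holds at most the last `window` values.
    recent = defaultdict(lambda: deque(maxlen=window))
    out = []
    for row in sorted(rows, key=lambda r: r[order_key]):
        buf = recent[row[group_key]]
        buf.append(row[value_key])
        out.append({**row, "rolling_mean": sum(buf) / len(buf)})
    return out


rows = [
    {"g": "a", "t": 1, "v": 1.0},
    {"g": "a", "t": 2, "v": 3.0},
    {"g": "b", "t": 1, "v": 10.0},
    {"g": "a", "t": 3, "v": 5.0},
]
result = rolling_mean_by_group(rows, "g", "t", "v", window=2)
means = {(r["g"], r["t"]): r["rolling_mean"] for r in result}
```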
