datatable
A Python package for manipulating 2-dimensional tabular data structures
```sh
wget https://raw.githubusercontent.com/h2oai/db-benchmark/cf255c174647ac437aa7a85751f6e65732a3cb9a/_data/groupby-datagen.R
Rscript groupby-datagen.R 1e9 1e2 5 0
## activate your pydt env
source ~/git/db-benchmark/pydatatable/py-pydatatable/bin/activate
python
import datatable as dt
from datatable import f, count
x = dt.fread('G1_1e9_1e2_5_0.csv', na_strings=[''])
...
```
- Did you find a bug in datatable, or maybe the bug found you? I found a bug.
- How to reproduce the bug?
```sh
wget https://raw.githubusercontent.com/h2oai/db-benchmark/cf255c174647ac437aa7a85751f6e65732a3cb9a/_data/groupby-datagen.R
Rscript groupby-datagen.R 1e8...
```
`dt.Frame` raises an error when importing a pandas DataFrame whose columns have the nullable `Int32` dtype (used so that the columns can hold missing values).
```py
import pandas as pd
import...
```
This is a follow-up to our Slack conversation. This feature request (FR) is about providing an API that forces materialization of computation results that may not have been materialized yet, simply because...
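To make the request concrete, here is a minimal sketch of a lazily computed ("virtual") column with an explicit force-materialize method. The class `LazyColumn` and its `materialize()`, `is_materialized` and `to_list()` members are hypothetical names chosen for illustration; they are not datatable's API or internals.

```python
from typing import Callable, List, Optional


class LazyColumn:
    """Hypothetical virtual column: stores a recipe and computes values on demand."""

    def __init__(self, compute: Callable[[], List[float]]):
        self._compute = compute              # deferred computation
        self._data: Optional[List[float]] = None

    @property
    def is_materialized(self) -> bool:
        return self._data is not None

    def materialize(self) -> "LazyColumn":
        # Force evaluation once; later accesses reuse the cached values.
        if self._data is None:
            self._data = self._compute()
        return self

    def to_list(self) -> List[float]:
        # Reading values implicitly materializes the column.
        return list(self.materialize()._data)
```

Returning `self` from `materialize()` lets the call be chained, and calling it repeatedly is cheap because the computation runs at most once.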
Currently, the default value for `fill` is `False`. However, when `sep=' '`, we change `fill` to `True`. This shouldn't happen if the user explicitly passed `fill=False`....
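One common way to tell "not specified" apart from an explicit `False` is to use `None` as the default and resolve the effective value inside the function. The helper `resolve_fill` below is a hypothetical sketch of that idea, assuming the behavior described above (auto-enable `fill` only for space-separated input); it is not datatable's actual `fread` code.

```python
from typing import Optional


def resolve_fill(sep: Optional[str], fill: Optional[bool] = None) -> bool:
    """Hypothetical helper: decide the effective `fill` value.

    `fill=None` means the user did not specify it, so a default may be chosen;
    an explicit True/False from the user is always respected.
    """
    if fill is not None:
        return fill          # explicit user choice wins
    return sep == " "        # default: enable fill only for space-separated input
```

With this pattern, `resolve_fill(" ")` returns `True`, while `resolve_fill(" ", fill=False)` returns `False`.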
Python 3.8 added pickle protocol version 5 (PEP 574), which makes it possible to avoid excessive memory copies of serialized objects. We should use this feature for faster inter-process data exchange....
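The standard-library mechanism is out-of-band buffers: an object's `__reduce_ex__` returns a `pickle.PickleBuffer` when `protocol >= 5`, and `pickle.dumps(..., buffer_callback=...)` hands those buffers to the caller instead of copying them into the pickle stream. The sketch below is adapted from the example in the Python `pickle` documentation and uses a `bytearray` subclass; how a datatable Frame would expose its column memory this way is an assumption left open here.

```python
import pickle


class ZeroCopyByteArray(bytearray):
    """A bytearray that pickles its payload out-of-band under protocol 5."""

    def __reduce_ex__(self, protocol):
        if protocol >= 5:
            # Expose the raw memory as a PickleBuffer: no copy into the stream.
            return type(self)._reconstruct, (pickle.PickleBuffer(self),), None
        # Older protocols: fall back to an in-band copy of the bytes.
        return type(self)._reconstruct, (bytearray(self),)

    @classmethod
    def _reconstruct(cls, obj):
        with memoryview(obj) as m:
            # Get a handle on the original buffer object.
            obj = m.obj
            if type(obj) is cls:
                return obj       # the original object is reused, zero-copy
            return cls(obj)
```

When the pickle stream and the buffer list travel separately (for example via shared memory between processes), only the small stream is serialized, and `pickle.loads(data, buffers=buffers)` can rebuild the object on top of the existing memory.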
- Did you find a bug in datatable, or maybe the bug found you? Column names are lost during some operations. What determines how a column name is changed? What...
https://archive.ics.uci.edu/ml/machine-learning-databases/badges/badges.data This seems like a reasonable dataset for testing better whitespace detection. Similar to the datetime case, the field here is firstname initial lastname, and dt gets confused when the name format changes slightly. As you...
This is a proposal for implementing a new function `xread()`, which would be conceptually similar to `fread()`, but much lazier. In particular, `xread()` would parse only the first `n_sample_lines=100` lines...
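The core of the proposal (parse a small sample eagerly, defer everything else) can be sketched with the standard `csv` module. `LazyRead` and its naive `_guess_type` are hypothetical stand-ins for illustration, not the proposed `xread()` implementation; only the `n_sample_lines=100` default comes from the proposal.

```python
import csv
import itertools
from typing import Iterator, List, TextIO


class LazyRead:
    """Hypothetical lazy reader: samples the head now, parses the rest on demand."""

    def __init__(self, source: TextIO, n_sample_lines: int = 100):
        self._reader = csv.reader(source)
        self.names: List[str] = next(self._reader)  # header row
        # Parse only the first `n_sample_lines` data rows up front.
        self.sample: List[List[str]] = list(
            itertools.islice(self._reader, n_sample_lines))
        # Column types are guessed from the sample alone.
        self.types = [self._guess_type(col) for col in zip(*self.sample)]

    @staticmethod
    def _guess_type(values) -> type:
        # Naive type detection: first of int, float that parses every value.
        for t in (int, float):
            try:
                for v in values:
                    t(v)
                return t
            except ValueError:
                continue
        return str

    def rest(self) -> Iterator[List[str]]:
        # Remaining rows are parsed only when the caller iterates.
        return self._reader
```

A real implementation would also need to handle type conflicts found after the sample (for example, a column guessed as `int` that later contains text).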
I'd like the ability to compute different rolling aggregations over my dataset, based on ordering and grouping columns. Pandas has robust support for this type of operation: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.rolling.html ...
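As a reference for the expected semantics, here is a plain-Python sketch of a trailing rolling mean computed per group, with rows visited in the order given by an ordering column. The function `grouped_rolling_mean` is hypothetical; it mimics pandas' default for a fixed integer window (no result until the window is full) but is not how datatable would implement it.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Hashable, List, Sequence


def grouped_rolling_mean(groups: Sequence[Hashable], order: Sequence[float],
                         values: Sequence[float], window: int) -> List[float]:
    """Hypothetical helper: trailing rolling mean of `values` within each group,
    visiting rows in ascending `order`. The result is aligned to the input rows."""
    result = [float("nan")] * len(values)
    # One bounded window per group; appending past `window` drops the oldest value.
    windows: Dict[Hashable, Deque[float]] = defaultdict(lambda: deque(maxlen=window))
    # Stable sort by the ordering column only; groups need not be comparable.
    for i in sorted(range(len(values)), key=lambda i: order[i]):
        w = windows[groups[i]]
        w.append(values[i])
        if len(w) == window:  # emit only once the window is full
            result[i] = sum(w) / window
    return result
```

For example, with groups `["a", "b", "a", "a", "b"]`, order `[1, 1, 2, 3, 2]`, values `[1.0, 10.0, 3.0, 5.0, 20.0]` and `window=2`, the result is `[nan, nan, 2.0, 4.0, 15.0]`.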