datatable
datatable copied to clipboard
A Python package for manipulating 2-dimensional tabular data structures
I encountered an unexpected behavior of `to_csv()` and `fread()` regarding its handling of 'NA' string. When I ran the following code, ```python import datatable as dt data = dt.Frame(['a', 'NA'])...
A feature which is often used in pandas is `apply` (or `aggregate` or `transform`) that basically allow to do a mapping, aggregation or even a partial reduction operation. [PySpark](https://databricks.com/blog/2017/10/30/introducing-vectorized-udfs-for-pyspark.html) basically...
- `dt.isna()`, `dt.abs()`, `dt.exp()`, `dt.log()`, `dt.log10()` are marked internally as [deprecated](https://github.com/h2oai/datatable/blob/main/src/datatable/expr/math.py#L21-L26) for a long time, so they should be replaced with the corresponding `dt.math.*()` functions; - `dt.len()` should be deprecated...
The structure of the tests should be made approximately the same as the structure of API docs, with a dedicated test file for every function. These test files should then...
The output for statistical reducers when used in the square brackets syntax does not match that of Frame methods. ```python srcs = ["a", "bc", "def", None, -2.5, 3.7] / dt.obj64...
Our current approach for creating new `Expr`s is overly complicated, as outlined here: https://github.com/h2oai/datatable/blob/master/src/core/expr/!readme.md. This complexity stems mostly from the fact that the `Expr` class which performs arithmetic on `f-expressions`...
- [x] Suppose there is one column stores time in UNIX epoch with integer type, provide a function that trans this column to time64, something like `pandas.to_datetime` - [ ]...
Currently when a link in the lhs menu is clicked, that menu is scrolled to the top as the new page loads. This makes the navigation very inconvenient. A better...
- convert data from wide to long, and vice versa (similar to ``melt/dcast`` in `rdatatable` or ``melt/pivot.pd.wide_to_long`` in ``pandas``) # Example : df = dt.Frame({"A":['a','b','c'], "B":[1,3,5],"C":[2,4,6]}) Transform from wide to...
- Instead of a default ``C0``, it would be nice to have some relevant name ## Example: ``` from datatable import dt, f, by grades = [48, 99, 75, 80,...