etl
:sparkles: enhance `dataset.read_table(...)` method
Historically, we've been using `dataset["my_table"]` to access a table from a dataset. Recently, a new helper method `dataset.read_table(reset_index: bool)` was added that lets us read the table with its index reset, which is significantly faster for large multi-dimensional datasets.
We could add more functionality to `read_table` and make it the de facto standard way to read tables. This could include:
- Retype all columns to "standard" types (e.g. `uint8 -> int64`, `Float16 -> float64`) and categorical to `string` type
- underscore column names etc. (see `.format` method)
- etc.
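
To make the retyping idea in the list above concrete, here is a rough sketch on a plain pandas `DataFrame`. The helper name `standardize_dtypes` is hypothetical and not part of the current codebase, and how it would be wired into `read_table` (e.g. behind a flag) is left open. It only touches NumPy-backed integer/float columns and categoricals; nullable extension dtypes are deliberately skipped.

```python
import numpy as np
import pandas as pd


def standardize_dtypes(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: widen numeric columns and convert categoricals to string.

    Assumes unique column names. Returns a new DataFrame; the input is not modified.
    """
    out = df.copy()
    for col in out.columns:
        dtype = out[col].dtype
        if isinstance(dtype, pd.CategoricalDtype):
            # categorical -> string
            out[col] = out[col].astype("string")
        elif not isinstance(dtype, np.dtype):
            # Leave other extension dtypes (e.g. nullable Int8) untouched in this sketch.
            continue
        elif dtype.kind in ("i", "u"):
            # e.g. uint8 -> int64. Note: uint64 values above the int64 range would not fit.
            out[col] = out[col].astype("int64")
        elif dtype.kind == "f":
            # e.g. float16 -> float64
            out[col] = out[col].astype("float64")
    return out
```

Calling it on a table with `uint8`, `float16`, and categorical columns should return a copy whose columns are `int64`, `float64`, and string typed, with other columns (booleans, objects, datetimes) left as they are.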