
:sparkles: enhance `dataset.read_table(...)` method

Open Marigold opened this issue 5 months ago • 0 comments

Historically, we've accessed a table from a dataset with `dataset["my_table"]`. Recently, a new helper method `dataset.read_table(reset_index: bool)` was added that lets us read the table with its index reset, which is significantly faster for large multi-dimensional datasets.

We could add more functionality to `read_table` and make it the de facto standard way to read tables. Candidates include:

  • Retype all columns to "standard" types (e.g. uint8 -> int64, Float16 -> float64) and convert categorical columns to string type
  • Underscore column names (see the `.format` method)
  • etc.
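
To make the first two ideas concrete, here is a rough sketch in plain pandas. It is not the existing `read_table` implementation. `standardize_table` and `underscore` are hypothetical names, and the underscoring rule is a guess rather than what `.format` actually does:

```python
import re

import numpy as np
import pandas as pd


def underscore(name: str) -> str:
    """Hypothetical rule: lower-case and collapse non-alphanumeric runs into '_'."""
    return re.sub(r"[^0-9a-zA-Z]+", "_", name).strip("_").lower()


def standardize_table(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with widened numeric types, string categoricals and underscored names."""
    casts = {}
    for col, dtype in df.dtypes.items():
        # Keep nullable (extension) columns nullable so missing values survive the cast.
        nullable = pd.api.types.is_extension_array_dtype(dtype)
        if isinstance(dtype, pd.CategoricalDtype):
            casts[col] = "string"
        elif pd.api.types.is_integer_dtype(dtype):
            casts[col] = "Int64" if nullable else "int64"
        elif pd.api.types.is_float_dtype(dtype):
            casts[col] = "Float64" if nullable else "float64"
    out = df.astype(casts)  # astype returns a new frame; the input is left untouched
    out.columns = [underscore(str(c)) for c in out.columns]
    return out


# Small made-up table to illustrate the effect.
example = pd.DataFrame(
    {
        "Country Name": pd.Categorical(["a", "b"]),
        "Small Int": np.array([1, 2], dtype="uint8"),
        "Half Float": np.array([1.5, 2.5], dtype="float16"),
    }
)
result = standardize_table(example)
print(result.dtypes)
```

Whether these steps should be on by default or opt-in flags on `read_table` is an open question; the sketch applies them unconditionally only to keep it short.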

Marigold, Sep 12 '24 10:09