
feat: support polars with auto detection from TaskOnKart's Generic type

Open · kitagry opened this issue 3 weeks ago · 0 comments

Summary

This PR adds support for polars DataFrames alongside pandas DataFrames in gokart's file processors. Tasks can now specify their DataFrame type via the TaskOnKart[T] type parameter, and the appropriate processor is selected automatically.

Features

  • Multi-backend support: File processors now support both pandas and polars DataFrames
    • CsvFileProcessor
    • JsonFileProcessor
    • ParquetFileProcessor
    • FeatherFileProcessor
  • Auto-detection: The DataFrame type is detected automatically from the TaskOnKart[pd.DataFrame] or TaskOnKart[pl.DataFrame] type parameter
  • Backward compatible: Defaults to pandas when no type parameter is specified
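The auto-detection step can be illustrated with a small sketch that reads the type parameter from a class's __orig_bases__ using the standard library's typing.get_origin and typing.get_args. This is an assumption about one possible approach, not gokart's actual implementation: TaskBase, PandasDataFrame, PolarsDataFrame, detect_dataframe_type, and select_backend are hypothetical stand-ins so the sketch runs without pandas or polars installed.

```python
from typing import Generic, Optional, TypeVar, get_args, get_origin

T = TypeVar("T")


class TaskBase(Generic[T]):
    """Hypothetical stand-in for gokart's TaskOnKart[T]."""


# Hypothetical stand-ins for pd.DataFrame / pl.DataFrame.
class PandasDataFrame: ...
class PolarsDataFrame: ...


def detect_dataframe_type(task_cls: type) -> Optional[type]:
    """Return the concrete T from TaskBase[T], or None if none is given."""
    # Walk the MRO so subclasses of a parametrized task inherit its type.
    for klass in task_cls.__mro__:
        # Read each class's own __orig_bases__ (not an inherited copy).
        for base in klass.__dict__.get("__orig_bases__", ()):
            if get_origin(base) is TaskBase:
                args = get_args(base)
                # Skip TaskBase[T] where T is still an unbound TypeVar.
                if args and not isinstance(args[0], TypeVar):
                    return args[0]
    return None


def select_backend(task_cls: type) -> str:
    """Map the detected DataFrame type to a backend name."""
    # Fall back to pandas when no concrete type parameter is found,
    # mirroring the backward-compatible default described above.
    if detect_dataframe_type(task_cls) is PolarsDataFrame:
        return "polars"
    return "pandas"
```

Walking the MRO means a subclass of a polars task keeps the polars backend, while an unparametrized task (or one still generic in T) falls through to pandas.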

Usage Example

import pandas as pd
import polars as pl
from gokart import TaskOnKart

class MyPolarsTask(TaskOnKart[pl.DataFrame]):
    def output(self):
        return self.make_target('path/to/target.feather')

    def run(self):
        df = pl.DataFrame({'a': [1, 2, 3]})
        self.dump(df)  # Automatically uses polars-compatible processor

class MyPandasTask(TaskOnKart[pd.DataFrame]):
    def output(self):
        return self.make_target('path/to/target.feather')

    def run(self):
        df = pd.DataFrame({'a': [1, 2, 3]})
        self.dump(df)  # Uses pandas processor (default behavior)
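In both tasks the processor itself still comes from the target's file extension (here .feather), while the type parameter only picks the DataFrame library. A minimal sketch of combining the two, assuming a simple extension lookup: choose_processor and ProcessorChoice are hypothetical names, and the processor names are plain strings taken from the Features list, not imports from gokart.

```python
import os
from dataclasses import dataclass

# Hypothetical lookup from target extension to the processors listed above.
_PROCESSORS_BY_EXTENSION = {
    ".csv": "CsvFileProcessor",
    ".json": "JsonFileProcessor",
    ".parquet": "ParquetFileProcessor",
    ".feather": "FeatherFileProcessor",
}

_SUPPORTED_BACKENDS = ("pandas", "polars")


@dataclass(frozen=True)
class ProcessorChoice:
    processor: str  # processor name selected by file extension
    backend: str    # DataFrame library the processor reads/writes with


def choose_processor(path: str, backend: str = "pandas") -> ProcessorChoice:
    """Pick a processor by extension and attach the DataFrame backend.

    backend defaults to "pandas", matching the behavior for tasks
    without a type parameter.
    """
    if backend not in _SUPPORTED_BACKENDS:
        raise ValueError(f"unsupported backend: {backend!r}")
    ext = os.path.splitext(path)[1].lower()
    try:
        processor = _PROCESSORS_BY_EXTENSION[ext]
    except KeyError:
        raise ValueError(f"no DataFrame processor for {ext!r}") from None
    return ProcessorChoice(processor=processor, backend=backend)
```

For MyPolarsTask, choose_processor('path/to/target.feather', backend='polars') would yield FeatherFileProcessor with the polars backend; MyPandasTask gets the same processor with the pandas backend. Keeping the two choices independent is what lets both kinds of task coexist in one pipeline.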

Why not #457?

In #457, we switch between polars and pandas using GOKART_DATAFRAME_FRAMEWORK, which means the two cannot be used at the same time. Since projects often migrate from pandas to polars gradually, users should be able to use both in the same project.

kitagry · Dec 14 '25 15:12