gokart
feat: support polars with auto detection from TaskOnKart's Generic type
Summary
This PR adds support for polars DataFrames alongside pandas DataFrames in gokart's file processors. Tasks can now specify their DataFrame type via the TaskOnKart[T] type parameter, and the appropriate processor will be automatically selected.
Features
- Multi-backend support: File processors now support both pandas and polars DataFrames
  - CsvFileProcessor
  - JsonFileProcessor
  - ParquetFileProcessor
  - FeatherFileProcessor
- Auto-detection: Automatically detects the DataFrame type from TaskOnKart[pd.DataFrame] or TaskOnKart[pl.DataFrame] type parameters
- Backward compatible: Defaults to pandas when no type parameter is specified
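For readers unfamiliar with how a Generic type parameter can be read back at runtime, here is a minimal sketch of the idea. It is not the code in this PR: `Task`, `PandasFrame`, `PolarsFrame`, and `detect_frame_type` are hypothetical stand-ins for `TaskOnKart`, `pd.DataFrame`, `pl.DataFrame`, and whatever detection helper the PR actually uses. It relies only on the standard `typing` module.

```python
from typing import Generic, TypeVar, get_args, get_origin

T = TypeVar("T")


class PandasFrame:
    """Hypothetical stand-in for pd.DataFrame."""


class PolarsFrame:
    """Hypothetical stand-in for pl.DataFrame."""


class Task(Generic[T]):
    """Hypothetical stand-in for TaskOnKart."""


def detect_frame_type(task_cls, default=PandasFrame):
    """Return the concrete type passed to Task[...], or the default."""
    # Walk the MRO so that subclasses of a parameterized task are covered too.
    for klass in task_cls.__mro__:
        # __orig_bases__ keeps the subscripted bases, e.g. Task[PolarsFrame].
        for base in vars(klass).get("__orig_bases__", ()):
            if get_origin(base) is Task:
                args = get_args(base)
                # Skip unbound type variables such as Task[T].
                if args and not isinstance(args[0], TypeVar):
                    return args[0]
    # No type parameter found: fall back to the backward-compatible default.
    return default


class PolarsTask(Task[PolarsFrame]):
    pass


class ChildOfPolarsTask(PolarsTask):
    pass


class LegacyTask(Task):  # no type parameter given
    pass
```

In this sketch, `detect_frame_type(PolarsTask)` and `detect_frame_type(ChildOfPolarsTask)` both yield `PolarsFrame`, while `detect_frame_type(LegacyTask)` falls back to `PandasFrame`, mirroring the backward-compatible default described above.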
Usage Example
import pandas as pd
import polars as pl
from gokart import TaskOnKart

class MyPolarsTask(TaskOnKart[pl.DataFrame]):
    def output(self):
        return self.make_target('path/to/target.feather')

    def run(self):
        df = pl.DataFrame({'a': [1, 2, 3]})
        self.dump(df)  # Automatically uses polars-compatible processor

class MyPandasTask(TaskOnKart[pd.DataFrame]):
    def output(self):
        return self.make_target('path/to/target.feather')

    def run(self):
        df = pd.DataFrame({'a': [1, 2, 3]})
        self.dump(df)  # Uses pandas processor (default behavior)
Why not #457?
In #457, we switch between Polars and Pandas using GOKART_DATAFRAME_FRAMEWORK. However, this means we cannot use both at the same time. Since projects often migrate from Pandas gradually, we should allow users to use both Polars and Pandas simultaneously.