[Core feature] Add transformer support for pyspark.sql.classic.dataframe.DataFrame
Motivation: Why do you think this is important?
pyspark.sql.classic.dataframe.DataFrame is a new DataFrame type that the current structured dataset encoder/decoder does not recognize. As a result, the type transformer fails to serialize and deserialize it.
https://github.com/flyteorg/flytekit/actions/runs/15334631361/job/43149503509
Goal: What should the final outcome look like, ideally?
Add support for pyspark.sql.classic.dataframe.DataFrame to the structured dataset encoder/decoder.
Describe alternatives you've considered
NA
Propose: Link/Inline OR Additional context
NA
Are you sure this issue hasn't been raised already?
- [x] Yes
Have you read the Code of Conduct?
- [x] Yes
Currently, Flyte's structured dataset encoder/decoder for Spark only supports the standard pyspark.sql.dataframe.DataFrame type, not the new pyspark.sql.classic.dataframe.DataFrame. As a result, serialization and deserialization of the classic DataFrame will fail because the type is not registered with the transformer engine or handled by the Spark plugin's encoder/decoder classes. This limitation is confirmed in the current implementation of the Spark SD transformers.
To add support, a new encoder and decoder for pyspark.sql.classic.dataframe.DataFrame must be implemented and registered with the StructuredDatasetTransformerEngine, similar to how it is done for the standard Spark DataFrame. There is no mention in the data sources of existing support or a workaround for this type, so this feature would require a code contribution to Flytekit.
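The failure mode and the fix can be illustrated with a small toy model. This is only a sketch: `ToyEngine`, `ClassicDataFrame`, and `to_parquet` are hypothetical stand-ins, not the real flytekit or pyspark classes, and it assumes the engine looks handlers up by the exact Python type of the value rather than walking the class hierarchy.

```python
from typing import Callable, Dict, Type


# Hypothetical stand-ins; these are NOT the real pyspark classes.
class DataFrame:
    """Stand-in for pyspark.sql.dataframe.DataFrame."""


class ClassicDataFrame(DataFrame):
    """Stand-in for pyspark.sql.classic.dataframe.DataFrame."""


class ToyEngine:
    """Toy registry keyed by exact Python type (an assumption about the lookup)."""

    def __init__(self) -> None:
        self._encoders: Dict[Type, Callable[[object], str]] = {}

    def register(self, python_type: Type, encoder: Callable[[object], str]) -> None:
        # Store the handler under the exact type it was registered for.
        self._encoders[python_type] = encoder

    def encode(self, value: object) -> str:
        # Exact-type lookup: a subclass does not match its parent's entry.
        handler = self._encoders.get(type(value))
        if handler is None:
            raise ValueError(f"No encoder registered for {type(value).__name__}")
        return handler(value)


def to_parquet(df: object) -> str:
    # Placeholder for writing the dataframe out; returns a format label only.
    return "parquet"


engine = ToyEngine()
engine.register(DataFrame, to_parquet)

# Only the standard type is registered, so the classic type is rejected.
try:
    engine.encode(ClassicDataFrame())
    classic_supported = True
except ValueError:
    classic_supported = False

# Registering the classic type explicitly makes the same call succeed.
engine.register(ClassicDataFrame, to_parquet)
result = engine.encode(ClassicDataFrame())
```

In this model the same handler is reused for both types. Whether the real Spark encoder and decoder can be shared the same way, or need separate classes for the classic DataFrame, would have to be checked against the plugin code.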
Would you like more detail on how to implement such a transformer?
Sources:
- https://www.union.ai/docs/flyte/api-reference/flytekit-sdk/packages/flytekit.types.structured.structured_dataset/#code-4c14aba93fbe933d19c4a900ff04c5a7cac6730a-14
- https://github.com/flyteorg/flyte/blob/master/docs/user_guide/data_types_and_io/structureddataset.md
#take