[Core feature] Add support for pyspark.sql.classic.dataframe.DataFrame transformer

Open pingsutw opened this issue 6 months ago • 2 comments

Motivation: Why do you think this is important?

pyspark.sql.classic.dataframe.DataFrame is a new DataFrame type that the current structured dataset encoder/decoder does not recognize. As a result, the type transformer fails to serialize and deserialize it.

https://github.com/flyteorg/flytekit/actions/runs/15334631361/job/43149503509

Goal: What should the final outcome look like, ideally?

Add a structured dataset encoder/decoder for pyspark.sql.classic.dataframe.DataFrame.

Describe alternatives you've considered

NA

Propose: Link/Inline OR Additional context

NA

Are you sure this issue hasn't been raised already?

  • [x] Yes

Have you read the Code of Conduct?

  • [x] Yes

pingsutw • May 30 '25 20:05

Currently, Flyte's structured dataset encoder/decoder for Spark only supports the standard pyspark.sql.dataframe.DataFrame type, not the new pyspark.sql.classic.dataframe.DataFrame. As a result, serialization and deserialization of the classic DataFrame will fail because the type is not registered with the transformer engine or handled by the Spark plugin's encoder/decoder classes. This limitation is confirmed in the current implementation of the Spark SD transformers.

To add support, a new encoder and decoder for pyspark.sql.classic.dataframe.DataFrame must be implemented and registered with the StructuredDatasetTransformerEngine, similar to how it is done for the standard Spark DataFrame. There is no mention in the data sources of existing support or a workaround for this type, so this feature would require a code contribution to Flytekit.
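The registration pattern described above can be sketched without Flytekit or PySpark installed. This is a toy model, not Flytekit's implementation: `HandlerRegistry`, `SparkDataFrame`, `ClassicDataFrame`, and `encode_as_parquet` are hypothetical stand-ins, and the exact-type lookup is an assumption about why the classic DataFrame is not picked up. A real fix would presumably subclass Flytekit's structured dataset encoder/decoder base classes in the Spark plugin and register them with StructuredDatasetTransformerEngine.

```python
from typing import Callable, Dict


class SparkDataFrame:
    """Hypothetical stand-in for pyspark.sql.dataframe.DataFrame."""


class ClassicDataFrame:
    """Hypothetical stand-in for pyspark.sql.classic.dataframe.DataFrame."""


class HandlerRegistry:
    """Toy registry that maps a dataframe type to an encoder function.

    Assumption: handlers are looked up by the exact Python type of the
    value, so a type that was never registered has no handler.
    """

    def __init__(self) -> None:
        self._encoders: Dict[type, Callable[[object], str]] = {}

    def register(self, df_type: type, encoder: Callable[[object], str]) -> None:
        if df_type in self._encoders:
            raise ValueError(f"encoder already registered for {df_type.__name__}")
        self._encoders[df_type] = encoder

    def encode(self, df: object) -> str:
        try:
            encoder = self._encoders[type(df)]
        except KeyError:
            raise TypeError(
                f"no encoder registered for {type(df).__name__}"
            ) from None
        return encoder(df)


def encode_as_parquet(df: object) -> str:
    # Placeholder for writing the dataframe to storage; returns a label only.
    return f"parquet:{type(df).__name__}"


registry = HandlerRegistry()
registry.register(SparkDataFrame, encode_as_parquet)
print(registry.encode(SparkDataFrame()))

# The classic type has no handler yet, so encoding fails.
try:
    registry.encode(ClassicDataFrame())
except TypeError as err:
    print(f"before registration: {err}")

# Registering a handler for the new type resolves the failure.
registry.register(ClassicDataFrame, encode_as_parquet)
print(registry.encode(ClassicDataFrame()))
```

In this sketch the same encoder function is reused for both types, which mirrors the idea that the classic DataFrame could share the existing Parquet read/write logic and only needs its own registration. Whether that holds for the real plugin is untested here.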

Would you like more detail on how to implement such a transformer?

This is an AI-generated response and your feedback is appreciated! Please leave a 👍 if this is helpful and 👎 if it is not.

runllm[bot] • May 30 '25 20:05

#take

arbaobao • May 31 '25 09:05