spark
spark copied to clipboard
[SPARK-50017] Support Avro encoding for TransformWithState operator - ValueState
What changes were proposed in this pull request?
Currently, we use UnsafeRow to store all state in the StateStores. This change will add StateStore APIs that support byte array operations on the StateStore, and to enable Avro encoding to be used for the TransformWithState operator via SQLConf, only for ValueState. Subsequent state variable types will be supported in future PRs.
Why are the changes needed?
UnsafeRow is an inherently unstable format that makes no guarantees of being backwards-compatible. Therefore, if the format changes between Spark releases, this could cause StateStore corruptions. Avro is more stable, and inherently enables schema evolution.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Amended and added to unit tests
Was this patch authored or co-authored using generative AI tooling?
No