spark
Apache Spark - A unified analytics engine for large-scale data processing
### What changes were proposed in this pull request? This PR upgrades slf4j from 1.7.36 to 2.0.x. The main changes are as follows: 1. Upgrade slf4j version from 1.7.36...
### What changes were proposed in this pull request? Implement `GroupBy.prod` ### Why are the changes needed? for API coverage ### Does this PR introduce _any_ user-facing change? yes, the...
### What changes were proposed in this pull request? When running a Spark application against Spark 3.3, I see the following: ``` java.lang.IllegalArgumentException: Unsupported data source V2 partitioning type: CustomPartitioning...
### What changes were proposed in this pull request? This PR changes how columns that mix dates and timestamps are supported in CSV schema inference and data...
### What changes were proposed in this pull request? In the PR, I propose to migrate all parsing errors onto temporary error classes with the prefix `_LEGACY_ERROR_TEMP_`. The error message...
### What changes were proposed in this pull request? This PR is a follow-up PR for #32364. It has been closed by github-actions because it hasn't been updated in a...
### What changes were proposed in this pull request? Spark 3.4 supports Python 3.7+, but the Python-related UTs only check that a Python executable exists, not which Python version it is. So...
### What changes were proposed in this pull request? This PR adds the test suites for #37893, applyInPandasWithState. The new test suite mostly ports E2E test cases from existing [flatMapGroupsWithState](https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/streaming/FlatMapGroupsWithStateSuite.scala)....
### What changes were proposed in this pull request? This is a draft change of the current state of the Spark Connect prototype implemented as a driver plugin to separate...
### What changes were proposed in this pull request? This PR proposes to introduce the new API `applyInPandasWithState` in PySpark, which provides the functionality to perform arbitrary stateful processing in...