spark
Apache Spark - A unified analytics engine for large-scale data processing
### What changes were proposed in this pull request? This PR implemented a ThrottledLogger, a logger with RateLimiters, to prevent log message flooding caused by network issues. In our ThrottledLogger,...
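The description above is truncated, so here is a minimal Python sketch of the general idea only. It assumes a token-bucket rate limiter. `TokenBucket`, `ThrottledLogger`, `rate_per_sec`, and `capacity` are hypothetical names, not Spark's classes (Spark's own code is Scala). Messages over budget are dropped and counted, and the count is reported once logging is allowed again.

```python
import logging
import time


class TokenBucket:
    """Hypothetical token-bucket rate limiter (not Spark's RateLimiter)."""

    def __init__(self, rate_per_sec: float, capacity: int, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)  # start with a full bucket
        self.clock = clock             # injectable clock so behaviour is testable
        self.last = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class ThrottledLogger:
    """Hypothetical wrapper: drops messages once the limiter's budget is spent."""

    def __init__(self, logger: logging.Logger, limiter: TokenBucket):
        self.logger = logger
        self.limiter = limiter
        self.suppressed = 0  # messages dropped since the last emitted one

    def warning(self, msg, *args) -> bool:
        # Returns True if the message was emitted, False if it was throttled.
        if self.limiter.try_acquire():
            if self.suppressed:
                self.logger.warning("%d similar messages suppressed", self.suppressed)
                self.suppressed = 0
            self.logger.warning(msg, *args)
            return True
        self.suppressed += 1
        return False
```

With a burst of ten warnings at the same instant and `capacity=3`, only three are written and seven are counted as suppressed. Once enough time has passed for the bucket to refill, the next warning is emitted together with a one-line summary.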
### What changes were proposed in this pull request? In https://issues.apache.org/jira/browse/SPARK-33933, `BroadcastQueryStage` is materialized first to try to avoid broadcast timeouts in AQE, but the `BroadcastQueryStage` does not timeout...
### What changes were proposed in this pull request? Handle data skew on the stream side of `BroadcastHashJoin`: when handling the skew requires introducing an additional shuffle, support forcibly...
The input parameter of `nsmallest` should be validated as an integer, and this validation appears to be missing. As a result, PySpark raises an error when unexpected types are passed into...
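The fix's exact check and error message are not visible in the truncated text. The following is only a plain-Python sketch of fail-fast validation. `validate_n` and this `nsmallest` are hypothetical helpers, not the pandas-on-Spark API, and raising `TypeError` is an assumption.

```python
import heapq
from typing import Iterable, List


def validate_n(n) -> int:
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(n, bool) or not isinstance(n, int):
        raise TypeError(f"n must be an int, got {type(n).__name__}")
    return n


def nsmallest(values: Iterable[float], n) -> List[float]:
    # Validate up front so a bad argument fails with a clear message
    # instead of a confusing error deep inside the computation.
    n = validate_n(n)
    return heapq.nsmallest(n, values)
```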
### What changes were proposed in this pull request? Support either literal Python strings or Column objects for the pattern and replacement arguments for `regexp_replace`. ### Why are the changes...
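The real PySpark `regexp_replace` delegates to the JVM. The toy model below only illustrates the argument normalization the description mentions: a plain string is treated as a literal, and a column is resolved per row. `Column`, `Literal`, and `_to_expr` are hypothetical stand-ins, not PySpark classes. Python's `re.sub` is used here, and its replacement syntax (`\1`) differs from Java's (`$1`).

```python
import re
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class Column:
    """Hypothetical column reference: looks a value up by name in a row dict."""
    name: str

    def resolve(self, row: dict) -> str:
        return row[self.name]


@dataclass(frozen=True)
class Literal:
    """Hypothetical constant expression wrapping a plain Python string."""
    value: str

    def resolve(self, row: dict) -> str:
        return self.value


def _to_expr(arg: Union[str, Column]):
    # Plain strings become literals; Column objects are used as-is.
    if isinstance(arg, Column):
        return arg
    if isinstance(arg, str):
        return Literal(arg)
    raise TypeError(f"expected str or Column, got {type(arg).__name__}")


def regexp_replace(string: Column, pattern: Union[str, Column],
                   replacement: Union[str, Column]) -> Callable[[dict], str]:
    pat, rep = _to_expr(pattern), _to_expr(replacement)

    def evaluate(row: dict) -> str:
        # Pattern and replacement may differ per row when they are columns.
        return re.sub(pat.resolve(row), rep.resolve(row), string.resolve(row))

    return evaluate
```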
### What changes were proposed in this pull request? When `onDisconnected` is triggered, (1) delay `RemoveExecutor` for 5 seconds so that the driver can receive `ExecutorExitCode` from the slow path, and (2) prevent task...
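This sketch covers only step (1) of the truncated description. It shows a removal that is deferred so that a slower exit-code report can supply the precise reason. `DelayedExecutorRemover` and its methods are hypothetical, not Spark's scheduler code. To keep the logic deterministic, the sketch uses an explicit `now` argument and a `tick` call instead of real timers.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class DelayedExecutorRemover:
    """Hypothetical sketch: defer executor removal after a disconnect."""
    remove: Callable[[str, str], None]  # callback(executor_id, reason)
    delay_s: float = 5.0                # delay mentioned in the PR description
    pending: Dict[str, float] = field(default_factory=dict)  # id -> deadline

    def on_disconnected(self, executor_id: str, now: float) -> None:
        # Do not remove immediately; schedule a removal after the delay.
        self.pending.setdefault(executor_id, now + self.delay_s)

    def on_exit_code(self, executor_id: str, exit_code: int) -> None:
        # The precise exit reason arrived in time: cancel the generic removal
        # and remove with the better reason. In this sketch, exit codes for
        # executors that are not pending are ignored.
        if self.pending.pop(executor_id, None) is not None:
            self.remove(executor_id, f"exited with code {exit_code}")

    def tick(self, now: float) -> None:
        # Fire removals whose deadline has passed, with a generic reason.
        for executor_id, deadline in list(self.pending.items()):
            if now >= deadline:
                del self.pending[executor_id]
                self.remove(executor_id, "disconnected")
```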
### What changes were proposed in this pull request? Remove the unnecessary Guava exclusion from jackson-module-scala ### Why are the changes needed? The exclusion was added in SPARK-6149; the recent versions of...
### What changes were proposed in this pull request? Update the Avro version to 1.11.1 ### Why are the changes needed? To stay up to date with upstream ### Does...
### What changes were proposed in this pull request? * Fixed a memory leak caused by the `RawStore` cleanup mechanism not taking effect because different `threadLocalMS` instances were being manipulated. * Fixed...
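The failure mode named above can be illustrated with a Python analogy. Cleanup that runs against a *different* thread-local holder than the one that cached the store releases nothing, so the cached object leaks. `RawStoreCache` is a hypothetical class, not the Hive or Spark code. A plain `object()` stands in for the real store.

```python
import threading


class RawStoreCache:
    """Hypothetical per-thread cache of a RawStore-like handle."""

    def __init__(self):
        # Each RawStoreCache owns its own thread-local holder.
        self._local = threading.local()

    def get(self):
        if not hasattr(self._local, "store"):
            self._local.store = object()  # stands in for an expensive client
        return self._local.store

    def cleanup(self) -> bool:
        # Returns True only if this holder actually had a store to release.
        if hasattr(self._local, "store"):
            del self._local.store
            return True
        return False
```

Calling `cleanup()` on a second `RawStoreCache` returns `False` and leaves the first cache's store alive. Only cleanup through the same instance that called `get()` releases it.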
### What changes were proposed in this pull request? 1. Add a new optimizer rule (`PushPartialAggregationThroughJoin`) to push partial aggregation through joins. It supports the following cases: - Push down...
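The list of supported cases is truncated, so this is only a minimal sketch of why pushing partial aggregation below a join can be correct. It covers one simple case: `SUM(right.value)` over an inner equi-join, grouped by the join key. Both functions are hypothetical illustrations, not the rule's implementation. The key detail is that each pre-aggregated right-side sum must be multiplied by the number of matching left rows.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # (join key, value)


def sum_after_join(left: List[str], right: List[Row]) -> Dict[str, int]:
    """Baseline: join first, then SUM(right.value) GROUP BY key."""
    out: Dict[str, int] = defaultdict(int)
    for lk in left:
        for rk, v in right:
            if lk == rk:
                out[lk] += v
    return dict(out)


def sum_with_partial_agg(left: List[str], right: List[Row]) -> Dict[str, int]:
    """Partial aggregates below the join, final aggregate above it."""
    left_cnt = Counter(left)            # partial COUNT(*) per key, left side
    right_sum: Dict[str, int] = defaultdict(int)
    for k, v in right:                  # partial SUM(value) per key, right side
        right_sum[k] += v
    # Join the much smaller partial results; each right-side sum is counted
    # once per matching left row, hence the multiplication.
    return {k: left_cnt[k] * s for k, s in right_sum.items() if k in left_cnt}
```

Both functions give the same answer, but the second joins at most one row per key on each side. This is the benefit such a rule aims for when keys repeat heavily.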