Apache Spark - A unified analytics engine for large-scale data processing

Results 649 spark issues

### What changes were proposed in this pull request? This PR implemented a ThrottledLogger, a logger with RateLimiters, to prevent log message flooding caused by network issues. In our ThrottledLogger,...

CORE

### What changes were proposed in this pull request? https://issues.apache.org/jira/browse/SPARK-33933 materializes BroadcastQueryStage first to try to avoid broadcast timeouts in AQE, but the BroadcastQueryStage does not time out...

SQL

### What changes were proposed in this pull request? Solve the data skew on the stream side in `BroadcastHashJoin` - When data skew needs to introduce additional shuffle, support forcibly...

SQL

The input parameter of `nsmallest` should be validated as an integer, and this validation appears to be missing. PySpark raises an error when unexpected types are passed into...

CORE
PYTHON
PANDAS API ON SPARK

### What changes were proposed in this pull request? Support either literal Python strings or Column objects for the pattern and replacement arguments for `regexp_replace`. ### Why are the changes...

SQL
CORE
PYTHON

### What changes were proposed in this pull request? When onDisconnected is triggered: (1) delay `RemoveExecutor` for 5 seconds so that the driver can receive the ExecutorExitCode from the slow path; (2) prevent task...

CORE

### What changes were proposed in this pull request? Remove the unnecessary guava exclusion from jackson-module-scala. ### Why are the changes needed? The exclusion was added in SPARK-6149; the recent versions of...

BUILD

### What changes were proposed in this pull request? Update the Avro version to 1.11.1 ### Why are the changes needed? To stay up to date with upstream ### Does...

SQL
BUILD
DOCS

### What changes were proposed in this pull request? * Fixed a memory leak caused by the `RawStore` cleanup mechanism not taking effect because different `threadLocalMS` instances were being manipulated. * Fixed...

SQL

### What changes were proposed in this pull request? 1. Add a new optimizer rule (`PushPartialAggregationThroughJoin`) to push partial aggregation through joins. It supports the following cases: - Push down...

SQL