Apache Spark - A unified analytics engine for large-scale data processing

Results 649 spark issues

### What changes were proposed in this pull request? Given a batch size to `applyInPandas`, multiple groups are sent to Python UDF at once if they are very small. This...

SQL
CORE
PYTHON

### What changes were proposed in this pull request? Add a new feature: make distribution and ordering support V2 functions in writing. Currently, the rule `V2Writes` supports converting `ApplyTransform` to `TransformExpression`...

SQL

### What changes were proposed in this pull request? Expose custom metrics available on driver from DS v2 data sources on SQL UI. ### Why are the changes needed? https://github.com/apache/spark/commit/115ed89a3cd75faea3e6e29fb580da45309c0f31...

SQL
WEB UI

### Why are the changes needed? Refactored exceptions thrown in the package `org.apache.spark.security` to be in line with the Spark error class framework. ### Does this PR introduce _any_ user-facing change?...

CORE

### What changes were proposed in this pull request? - Pull out the pattern of `TakeOrderedAndProjectExec` to `ExtractTopK` - Add a new rule `PushLocalTopKThroughOuterJoin` which matches the `ExtractTopK` pattern -...

SQL

### What changes were proposed in this pull request? This change adds `row_index` column to `_metadata` struct. This column allows us to uniquely identify rows read from a given file...

SQL
STRUCTURED STREAMING

### What changes were proposed in this pull request? `ExternalShuffleBlockResolver`, `YarnShuffleService` and `RemoteBlockPushResolver` use `LevelDB` directly, which is not conducive to extending the use of `RocksDB` in this scenario. This...

MESOS
YARN
CORE

### What changes were proposed in this pull request? Fixed a memory leak caused by not clearing the `FileStatusCache` created by calling `FileStatusCache.getOrCreate` when closing a SparkSession of Spark Thrift Server....

SQL

### What changes were proposed in this pull request? From this PR, https://github.com/apache/spark/pull/22112, we learn that currently we can't roll back and rerun a result stage, and instead just fail. And this new...

SQL
CORE

### What changes were proposed in this pull request? The PySpark DataFrame `drop` method has the following signature: `def drop(self, *cols: "ColumnOrName") -> "DataFrame":` However, when we try to pass multiple Column types...

SQL
CORE
PYTHON
R