Apache Spark - A unified analytics engine for large-scale data processing

Results 649 spark issues

### What changes were proposed in this pull request? This change wraps the iterator returned by `SQLExecutionRDD#compute` so that it propagates the SQL conf at the time the iterator is...

SQL

### What changes were proposed in this pull request? Currently, 'HADOOP_CONF_DIR' in ENV and the Hadoop configmap cannot both be configured. However, in cloud vendor EMR environments, 'HADOOP_CONF_DIR' is often already...

KUBERNETES

### What changes were proposed in this pull request? This is a follow-up PR that reverts https://github.com/apache/spark/pull/48284 in the first commit and offers a new way to deal with the...

SQL

### What changes were proposed in this pull request? Find a code style issue and fix it. ### Why are the changes needed? Fix code style ### Does this PR...

SQL

### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? ###...

SQL
STRUCTURED STREAMING
BUILD

### What changes were proposed in this pull request? To improve insertion performance, we do not need to add transform expressions when there is no conversion for complex types. ###...

SQL

### What changes were proposed in this pull request? Add the option to `applyInArrow` to take a function that takes an iterator of `RecordBatch` and returns an iterator of `RecordBatch`....

SQL
CORE
PYTHON

### What changes were proposed in this pull request? This PR removes ExperimentalMethod from SQL. This is the first extension point we had for Spark SQL. However, it has...

SQL
STRUCTURED STREAMING
BUILD

### What changes were proposed in this pull request? Add task write data time to SQL tab's graph node. After adding the metric, the following figure is shown. ### Why...

SQL

### What changes were proposed in this pull request? `CollectLimitExec` is used when a logical `Limit` and/or `Offset` operation is the final operator. Compared to `GlobalLimitExec`, it can avoid shuffle...

SQL