spark issues

[SPARK-40487][SQL] Make defaultJoin in BroadcastNestedLoopJoinExec running in parallel

3

### What changes were proposed in this pull request? Currently, the defaultJoin method in BroadcastNestedLoopJoinExec first collects notMatchedBroadcastRows, then collects matchedStreamRows. The two steps could run in parallel instead of...

xingchaozh

SQL

[SPARK-40382][SQL] Group distinct aggregate expressions by semantically equivalent children in `RewriteDistinctAggregates`

3

### What changes were proposed in this pull request? In `RewriteDistinctAggregates`, when grouping aggregate expressions by function children, treat children that are semantically equivalent as the same. ### Why are...

bersprockets

SQL
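
The grouping idea in SPARK-40382 can be illustrated with a toy sketch. This is not Spark's implementation: the tuple-based expression encoding, the `canonicalize` helper, and the `COMMUTATIVE` set are all hypothetical, chosen only to show how children that differ cosmetically (here, operand order of a commutative operator) can end up in the same group.

```python
from collections import defaultdict

# Hypothetical set of operators whose operand order does not matter.
COMMUTATIVE = {"+", "*"}


def canonicalize(expr):
    """Return a normalized form of a toy expression.

    Expressions are either leaves (column names, literals) or tuples of
    the form (operator, operand, ...). Operands of commutative operators
    are sorted so that ("+", "a", 1) and ("+", 1, "a") compare equal.
    """
    if isinstance(expr, tuple):
        op, *args = expr
        args = [canonicalize(a) for a in args]
        if op in COMMUTATIVE:
            args = sorted(args, key=repr)
        return (op, *args)
    return expr


def group_by_children(aggregates):
    """Group (aggregate name, child expression) pairs by canonical child."""
    groups = defaultdict(list)
    for name, child in aggregates:
        groups[canonicalize(child)].append(name)
    return dict(groups)


aggregates = [
    ("count_distinct", ("+", "a", 1)),
    ("sum_distinct", ("+", 1, "a")),   # same child, operands swapped
    ("avg_distinct", "b"),
]
groups = group_by_children(aggregates)
```

With this input, the first two aggregates share one group and `avg_distinct` gets its own, so only two distinct child groups remain instead of three.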

[SPARK-40501][SQL] Enhance 'SpecialLimits' to support project(..., limit(...))

2

### What changes were proposed in this pull request? This PR aims to enhance 'SpecialLimits' to support project(..., limit(...)) in order to improve query performance. ### Why are the changes needed? When...

panbingkun

SQL

[SPARK-40510][PS] Implement `ddof` in `Series.cov`

### What changes were proposed in this pull request? Implement `ddof` in `Series.cov` by switching to `SF.covar`. ### Why are the changes needed? For API coverage. ### Does this PR...

zhengruifeng

CORE

PYTHON

PANDAS API ON SPARK
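
For readers unfamiliar with `ddof` (delta degrees of freedom), the following plain-Python sketch shows what the parameter controls in a covariance calculation. It assumes the usual convention of dividing by `N - ddof`; the function name `cov` and the NaN fallback are illustrative choices here, not the code from the pull request.

```python
def cov(xs, ys, ddof=1):
    """Covariance of two equal-length sequences, dividing by N - ddof.

    ddof=1 gives the sample (unbiased) covariance, ddof=0 the population
    covariance. Returns NaN when N - ddof is not positive (an assumption
    made for this sketch).
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n - ddof <= 0:
        return float("nan")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means.
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - ddof)


xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]
sample_cov = cov(xs, ys)              # divides by N - 1 = 3
population_cov = cov(xs, ys, ddof=0)  # divides by N = 4
```

The only difference between the two results is the divisor, which is why exposing `ddof` is mostly a matter of API coverage.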

[SPARK-40360] [WIP] ALREADY_EXISTS and NOT_FOUND exceptions

1

### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested?

srielau

SQL

STRUCTURED STREAMING

CORE

R

[WIP][SPARK-40497][BUILD] Upgrade Scala to 2.13.9

1

### What changes were proposed in this pull request? This PR aims to update from Scala 2.13.8 to Scala 2.13.9 for Apache Spark 3.4. ### Why are the changes needed?...

LuciferYang

SQL

BUILD

[SPARK-40506]Spark Streaming metrics name doesn't need application name

2

### What changes were proposed in this pull request? This PR removes the application name from Spark Streaming metrics names. The Spark StreamingSource metrics sourceName is inappropriate. The label now looks like `application_xxxxx_xxxx_driver_NetworkWordCount_StreamingMetrics_streaming_lastCompletedBatch_processingEndTime...

beryllw

DSTREAM

[SPARK-38717][SQL] Handle Hive's bucket spec case preserving behaviour

6

### What changes were proposed in this pull request? When converting a native table metadata representation `CatalogTable` to `HiveTable`, make sure the bucket spec uses an existing column. ### Does this...

peter-toth

SQL

[SPARK-33152] [SQL] Improved constraint propagation

1

### What changes were proposed in this pull request? This PR proposes a new algorithm to create and store the constraints. It tracks aliases in projection, which eliminates the need of...

ahshahid

SQL

INFRA

[SPARK-40504][YARN] Make yarn appmaster load config from client

1

https://issues.apache.org/jira/browse/SPARK-40504 ### What changes were proposed in this pull request? After applying this change, the AppMaster will load __spark_hadoop_conf__.xml to override the config. This means the AppMaster will use the config from the client. ###...

zhengchenyu

YARN
