spark
Apache Spark - A unified analytics engine for large-scale data processing
### What changes were proposed in this pull request? Currently, the `defaultJoin` method in `BroadcastNestedLoopJoinExec` first collects `notMatchedBroadcastRows` and then collects `matchedStreamRows`. The two steps could run in parallel instead of...
### What changes were proposed in this pull request? In `RewriteDistinctAggregates`, when grouping aggregate expressions by function children, treat children that are semantically equivalent as the same. ### Why are...
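The grouping idea in the entry above can be illustrated with a minimal sketch. It is not Spark's implementation: the tuple-based expression encoding, the `COMMUTATIVE` set, and the helper names `canonical` and `group_by_children` are all hypothetical, and "semantically equivalent" is approximated here by sorting the operands of commutative operators.

```python
from collections import defaultdict

# Assumption: only these operators are treated as commutative in this sketch.
COMMUTATIVE = {"+", "*"}


def canonical(expr):
    """Return a canonical form so that a + b and b + a compare equal."""
    if isinstance(expr, tuple):
        op, *args = expr
        args = [canonical(a) for a in args]
        if op in COMMUTATIVE:
            # Order operands deterministically for commutative operators.
            args = sorted(args, key=repr)
        return (op, *args)
    return expr  # a leaf, e.g. a column name


def group_by_children(aggs):
    """Group (aggregate_name, child_expr) pairs by canonicalized child."""
    groups = defaultdict(list)
    for name, child in aggs:
        groups[canonical(child)].append(name)
    return dict(groups)


aggs = [
    ("count_distinct_1", ("+", "a", "b")),
    ("sum_distinct_1", ("+", "b", "a")),  # same child, operands swapped
    ("count_distinct_2", "c"),
]
groups = group_by_children(aggs)
```

With this input the first two aggregates land in one group because their children canonicalize to the same key, while the third stays on its own.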
### What changes were proposed in this pull request? This PR aims to enhance `SpecialLimits` to support project(..., limit(...)) in order to improve query performance. ### Why are the changes needed? When...
### What changes were proposed in this pull request? Implement `ddof` in `Series.cov` by switching to `SF.covar`. ### Why are the changes needed? For API coverage. ### Does this PR...
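For readers unfamiliar with `ddof` (delta degrees of freedom), here is a minimal pure-Python sketch of what the parameter controls in a covariance computation: the divisor becomes `n - ddof`. The function name `cov` and the choice to return NaN when the divisor is not positive are assumptions made for illustration; this is not the `SF.covar` code from the PR.

```python
def cov(xs, ys, ddof=1):
    """Covariance of two equal-length sequences, dividing by n - ddof."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n - ddof <= 0:
        # Assumption: report NaN when there are too few observations.
        return float("nan")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - ddof)


xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]
sample_cov = cov(xs, ys)               # ddof=1: divide by n - 1
population_cov = cov(xs, ys, ddof=0)   # ddof=0: divide by n
```

With `ddof=1` the result is the sample covariance; with `ddof=0` it is the population covariance.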
### What changes were proposed in this pull request? This PR aims to update from Scala 2.13.8 to Scala 2.13.9 for Apache Spark 3.4. ### Why are the changes needed?...
### What changes were proposed in this pull request? This PR removes the application name from Spark Streaming metrics names. The `sourceName` of the Spark `StreamingSource` metrics is inappropriate. The label now looks like `application_xxxxx_xxxx_driver_NetworkWordCount_StreamingMetrics_streaming_lastCompletedBatch_processingEndTime...
### What changes were proposed in this pull request? When converting a native table metadata representation `CatalogTable` to `HiveTable`, make sure the bucket spec uses an existing column. ### Does this...
### What changes were proposed in this pull request? This PR proposes a new algorithm to create and store the constraints. It tracks aliases in projections, which eliminates the need of...
https://issues.apache.org/jira/browse/SPARK-40504 ### What changes were proposed in this pull request? After applying this change, the AppMaster will load `__spark_hadoop_conf__.xml` to override the config. This means the AppMaster will use the config from the client. ###...