Apache Spark - A unified analytics engine for large-scale data processing

Results 649 spark issues

### What changes were proposed in this pull request? Currently `AliasAwareOutputPartitioning` takes only the last alias of each aliased expression into account. We could avoid more shuffles with better alias handling....

SQL

### What changes were proposed in this pull request? This PR improves the foldable expression statistics estimation by providing more accurate min, max, and data length for string and binary...

SQL

### What changes were proposed in this pull request? Support more than `Integer.MAX_VALUE` rows with the same join key. ### Why are the changes needed? For SMJ, the number of the...

SQL
CORE
PYTHON

### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested?

SQL
BUILD
INFRA

### What changes were proposed in this pull request? This PR makes the Bloom filter join use a larger number of bits to build the Bloom filter if the row count is available. ###...

SQL

### What changes were proposed in this pull request? After the `RemoveRedundantAggregates` rule, we should pull out the complex group-by expressions. ### Why are the changes needed? This will fix...

SQL

Modification points: Spark SQL supports update when writing to MySQL. Background and purpose: In the current big data scenario, when writing to the MySQL relational database, the redundancy of data...

SQL

…es blocking in local model ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How...

CORE

### What changes were proposed in this pull request? Adds Scala and Python bindings for the SQL functions `inline` and `inline_outer`. ### Why are the changes needed? Currently these functions can...

SQL
CORE
PYTHON

### What changes were proposed in this pull request? After https://github.com/apache/spark/pull/32298 we were able to merge scalar subquery plans, but DSv2 sources couldn't benefit from that improvement. The reason for...

SQL