spark
Apache Spark - A unified analytics engine for large-scale data processing
### What changes were proposed in this pull request? Currently `AliasAwareOutputPartitioning` takes only the last alias of each aliased expression into account. We could avoid more shuffles with better alias handling....
### What changes were proposed in this pull request? This PR improves the foldable expression statistics estimation by providing more accurate min, max, and data length for string and binary...
### What changes were proposed in this pull request? Support more than `Integer.MAX_VALUE` rows with the same join key. ### Why are the changes needed? For SMJ, the number of the...
### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested?
### What changes were proposed in this pull request? This PR makes Bloom filter join use a larger number of bits to build the Bloom filter if the row count exists. ###...
### What changes were proposed in this pull request? After the `RemoveRedundantAggregates` rule, we should pull out the complex group-by expression. ### Why are the changes needed? This will fix...
The modification points are: Spark SQL supports update when writing to MySQL. Background and purpose: In the current big data scenario, when writing to the MySQL relational database, the redundancy of data...
…es blocking in local model ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How...
### What changes were proposed in this pull request? Adds Scala and Python bindings for the SQL functions `inline` and `inline_outer`. ### Why are the changes needed? Currently these functions can...
### What changes were proposed in this pull request? After https://github.com/apache/spark/pull/32298 we were able to merge scalar subquery plans, but DSv2 sources couldn't benefit from that improvement. The reason for...