spark
Apache Spark - A unified analytics engine for large-scale data processing
### What changes were proposed in this pull request? This PR proposes to improve the examples in `pyspark.sql.streaming.readwriter` by making each example self-contained with a brief explanation and a bit...
### What changes were proposed in this pull request? Keep the output attributes of a `Union` node's first child in the `RemoveRedundantAliases` rule to avoid correctness issues. ### Why are...
### What changes were proposed in this pull request? This PR aims to provide a method to lower the timeout. Our solution is to ask the master for the worker’s heartbeat when...
### What changes were proposed in this pull request? Currently, many Spark string expressions support the binary type, but examples with binary input are missing. This PR will add examples of binary...
### What changes were proposed in this pull request? Migrate `SupportsDelete` to use V2 Filter. ### Why are the changes needed? This is part of the V2Filter migration work. ###...
### What changes were proposed in this pull request? When the `try_cast()` syntax is equivalent to the non-ANSI cast, convert `TryCast` to `Cast` without wrapping it in a `TryEval` expression. ### Why...
### What changes were proposed in this pull request? `ArrayIntersect` misjudges whether null is contained in the right expression's hash set. ``` >>> a = [1, 2, 3] >>> b =...
### What changes were proposed in this pull request? This PR aims to upgrade scala-maven-plugin to 4.7.1. ### Why are the changes needed? This version brings some bug fixes related to...
### What changes were proposed in this pull request? This PR is a refactor of `SessionCatalog`. It centralizes the code of qualifying identifiers in one place: `SessionCatalog.qualifyIdentifier`. Then we can...
### What changes were proposed in this pull request? This PR addresses the issue raised in https://issues.apache.org/jira/browse/SPARK-39983 - broadcast relations should not be cached on the driver as they are...