spark
Apache Spark - A unified analytics engine for large-scale data processing
### What changes were proposed in this pull request? Add more `pyspark.pandas` `Index` functions that behave like their pandas counterparts. ### Why are the changes needed? Add where and putmask...
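For reference, pandas defines `Index.where(cond, other)` as keeping each value where `cond` is True and replacing it with `other` where it is False, and `Index.putmask(mask, value)` as replacing values where `mask` is True. The sketch below models only those element-wise semantics on plain Python lists. The function names `index_where` and `index_putmask` are hypothetical; this is not the pandas-on-Spark implementation, which operates on distributed data.

```python
from typing import Any, List, Sequence


def index_where(values: Sequence[Any], cond: Sequence[bool], other: Any = None) -> List[Any]:
    """Keep each value where cond is True; replace it with `other` where cond is False.

    `None` stands in for a missing value here (pandas would use NaN/NA).
    """
    if len(values) != len(cond):
        raise ValueError("cond must have the same length as values")
    return [v if c else other for v, c in zip(values, cond)]


def index_putmask(values: Sequence[Any], mask: Sequence[bool], value: Any) -> List[Any]:
    """Replace each value with `value` where mask is True (the mirror image of where)."""
    if len(values) != len(mask):
        raise ValueError("mask must have the same length as values")
    return [value if m else v for v, m in zip(values, mask)]


idx = [1, 2, 3, 4]
flags = [True, False, True, False]
print(index_where(idx, flags, other=0))  # [1, 0, 3, 0]
print(index_putmask(idx, flags, 0))      # [0, 2, 0, 4]
```

Given the same boolean array, the two functions replace complementary positions, which is why they are often added together.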
### What changes were proposed in this pull request? Following up on https://issues.apache.org/jira/browse/SPARK-31440, values like `"value" : "total (min, med, max (stageId: taskId))\n177.0 B (59.0 B, 59.0 B, 59.0 B...
### What changes were proposed in this pull request? This PR adds a clickable field to the SQL UI that shows the timing of Spark phases and rule timing...
### What changes were proposed in this pull request? Add an explicit stageId-to-operator mapping in the Spark UI. This is a more general version of https://issues.apache.org/jira/browse/SPARK-30209, where a stageId->...
### What changes were proposed in this pull request? I wanted to collect details of the SparkHistoryServer, such as GC and memory usage, but could not find a way to do so. I need...
### What changes were proposed in this pull request? Use ArrowType.Decimal(precision, scale, bitWidth) instead of ArrowType.Decimal(precision, scale) to clean up the following compilation warnings: > [warn] /home/runner/work/spark/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/util/ArrowUtils.scala:48:49: [deprecation @ org.apache.spark.sql.util.ArrowUtils.toArrowType | origin=org.apache.arrow.vector.types.pojo.ArrowType.Decimal....
### What changes were proposed in this pull request? An na filter option is added to the CSV read options. This is similar to the `na_filter` option in pandas. data.csv ```...
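In pandas `read_csv`, `na_filter=True` (the default) detects missing-value markers, while `na_filter=False` keeps every field verbatim, which can speed up reading files known to contain no NAs. The standard-library sketch below illustrates that on/off behavior only. The function `read_csv_rows` and the small `NA_STRINGS` set are hypothetical (pandas' real default NA list is longer), and the snippet does not show the Spark CSV option's actual name or API.

```python
import csv
import io
from typing import List, Optional

# Hypothetical, abbreviated set of strings treated as missing values.
NA_STRINGS = {"", "NA", "N/A", "NaN", "null"}


def read_csv_rows(text: str, na_filter: bool = True) -> List[List[Optional[str]]]:
    """Parse CSV text; when na_filter is True, map recognized NA strings to None."""
    rows: List[List[Optional[str]]] = []
    for row in csv.reader(io.StringIO(text)):
        if na_filter:
            # NA detection on: recognized markers become None.
            rows.append([None if field in NA_STRINGS else field for field in row])
        else:
            # NA detection off: every field is kept exactly as read.
            rows.append(list(row))
    return rows


data = "a,b\n1,NA\n,3\n"
print(read_csv_rows(data))                   # [['a', 'b'], ['1', None], [None, '3']]
print(read_csv_rows(data, na_filter=False))  # [['a', 'b'], ['1', 'NA'], ['', '3']]
```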
### What changes were proposed in this pull request? Spark sets maxRows at runtime: https://github.com/apache/spark/blob/bb6f65acca2918a0ceb13b612d210f1b46fa1add/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/LogicalQueryStage.scala#L58 This PR updates `SpecialLimits` to use `TakeOrderedAndProject` if maxRows is below `spark.sql.execution.topKSortMaxRowsThreshold`. For example:...
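The idea behind this change can be illustrated outside Spark: if a sort's input is known to have at most `maxRows` rows, then keeping the `maxRows` smallest rows with a bounded heap (the strategy `TakeOrderedAndProject` is based on) returns the same result as a full sort. The sketch below is a hypothetical, single-machine analogy in plain Python. `sort_with_known_max_rows` and its `threshold` parameter are illustrative stand-ins for the planner rule and the config, not Spark code.

```python
import heapq
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")


def sort_with_known_max_rows(rows: Iterable[T], key: Callable[[T], object],
                             max_rows: Optional[int], threshold: int) -> List[T]:
    """Sort rows, switching to a bounded top-K when the input size is known to be small.

    Assumes max_rows, when given, is a true upper bound on the number of input rows;
    otherwise the top-K path would silently drop rows.
    """
    if max_rows is not None and max_rows < threshold:
        # Top-K path: heapq.nsmallest is equivalent to sorted(rows, key=key)[:max_rows],
        # which equals a full sort when there are at most max_rows rows.
        return heapq.nsmallest(max_rows, rows, key=key)
    # Fallback: ordinary full sort when the row count is unknown or too large.
    return sorted(rows, key=key)


print(sort_with_known_max_rows([5, 3, 9, 1], key=lambda r: r, max_rows=4, threshold=100))
# [1, 3, 5, 9]
```

The threshold check matters because the heap is sized by `max_rows`: a bounded heap is only cheaper than a full sort when that bound is small.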
### What changes were proposed in this pull request? This PR improves the rule `PushDownLeftSemiAntiJoin` so that it forbids pushing a left semi/anti join through a project by checking: - probably pruned project - complex...
### What changes were proposed in this pull request? This PR simplifies the descriptions of built-in functions and covers many simplification cases. **Case one:** ```...