spark
Apache Spark - A unified analytics engine for large-scale data processing
### What changes were proposed in this pull request? This change wraps the iterator returned by `SQLExecutionRDD#compute` so that it propagates the SQL conf at the time the iterator is...
### What changes were proposed in this pull request? Currently, 'HADOOP_CONF_DIR' in ENV and the Hadoop configmap cannot both be configured. However, in cloud vendor EMR environments, 'HADOOP_CONF_DIR' is often already...
### What changes were proposed in this pull request? This is a follow-up PR that reverts https://github.com/apache/spark/pull/48284 in the first commit and offers a new way to deal with the...
### What changes were proposed in this pull request? Find a code style issue and fix it. ### Why are the changes needed? Fix code style ### Does this PR...
### What changes were proposed in this pull request? To improve insertion performance, we do not need to add transform expressions when there is no conversion for complex types. ###...
### What changes were proposed in this pull request? Add the option to `applyInArrow` to take a function that takes an iterator of `RecordBatch` and returns an iterator of `RecordBatch`....
### What changes were proposed in this pull request? This PR removes ExperimentalMethod from SQL. This was the first extension point we had for Spark SQL. However, it has...
### What changes were proposed in this pull request? Add task write data time to SQL tab's graph node. After adding the metric, the following figure is shown. ### Why...
### What changes were proposed in this pull request? `CollectLimitExec` is used when a logical `Limit` and/or `Offset` operation is the final operator. Compared to `GlobalLimitExec`, it can avoid shuffle...
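
The `SQLExecutionRDD#compute` entry above describes wrapping an iterator so that a conf is propagated while the iterator runs. The following is a minimal Python sketch of that general pattern only, not Spark's implementation. The names `active_conf`, `with_conf`, and `read_rows` are hypothetical, and because the source text is truncated at "at the time the iterator is...", the choice to capture the conf when the wrapper is built is an assumption.

```python
import contextvars

# Hypothetical stand-in for a session-scoped SQL conf; this is not a Spark API.
active_conf = contextvars.ContextVar("active_conf", default={})


def with_conf(inner, conf):
    """Yield items from `inner`, installing `conf` around every step of it."""
    snapshot = dict(conf)  # assumption: freeze the conf when the wrapper is built
    while True:
        token = active_conf.set(snapshot)  # make the conf visible to `inner`
        try:
            item = next(inner)
        except StopIteration:
            return
        finally:
            active_conf.reset(token)  # restore the caller's conf after each step
        yield item


def read_rows():
    # A lazy producer that looks up the conf each time it produces a row.
    for i in range(3):
        yield (i, active_conf.get().get("spark.sql.caseSensitive", "unset"))


unwrapped = list(read_rows())
rows = list(with_conf(read_rows(), {"spark.sql.caseSensitive": "true"}))
```

In this sketch the unwrapped producer never sees the setting, while the wrapped one sees it on every row, and the caller's own context is left unchanged once iteration finishes.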