
Apache Spark - A unified analytics engine for large-scale data processing

Results 649 spark issues

### What changes were proposed in this pull request? This PR adds a test that compares the available list of DataFrame functions in org.apache.spark.sql.functions with the SQL function registry. This...

SQL
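At its core, a test like this is a set difference between two lists of function names. Below is a minimal sketch in Python. It assumes both lists are already available as plain strings. `missing_functions`, the sample names, and the `allowlist` of intentionally SQL-only functions are all hypothetical and are not taken from the PR's actual test code.

```python
def missing_functions(dataframe_functions, sql_registry, allowlist=frozenset()):
    """Return the names registered in SQL that have no DataFrame counterpart.

    `allowlist` holds names that are deliberately SQL-only and should not
    fail the comparison. The result is sorted so test failures are stable.
    """
    return sorted(set(sql_registry) - set(dataframe_functions) - set(allowlist))


# Hypothetical sample names, standing in for the two real sources.
df_funcs = ["func_a", "func_b"]
sql_funcs = ["func_a", "func_b", "func_c", "sql_only"]
print(missing_functions(df_funcs, sql_funcs, allowlist={"sql_only"}))  # ['func_c']
```

Sorting the result keeps failure messages deterministic regardless of set iteration order.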

### What changes were proposed in this pull request? Add a `forceUseStagingDir` config to force the use of a staging directory when writing. When `forceUseStagingDir` is set to true, I set `committerOutputPath` to staging...

SQL
CORE

### What changes were proposed in this pull request? Original PR: https://github.com/apache/spark/pull/35963 When reading a Hive sequence table, users can choose whether to skip corrupt records that fail to be...

SQL

We need to follow the pandas behavior for prefix/suffix parameter validation in `add_prefix`/`add_suffix`. Currently, we force the parameter to be validated as a string type, but pandas looks at all values which can...

CORE
PYTHON
PANDAS API ON SPARK
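As we understand it, pandas builds the new labels by string-formatting the prefix or suffix together with each existing label, so values such as integers or floats work without a strict `str` check. The plain-Python sketch below models that behavior on a list of labels. `add_prefix_labels` and `add_suffix_labels` are hypothetical helpers, not pandas or pandas-on-Spark APIs.

```python
def add_prefix_labels(labels, prefix):
    # Format with an f-string so any value with a string form is accepted,
    # instead of rejecting non-str prefixes up front.
    return [f"{prefix}{label}" for label in labels]


def add_suffix_labels(labels, suffix):
    # Same idea for suffixes: rely on formatting rather than a type check.
    return [f"{label}{suffix}" for label in labels]


print(add_prefix_labels(["a", "b"], 1))  # ['1a', '1b']
print(add_suffix_labels(["a"], 2.5))     # ['a2.5']
```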

The `window` and `min_periods` parameters are not validated in the `rolling` function. ### What changes were proposed in this pull request? Validate these two parameters to be integers only...

CORE
PYTHON
PANDAS API ON SPARK
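The proposed check amounts to rejecting non-integer arguments before any rolling computation starts. Below is a minimal sketch of that validation in plain Python. `validate_rolling_args` is a hypothetical name. The sketch assumes `min_periods` may be `None` and covers only the integer-type check described above, not any range checks.

```python
def _require_int(name, value):
    # bool is a subclass of int in Python, so it must be excluded explicitly.
    if not isinstance(value, int) or isinstance(value, bool):
        raise TypeError(f"{name} must be an integer, got {type(value).__name__}")


def validate_rolling_args(window, min_periods=None):
    """Raise TypeError unless window (and min_periods, if given) are ints."""
    _require_int("window", window)
    if min_periods is not None:
        _require_int("min_periods", min_periods)


validate_rolling_args(3, 1)        # passes silently
# validate_rolling_args(2.5)       # would raise TypeError
```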

### What changes were proposed in this pull request? This ticket adds query hints for cache behaviors, so users can perform actions such as use/skip/cache/uncache, etc., on the cached...

SQL

### What changes were proposed in this pull request? Adding a new `array_sort` overload to `org.apache.spark.sql.functions` that matches the new overload defined in [SPARK-29020](https://issues.apache.org/jira/browse/SPARK-29020) and added via #25728. ### Why...

SQL
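The comparator-based `array_sort` from SPARK-29020 takes a function of two elements. That function returns a negative integer, 0, or a positive integer when the first element is less than, equal to, or greater than the second. The following plain-Python model illustrates those semantics with `functools.cmp_to_key`. It is not Spark code, and `array_sort_model` is a hypothetical name.

```python
from functools import cmp_to_key


def array_sort_model(values, comparator):
    """Sort `values` using a comparator(left, right) that returns a negative
    int, 0, or a positive int, mirroring the comparator contract above."""
    return sorted(values, key=cmp_to_key(comparator))


# Sort strings by length, longest first: a negative result puts `left` first.
by_length_desc = lambda left, right: len(right) - len(left)
print(array_sort_model(["bb", "a", "ccc"], by_length_desc))  # ['ccc', 'bb', 'a']
```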

pandas disallows conversion between datetime and timedelta, as well as conversion of any datetimelike to float. This raises an error in PySpark when we simply call a DatetimeIndex, so we need to avoid...

SQL
CORE
PYTHON
PANDAS API ON SPARK

Provide a graceful error message to users when they build an Index with different dtypes. ### What changes were proposed in this pull request? Raise a graceful error when users create...

CORE
PYTHON
PANDAS API ON SPARK

### What changes were proposed in this pull request? This PR adds the Python version of `Dataset.groupByKey(...).flatMapGroupsWithState(...)`, which is `DataFrame.groupby(...).applyInPandasWithState(...)` in PySpark. TBD. Note that documentation will be done in...

SQL
STRUCTURED STREAMING
BUILD
CORE
PYTHON
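Conceptually, `flatMapGroupsWithState`-style processing groups each micro-batch by key, reads the state saved for that key, folds in the new rows, and writes the updated state back for the next batch. The sketch below models that loop in plain Python. It does not use the PySpark API. The dict `state_store` stands in for the per-key state that Spark manages, and `process_batch` is a hypothetical name.

```python
from collections import defaultdict


def process_batch(batch, state_store):
    """Group one micro-batch of (key, value) pairs by key and keep a running
    row count per key in `state_store`, which persists across batches."""
    groups = defaultdict(list)
    for key, value in batch:
        groups[key].append(value)
    outputs = []
    for key, values in groups.items():
        count = state_store.get(key, 0) + len(values)  # read old state, add new rows
        state_store[key] = count                       # save state for the next batch
        outputs.append((key, count))
    return outputs


state = {}
print(process_batch([("a", 1), ("b", 2), ("a", 3)], state))  # [('a', 2), ('b', 1)]
print(process_batch([("a", 4)], state))                        # [('a', 3)]
```

Keys that do not appear in a batch keep their stored state unchanged, which is the behavior that makes per-key state useful across micro-batches.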