
Apache Spark - A unified analytics engine for large-scale data processing

Results: 649 spark issues

### What changes were proposed in this pull request? This PR improves join stats estimation if one side can keep uniqueness. A common case is: ```sql SELECT i_item_sk ss_item_sk FROM...

SQL

### What changes were proposed in this pull request? This PR enables `spark.sql.cbo.enabled` by default. ### Why are the changes needed? 1. Enable CBO to get better performance; we've enabled...

SQL

### What changes were proposed in this pull request? Print the application ID after getting the application from YARN. ### Why are the changes needed? It's useful for users to find...

YARN

### What changes were proposed in this pull request? We should propagate the row count stats in SizeInBytesOnlyStatsPlanVisitor if available. Row counts are propagated from connectors to Spark in case...

SQL

### What changes were proposed in this pull request? I propose this PR to warn users when they use the spark-submit CLI incorrectly. The idea here...

### What changes were proposed in this pull request? This PR fixes the issue SPARK-39838 where an explicitly set empty Column Metadata object would be optimized away, making it impossible...

SQL

Currently, as documented for both pyspark pandas and pandas itself, these two options support only single-character input, so let's make pyspark pandas follow this behavior. Issue affected: https://issues.apache.org/jira/browse/SPARK-39654 ###...

CORE
PYTHON
PANDAS API ON SPARK

### What changes were proposed in this pull request? As per the CustomAuthentication (hive.server2.transport.mode=HTTP) in Thrift Server, only a String username and a String password are supported in the authenticate method of PasswdAuthenticationProvider (https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/PasswdAuthenticationProvider.java#L37)....

SQL

### What changes were proposed in this pull request? This PR adds codegen support to array-based higher-order functions except ArraySort. This is my first time playing around with...

SQL

When creating classes that extend multiple `Expression` traits, they have an inheritance conflict due to `nodePatterns` having different values. `nodePatterns` is also final, which makes the inheritance conflict unavoidable. This...

SQL