spark
Apache Spark - A unified analytics engine for large-scale data processing
### What changes were proposed in this pull request? Documentation provided for StructType.fromJson method ### Why are the changes needed? To make it easy for a user to understand this...
### What changes were proposed in this pull request? This PR updates schema inference in DSv1 FileFormat to remove overlapping columns from the data schema and keep them in the...
### What changes were proposed in this pull request? This PR aims to support number-only column names in ORC data sources when orc impl is hive. In the current master,...
Adds examples and parameter descriptions to union and unionAll ### What changes were proposed in this pull request? Documentation enhancements ### Why are the changes needed? Help users understand...
### What changes were proposed in this pull request? Currently, stage-level scheduling works for YARN/K8s/standalone clusters when dynamic allocation is enabled, and the Spark app will acquire executors with different...
### What changes were proposed in this pull request? This adds the following push-based shuffle metrics: - Merger count and magnet enabled/disabled for stage - Time spent...
### Why are the changes needed? This PR aims to fix the case ```scala sql("create table t1(a decimal(3, 0)) using parquet") sql("insert into t1 values(100), (10), (1)") sql("select * from...
### What changes were proposed in this pull request? This PR places spark.files, spark.jars and spark.pyfiles in the current working directory on the driver in K8s cluster mode...
### What changes were proposed in this pull request? - Move parser tests from DDLParserSuite to AlterTableSetLocationParserSuite. - Port DS v1 tests from DDLSuite and other test suites to v1.AlterTableSetLocationSuite....
### What changes were proposed in this pull request? Added randomization in Spark local directories ### Why are the changes needed? In the case of K8s, each executor gets the same...