
Spark: The Definitive Guide's Code Repository

32 issues

Hello, I am getting an error while importing seasonal_decompose (from statsmodels.tsa.seasonal import seasonal_decompose). Libraries installed on the Databricks cluster: "azure-storage-blob": 12.9.0, "azure-identity": 1.7.1, "azure-keyvault": 4.2.0, "pandas": 1.3.4, "numpy": 1.20.3, "geopandas": 0.9.0, "shapely": 1.7.1...

…alytics_and_Machine_Learning.py: fix print() and add some missing code lines

At least for spark-shell, the Spark syntax needs this fix. Without the parentheses, the following command on page 421 of the book fails to generate a proper Array[...] into the params...

This has been completed using the tool 2to3. Note: "Structured_APIs-Chapter_6_Working_with_Different_Types_of_Data.py" could not be updated due to a parsing error: RefactoringTool: There was 1 error: RefactoringTool: Can't parse Structured_APIs-Chapter_6_Working_with_Different_Types_of_Data.py: ParseError: bad...

Hi, I have downloaded the repository and was able to execute and practice all of the examples. But when I try to execute the examples related to the SQL data source from...

When executing the command below to write to the path below, I am facing this issue:

val newPath = "jdbc:sqlite://tmp/my-sqlite.db"
val tablename = "flight_info"
val props = new java.util.Properties
props.setProperty("driver", "org.sqlite.JDBC")
csvFile.write.mode("overwrite").jdbc(newPath,...

The piece of code below fails when run on a Spark 3 cluster:

# in Python
colName = "count"
upperBound = 348113L
numPartitions = 10
lowerBound = 0L

It fails with File...
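A likely cause (an assumption, since the traceback is truncated): Python 3 removed the long-integer L suffix, so 348113L is a SyntaxError regardless of Spark. A minimal sketch of the Python 3 form of those assignments:

```python
# Python 3 ints are arbitrary precision, so the Python 2 'L' suffix
# is no longer valid syntax and must simply be dropped.
colName = "count"
upperBound = 348113   # was 348113L in Python 2
lowerBound = 0        # was 0L in Python 2
numPartitions = 10
print(colName, lowerBound, upperBound, numPartitions)
```

These plain ints can then be passed to the same partitioned-read call unchanged.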

Query in the book:

INSERT INTO partitioned_flights PARTITION (DEST_COUNTRY_NAME="UNITED STATES")
SELECT count, ORIGIN_COUNTRY_NAME FROM flights
WHERE DEST_COUNTRY_NAME='UNITED STATES' LIMIT 12

In Spark 3.0, the above query returns the error below...

https://github.com/databricks/Spark-The-Definitive-Guide/blob/38e881406cd424991a624dddb7e68718747b626b/code/Structured_APIs-Chapter_7_Aggregations.scala#L171 First, the column `Quantity` is parsed as `String`; it should be `IntegerType`. Second, `orderBy(CustomerId)` should be `orderBy(desc(Quantity))`.
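The first point can be seen without Spark: a string column sorts lexicographically, so "9" sorts after "80", which silently breaks any ordering on a numeric column left as String. A minimal pure-Python illustration (the sample values are hypothetical):

```python
values = ["9", "80", "100"]

# Lexicographic (string) order -- wrong for numeric data
string_order = sorted(values)          # ['100', '80', '9']

# Cast to int before comparing -- correct numeric order
numeric_order = sorted(values, key=int)  # ['9', '80', '100']

print(string_order)
print(numeric_order)
```

Casting `Quantity` to an integer type in the schema avoids the same pitfall in the Spark aggregation.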

Please find below a correction to the book, under the Grouping section. In the book:

-- in SQL
%sql
SELECT count(*) FROM dfTable GROUP BY InvoiceNo, CustomerId

+---------+----------+-----+
|InvoiceNo|CustomerId|count|
+---------+----------+-----+
|...
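The apparent mismatch (an inference, since the report is truncated) is that the output table shows InvoiceNo and CustomerId columns, which requires those columns in the SELECT list, not just count(*). The group-by-and-count shape itself can be sketched in pure Python (the sample rows are hypothetical):

```python
from collections import Counter

# Hypothetical (InvoiceNo, CustomerId) pairs standing in for dfTable rows
rows = [("536365", "17850"), ("536365", "17850"), ("536366", "17850")]

# Group by the (InvoiceNo, CustomerId) key and count occurrences,
# analogous to: SELECT InvoiceNo, CustomerId, count(*) ... GROUP BY InvoiceNo, CustomerId
counts = Counter(rows)

for (invoice, customer), n in sorted(counts.items()):
    print(invoice, customer, n)
```

Each emitted row carries the grouping key alongside its count, matching the three-column table shown in the book.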