cloud-dataproc
add spark-scala-quickstart
Dataproc - Spark Scala Quickstart is an effort to assist in the creation of Spark jobs written in Scala to run on Dataproc. It provides several pre-implemented Spark jobs and technical guides for running them on GCP. It is all based on the WordCount ETL example with common sources and sinks (Kafka, GCS, BigQuery, etc.). It demonstrates how to run Spark jobs using Dataproc Submit, Serverless, and Workflow, and how to orchestrate them with Cloud Composer.
#149
Hi @NiloFreitas, thanks for the quickstart. It seems that many other files were added by mistake, among them existing notebooks, codelabs, etc. Can you please verify that only the relevant files are in the PR?
Hi @davidrabinowitz. Which files do you mean? I could not find what you are referring to. My pull request contains one commit with the quickstart guide. It is composed of several Scala and Python source files, but no notebooks or codelabs. All code in the PR was written by me.
This code can help a lot! ;-)