
add spark-scala-quickstart

Open NiloFreitas opened this issue 2 years ago • 4 comments

Dataproc - Spark Scala Quickstart is an effort to assist in the creation of Spark jobs written in Scala to run on Dataproc. It provides several pre-implemented Spark jobs and technical guides for running them on GCP. It is all based on the WordCount ETL example with common sources and sinks (Kafka, GCS, BigQuery, etc.). It demonstrates how to run Spark jobs using Dataproc Submit, Serverless, and Workflow, and how to orchestrate them with Cloud Composer.

NiloFreitas avatar Mar 21 '22 17:03 NiloFreitas

#149

NiloFreitas avatar Mar 21 '22 19:03 NiloFreitas

Hi @NiloFreitas, thanks for the quickstart. It seems that many other files were added by mistake, among them existing notebooks, codelabs, etc. Can you please verify that only the relevant files are in the PR?

davidrabinowitz avatar Mar 21 '22 20:03 davidrabinowitz

Hi @davidrabinowitz. Which files do you mean? I could not find what you are referring to. My pull request contains 1 commit with the quickstart guide. It is composed of several Scala and Python files, but no notebooks or codelabs. All code in the PR was written by me.

NiloFreitas avatar Mar 21 '22 20:03 NiloFreitas

This code can help a lot! ;-)

dedeco avatar Apr 12 '22 20:04 dedeco