dataproc topic
learning-hadoop-and-spark
Companion to the Learning Hadoop and Learning Spark courses on LinkedIn Learning
spydra
Ephemeral Hadoop clusters using Google Cloud Platform
bigflow
A Python framework for data processing on GCP.
gomrjob
A Go framework for Hadoop MapReduce jobs
debussy_concert
Debussy is an opinionated Data Architecture and Engineering framework, enabling data analysts and engineers to build better platforms and pipelines.
etlflow
EtlFlow is an ecosystem of functional libraries in Scala, based on ZIO, for running complex auditable workflows that can interact with Google Cloud Platform, AWS, Kubernetes, databases, SFTP servers, O...
serverless-spark-workshop
Solution Accelerators for Serverless Spark on GCP, the industry's first auto-scaling and serverless Spark as a service
pyDag
Scheduling big data workloads and data pipelines in the cloud with pyDag
spark
Performance Observability for Apache Spark
ghcn-d
Data pipeline built on the Global Historical Climatology Network dataset