dataproc topic

List dataproc repositories

learning-hadoop-and-spark

172 stars, 152 forks

Companion to the Learning Hadoop and Learning Spark courses on LinkedIn Learning

spydra

133 stars, 34 forks

Ephemeral Hadoop clusters using Google Compute Platform

bigflow

115 stars, 23 forks

A Python framework for data processing on GCP.

gomrjob

42 stars, 4 forks

gomrjob - a Go Framework for Hadoop Map Reduce Jobs

debussy_concert

28 stars, 4 forks

Debussy is an opinionated Data Architecture and Engineering framework, enabling data analysts and engineers to build better platforms and pipelines.

etlflow

43 stars, 12 forks

EtlFlow is an ecosystem of functional libraries in Scala, based on ZIO, for running complex auditable workflows which can interact with Google Cloud Platform, AWS, Kubernetes, Databases, SFTP servers, O...

serverless-spark-workshop

61 stars, 33 forks

Solution Accelerators for Serverless Spark on GCP, the industry's first auto-scaling and serverless Spark as a service

pyDag

25 stars, 3 forks

Scheduling Big Data Workloads and Data Pipelines in the Cloud with pyDag

spark

129 stars, 10 forks

Performance Observability for Apache Spark

ghcn-d

24 stars, 6 forks

Data Pipeline from the Global Historical Climatology Network DataSet