Apache Spark topic

Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.

Apache Spark repositories

delta

6.9k
Stars
1.6k
Forks
208
Watchers

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

delta-sharing

682
Stars
147
Forks

An open protocol for secure data sharing

azure-cosmosdb-spark

5
Stars
1
Forks

Apache Spark Connector for Azure Cosmos DB

jspark

7
Stars
1
Forks

Simple JDBC client for Apache Spark

coursera-spark-notes

12
Stars
7
Forks

Study notes for "Big Data Analysis with Scala and Spark" on Coursera

learning-scala-for-data-science

6
Stars
2
Forks

Data Science: Scala for brave and impatient

spark-boilerplate

10
Stars
3
Forks

A boilerplate for Spark projects, with Docker support for local development and scripts for EMR support.

spark-kinesis-redshift

9
Stars
6
Forks

Example project for consuming an AWS Kinesis stream and saving the data to Amazon Redshift using Apache Spark

spark.fish

332
Stars
6
Forks

▁▂▄▆▇█▇▆▄▂▁

distributed-dataset

113
Stars
5
Forks

A distributed data processing framework in Haskell.