Apache Spark topic
Apache Spark is an open source distributed general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
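The "implicit data parallelism" in that description can be sketched without a cluster: the driver expresses map/reduce-style transformations once, and the engine applies them to each data partition in parallel (recomputing lost partitions from lineage for fault tolerance). The snippet below is a plain-Python stand-in for that model, not actual Spark code; the partition data is made up for illustration.

```python
from functools import reduce

def count_partition(lines):
    """'map' stage: word counts within a single partition."""
    counts = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts

def merge_counts(a, b):
    """'reduce' stage: merge two per-partition count dictionaries."""
    merged = dict(a)
    for word, n in b.items():
        merged[word] = merged.get(word, 0) + n
    return merged

# Toy input split into two partitions; on a cluster Spark would run
# count_partition on each partition concurrently on different executors.
partitions = [["spark makes clusters easy"], ["spark scales out"]]
per_partition = [count_partition(p) for p in partitions]
totals = reduce(merge_counts, per_partition, {})
print(totals["spark"])  # → 2
```

In real Spark the same shape appears as `rdd.flatMap(...).map(...).reduceByKey(...)`; the parallelism is implicit because the user never schedules the per-partition work.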
delta
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive, with APIs for multiple programming languages
delta-sharing
An open protocol for secure data sharing
azure-cosmosdb-spark
Apache Spark Connector for Azure Cosmos DB
coursera-spark-notes
Study notes for "Big Data Analysis with Scala and Spark" on Coursera
learning-scala-for-data-science
Data Science: Scala for the brave and impatient
spark-boilerplate
A boilerplate for Spark projects, with Docker support for local development and scripts for EMR deployment
spark-kinesis-redshift
Example project for consuming AWS Kinesis streams and saving the data to Amazon Redshift using Apache Spark
distributed-dataset
A distributed data processing framework in Haskell.