Data-Platform-Engineering

The GitOps Platform for Data Analytics on Kubernetes 🚀

🎯 The GitOps Platform for Data Analytics runs on Kubernetes (K8s) and is provisioned with HashiCorp's Terraform Infrastructure as Code (IaC) on the AWS Cloud 🌥️, with a focus on speed, scalability, agility, and cost efficiency. ⚡

Build, Scale, and Optimize Data & AI/ML Platforms on K8s

🏗️ Architecture

The diagram below shows the open-source data tools, Kubernetes operators, and frameworks supported by Data on K8s (DoK8s). It also shows how Data Analytics managed services integrate with the DoK8s open-source tools, which are designed to be reusable, composable, and configurable.

[Image: DoK8s architecture diagram]

🌟 Features

The Data on K8s (DoK8s) solution is organized into the following focus areas:

  • 🎯 Data Analytics on K8s
  • 🎯 AI/ML on K8s
  • 🎯 Streaming Platforms on K8s
  • 🎯 Workflow Scheduling Platforms on K8s
  • 🎯 Distributed Databases & Query Engines on K8s

🏃‍♀️ Deliverables

  • [x] 🚀 Reproducible Local Development with Dev Containers: VS Code, K8s, Terraform, Python/R
  • [ ] 🚀 JupyterHub on EKS 👈 This blueprint deploys a self-managed JupyterHub on EKS with Amazon Cognito authentication.
  • [ ] 🚀 Spark Operator with Apache YuniKorn on EKS 👈 This blueprint deploys an EKS cluster and uses the Spark Operator and Apache YuniKorn to run self-managed Spark jobs.
  • [ ] 🚀 Self-managed Airflow on EKS 👈 This blueprint sets up a self-managed Apache Airflow deployment on an Amazon EKS cluster, following best practices.
  • [ ] 🚀 Argo Workflows on EKS 👈 This blueprint sets up a self-managed Argo Workflows deployment on an Amazon EKS cluster, following best practices.
  • [ ] 🚀 Kafka on EKS 👈 This blueprint deploys a self-managed Kafka cluster on EKS using the popular Strimzi Kafka operator.
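
To give a feel for what running a job on one of these blueprints might look like, here is a minimal sketch of a workload for the Argo Workflows deliverable. It is not taken from this repository: the function name `build_hello_workflow`, the `dok8s-hello-` name prefix, the `busybox:1.36` image, and the message text are all illustrative assumptions. Only the `argoproj.io/v1alpha1` `Workflow` resource shape comes from Argo Workflows itself.

```python
import json


def build_hello_workflow(image: str = "busybox:1.36",
                         message: str = "hello from DoK8s") -> dict:
    """Return a minimal Argo Workflows manifest as a plain dict.

    The image, message, and names are placeholders (assumptions),
    not values defined by the blueprint.
    """
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        # generateName lets Kubernetes append a random suffix per run.
        "metadata": {"generateName": "dok8s-hello-"},
        "spec": {
            # entrypoint must match the name of one template below.
            "entrypoint": "say-hello",
            "templates": [
                {
                    "name": "say-hello",
                    "container": {
                        "image": image,
                        "command": ["echo"],
                        "args": [message],
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    # Kubernetes accepts JSON manifests as well as YAML.
    print(json.dumps(build_hello_workflow(), indent=2))
```

Assuming the script is saved as `hello_workflow.py` (a hypothetical file name) and Argo Workflows is installed in a namespace called `argo`, the output could be submitted with `python hello_workflow.py | kubectl create -n argo -f -`. `kubectl create` is used rather than `kubectl apply` because the manifest relies on `generateName`.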

Built with ❤️ at AWS 🌥️ K8s 🌟 Terraform 🚀.