labtools-k8s
A complete data engineering pipeline running on Kubernetes (Minikube): Argo CD, Spark, Trino, S3, Delta Lake, Postgres + Debezium CDC, MySQL, Airflow, Kafka (Strimzi), Datahub, OpenMetadata, Zeppelin, Jupyter, and more.
This is a work in progress...
TODO
- AWS EKS + Terraform
- Staging environment with production-like characteristics
Pipeline architecture
```mermaid
flowchart TD
    Postgres(Postgres Database) -->|CDC| Kafka(Kafka Strimzi)
    SQLServer(SQL Server Database) -->|CDC| Kafka
    Kafka -->|AVRO Data Stream| ConsumerMinio(Minio S3)
    ConsumerMinio -->|AVRO Data Stream| ConsumerSpark(Apache Spark)
    ConsumerSpark -->|CDC Replication using Scala Engine - TODO| ConsumerDelta(Delta Lake)
    ConsumerSpark -->|Data catalog, lineage| ConsumerDatahub(Datahub)
    ConsumerSpark --> HiveMetastore(Hive metastore)
    Kafka -->|Schema Management| SchemaRegistry(Confluent Schema Registry)
    Kafka --> RedpandaConsole(Redpanda Console)
    SchemaRegistry -->|Schema Use - API| ConsumerSpark
    ConsumerDelta -->|Data Query| Trino(Trino)
    click ConsumerDelta href "https://github.com/rogeriomm/debezium-cdc-replication-delta" "Visit GitHub repository"
    Airflow(Apache Airflow) -->|Orchestrate| ConsumerSpark
    Trino --> Zeppelin(Zeppelin)
    Trino --> Jupyter(Jupyter)
    Trino --> Metabase(Metabase)
    class Postgres,SQLServer database;
    class Kafka,SchemaRegistry kafka;
    class ConsumerMinio,ConsumerSpark,ConsumerDelta consumers;
    class ConsumerDatahub datahub;
```
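The diagram marks the CDC replication step into Delta Lake as TODO. The minimal Python sketch below shows roughly what that step has to do. It assumes Debezium's standard change-event envelope (`op`, `before`, `after`) and a table keyed by a hypothetical `id` column. A plain dict stands in for the Delta table, and `apply_change` is a hypothetical helper. The Scala engine linked from the diagram is not reproduced here.

```python
from typing import Any, Dict, Optional

Row = Dict[str, Any]

def apply_change(table: Dict[Any, Row], event: Dict[str, Any], key: str = "id") -> None:
    """Apply one Debezium change event to an in-memory table keyed by `key`.

    Hypothetical helper standing in for a Delta Lake MERGE. Debezium op codes:
    "c" = create, "u" = update, "d" = delete, "r" = snapshot read.
    """
    op = event["op"]
    before: Optional[Row] = event.get("before")
    after: Optional[Row] = event.get("after")
    if op in ("c", "r", "u"):
        if after is None:
            raise ValueError(f"op {op!r} requires an 'after' image")
        # If an update changed the primary key, drop the row under its old key first.
        # ('before' may be None for updates, depending on the source's replica settings.)
        if op == "u" and before is not None and before.get(key) != after[key]:
            table.pop(before.get(key), None)
        table[after[key]] = dict(after)  # upsert the new row image
    elif op == "d":
        if before is None:
            raise ValueError("delete requires a 'before' image")
        table.pop(before[key], None)
    else:
        raise ValueError(f"unknown op {op!r}")

# Usage: a snapshot read, an insert, an update and a delete, applied in order.
events = [
    {"op": "r", "before": None, "after": {"id": 1, "name": "alice"}},
    {"op": "c", "before": None, "after": {"id": 2, "name": "bob"}},
    {"op": "u", "before": {"id": 1, "name": "alice"}, "after": {"id": 1, "name": "alicia"}},
    {"op": "d", "before": {"id": 2, "name": "bob"}, "after": None},
]
table: Dict[Any, Row] = {}
for event in events:
    apply_change(table, event)
print(table)  # {1: {'id': 1, 'name': 'alicia'}}
```

Events for the same key must be applied in the order Kafka delivers them within a partition. In a Delta Lake version, this logic would typically become a `MERGE` keyed on the primary key.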
Kafka (Strimzi), Debezium CDC with Avro, Confluent Schema Registry, Postgres/SQL Server
Postgres
- YAML, Notebook
Microsoft SQL Server CDC
- YAML, Notebook, Notebook CDC
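Debezium publishes Avro records whose schemas are stored in Confluent Schema Registry. A consumer such as Spark therefore has to read the registry header before it can decode a record. The sketch below assumes the Confluent wire format: one magic byte `0`, then a 4-byte big-endian schema ID, then the Avro body. `split_confluent_message`, `schema_url`, and the registry address are hypothetical names used only for illustration.

```python
import struct
from typing import Tuple

MAGIC_BYTE = 0  # every Confluent-framed record starts with this byte

def split_confluent_message(value: bytes) -> Tuple[int, bytes]:
    """Split a Kafka record value into (schema_id, avro_payload).

    Hypothetical helper; assumes the Confluent Schema Registry wire format.
    """
    if len(value) < 5:
        raise ValueError("record too short for the Confluent wire format")
    # 1-byte magic + 4-byte big-endian schema ID
    magic, schema_id = struct.unpack(">BI", value[:5])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte {magic}")
    return schema_id, value[5:]

def schema_url(registry_url: str, schema_id: int) -> str:
    """Build the Schema Registry REST URL that returns the writer schema for an ID."""
    return f"{registry_url.rstrip('/')}/schemas/ids/{schema_id}"

# Usage with a hand-built record: schema ID 42 followed by a few Avro bytes.
record = bytes([MAGIC_BYTE]) + struct.pack(">I", 42) + b"\x02\x06foo"
schema_id, payload = split_confluent_message(record)
print(schema_id, payload)  # 42 b'\x02\x06foo'
print(schema_url("http://schema-registry.example:8081/", schema_id))
```

Decoding `payload` still requires an Avro library and the schema fetched from that URL. The sketch stops at the framing step.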
Zeppelin/Jupyter
- YAML
- YAML
Spark
Metabase
Datahub
OpenMetadata
Airflow
Minio
Argo CD
Kubernetes
Local web endpoints
| Local URL | Description | User | Password |
|---|---|---|---|
| https://dashboard.worldl.xpt/ | K8S dashboard | ||
| https://argocd.worldl.xpt | ArgoCD | admin | Notebook |
| https://zeppelin.worldl.xpt | Zeppelin | ||
| https://jupyter.worldl.xpt/jupyter | Jupyter notebook: Python, Scala, Rust | ||
| https://jupyter-commander.worldl.xpt/jupyter | Jupyter notebook: Python, Scala, Rust (K8S admin service account) | ||
| https://minio-console.worldl.xpt | MinIO operator instance minio-tenant-1 | minio | awesomes3 |
| https://console.minio-operator.svc.cluster2.xpt:9090 | MinIO operator | ||
| https://airflow.worldl.xpt/flower/ | Airflow Flower | admin | admin |
| https://airflow.worldl.xpt/airflow | Airflow | ||
| https://jupyter-glue2.worldl.xpt/ | AWS Glue version 2.0 - Jupyter | ||
| https://webui-glue2.worldl.xpt/ | AWS Glue version 2.0 - WebUI | ||
| https://history-glue2.worldl.xpt/ | AWS Glue version 2.0 - History | ||
| https://jupyter-glue3.worldl.xpt/ | AWS Glue version 3.0 - Jupyter | ||
| https://webui-glue3.worldl.xpt/ | AWS Glue version 3.0 - WebUI | ||
| https://history-glue3.worldl.xpt/ | AWS Glue version 3.0 - History | ||
| https://jupyter-glue4.worldl.xpt/ | AWS Glue version 4.0 - Jupyter | ||
| https://webui-glue4.worldl.xpt/ | AWS Glue version 4.0 - WebUI | ||
| https://history-glue4.worldl.xpt/ | AWS Glue version 4.0 - History | ||
| http://datahub.worldl.xpt/ | Datahub | datahub | manualPassword |
| https://openmetadata.worldl.xpt/ | OpenMetadata | admin | admin |
| https://kafkaui.worldl.xpt/ | Kafka UI | ||
| https://redpanda-console.worldl.xpt/ | Redpanda Console | ||
| https://metabase.worldl.xpt/ | Metabase | ||
| http://trino.trino.svc:8080 | Trino | ||
| https://jfrog.worldl.xpt | JFrog | admin | password |
| https://harbor.worldl.xpt | Harbor | admin | notebook |
| https://nexus.worldl.xpt/ | Nexus Free trial | admin | admin123 |
| https://nexus.admin.worldl.xpt/ | Nexus Free trial | ||
| https://keycloack.worldl.xpt | Keycloak | user | notebook |
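The `*.worldl.xpt` hostnames are not public DNS names. One common way to reach them from the host machine is to map each name to the Minikube ingress IP in `/etc/hosts`. This is an assumption, since this README does not say how the names resolve. The hypothetical `hosts_entries` helper below builds those lines from the table's URLs. `192.168.49.2` is only a placeholder for whatever `minikube ip` prints on your machine. Names outside the `worldl.xpt` domain, such as the in-cluster `trino.trino.svc` service address, are left out.

```python
from typing import Iterable, List
from urllib.parse import urlparse

LOCAL_DOMAIN = ".worldl.xpt"  # assumption: only this domain is mapped via /etc/hosts

def hosts_entries(ingress_ip: str, urls: Iterable[str]) -> List[str]:
    """Return one /etc/hosts line per unique worldl.xpt hostname, in first-seen order."""
    hostnames: List[str] = []
    for url in urls:
        host = urlparse(url).hostname  # drops scheme, port and path
        if host and host.endswith(LOCAL_DOMAIN) and host not in hostnames:
            hostnames.append(host)
    return [f"{ingress_ip} {host}" for host in hostnames]

urls = [
    "https://argocd.worldl.xpt",
    "https://jupyter.worldl.xpt/jupyter",
    "https://airflow.worldl.xpt/flower/",
    "https://airflow.worldl.xpt/airflow",  # same host as above, emitted once
    "http://trino.trino.svc:8080",  # in-cluster service name, skipped
]
for line in hosts_entries("192.168.49.2", urls):  # placeholder IP; use `minikube ip`
    print(line)
```

The printed lines can be appended to `/etc/hosts`, which requires root access. Regenerate them if the Minikube IP changes.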
Public web endpoints (protected by a firewall)
| Public URL | Description |
|---|---|
| https://world-zeppelin.duckdns.org | Zeppelin |
| https://world-jupyter.duckdns.org/jupyter | Jupyter notebook: Python, Scala, Rust |