labtools-k8s
A complete data engineering pipeline running on Minikube (Kubernetes): Argo CD, Spark, Trino, S3, Delta Lake, Postgres + Debezium CDC, MySQL, Airflow, Kafka Strimzi, Datahub, OpenMetadata, Zeppelin, Jupyter...
This is a work in progress...
TODO
- AWS EKS + Terraform
- Staging environment with production-like characteristics
Pipeline architecture
```mermaid
flowchart TD
Postgres(Postgres Database) -->|CDC| Kafka(Kafka Strimzi)
SQLServer(SQL Server Database) -->|CDC| Kafka
Kafka -->|AVRO Data Stream| ConsumerMinio(Minio S3)
ConsumerMinio -->|AVRO Data Stream| ConsumerSpark(Apache Spark)
ConsumerSpark --> |CDC Replication using Scala Engine - TODO| ConsumerDelta(Delta Lake)
ConsumerSpark --> |Data catalog, lineage| ConsumerDatahub(Datahub)
ConsumerSpark --> HiveMetastore(Hive metastore)
Kafka -->|Schema Management| SchemaRegistry(Confluent Schema Registry)
Kafka --> RedpandaConsole(Redpanda Console)
SchemaRegistry -->|Schema Use - API| ConsumerSpark
ConsumerDelta -->|Data Query| Trino(Trino)
click ConsumerDelta href "https://github.com/rogeriomm/debezium-cdc-replication-delta" "Visit GitHub repository"
Airflow(Apache Airflow) -->|Orchestrate| ConsumerSpark
Trino --> Zeppelin(Zeppelin)
Trino --> Jupyter(Jupyter)
Trino --> Metabase(Metabase)
class Postgres,SQLServer database;
class Kafka,SchemaRegistry kafka;
class ConsumerMinio,ConsumerSpark,ConsumerDelta consumers;
class ConsumerDatahub datahub;
```
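The flowchart is a directed acyclic graph: data moves from the CDC sources through Kafka and Spark to the query and notebook tools. As a minimal sketch (not part of the repository, requires Python 3.9+ for `graphlib`), the diagram's edges can be encoded to derive an upstream-first order and to list everything that feeds a node such as Trino, the kind of lineage question Datahub is used for. The node IDs are copied from the diagram; `EDGES`, `build_graph` and `upstream_of` are hypothetical names for this illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical encoding of the "Pipeline architecture" flowchart as
# (upstream, downstream) pairs; edge labels are dropped, only direction matters.
EDGES = [
    ("Postgres", "Kafka"),
    ("SQLServer", "Kafka"),
    ("Kafka", "ConsumerMinio"),
    ("ConsumerMinio", "ConsumerSpark"),
    ("ConsumerSpark", "ConsumerDelta"),
    ("ConsumerSpark", "ConsumerDatahub"),
    ("ConsumerSpark", "HiveMetastore"),
    ("Kafka", "SchemaRegistry"),
    ("Kafka", "RedpandaConsole"),
    ("SchemaRegistry", "ConsumerSpark"),
    ("ConsumerDelta", "Trino"),
    ("Airflow", "ConsumerSpark"),
    ("Trino", "Zeppelin"),
    ("Trino", "Jupyter"),
    ("Trino", "Metabase"),
]


def build_graph(edges):
    """Map each node to the set of its direct upstream nodes (predecessors)."""
    graph = {}
    for upstream, downstream in edges:
        graph.setdefault(upstream, set())
        graph.setdefault(downstream, set()).add(upstream)
    return graph


def upstream_of(node, graph):
    """Return every node that feeds `node`, directly or transitively."""
    seen = set()
    stack = list(graph.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph[current])
    return seen


graph = build_graph(EDGES)
# static_order() yields each node after all of its predecessors and
# raises graphlib.CycleError if the diagram ever gains a cycle.
order = list(TopologicalSorter(graph).static_order())
print(order)
print(sorted(upstream_of("Trino", graph)))
```

The topological order is one valid start-up or processing order, not a unique one; nodes with no dependency between them (for example Postgres and Airflow) may appear in either order.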
Kafka Strimzi, Debezium CDC AVRO, Confluent Schema Registry, Postgres/SQL Server
Postgres
- YAML - Notebook
![Kafka UI: CDC topics](https://github.com/rogeriomm/labtools-k8s/raw/master/docs/kafka/KafkaUiTopicsCdc.png)
![Kafka UI: CDC schemas](https://github.com/rogeriomm/labtools-k8s/raw/master/docs/kafka/KafkaUiSchemaCdc.png)
Microsoft SQL Server CDC
- YAML - Notebook - Notebook CDC
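Debezium publishes each row change as an envelope whose `op` field says what happened (`c` create, `u` update, `d` delete, `r` snapshot read), with the row state in `before` and `after`. The following is a minimal sketch of how a consumer could replay those events into a keyed replica, the job the diagram assigns to the Spark/Delta Lake stage. It assumes the Avro values have already been decoded with the Schema Registry into Python dicts and that each table has a single primary-key column named `id` (hypothetical); `apply_change_event` is an illustrative name, not the repository's Scala replication engine.

```python
from typing import Any, Optional

Row = dict[str, Any]


def apply_change_event(table: dict[Any, Row], event: Optional[Row], key: str = "id") -> None:
    """Apply one decoded Debezium change-event payload to an in-memory replica.

    `table` maps primary-key value -> current row. `key` is an assumed
    single primary-key column; real tables may use composite keys.
    """
    if event is None:
        # Tombstone (null value) that can follow a delete, used for log compaction.
        return
    op = event["op"]
    if op in ("c", "r", "u"):
        # create, snapshot read, update: "after" holds the new row state.
        row = event["after"]
        table[row[key]] = row
    elif op == "d":
        # delete: only "before" is populated.
        table.pop(event["before"][key], None)
    else:
        # Other op codes are not handled in this sketch.
        raise ValueError(f"unsupported op: {op!r}")


# Usage: a snapshot row, an insert, an update and a delete (plus its tombstone).
replica: dict[Any, Row] = {}
events = [
    {"op": "r", "before": None, "after": {"id": 1, "name": "alice"}},
    {"op": "c", "before": None, "after": {"id": 2, "name": "bob"}},
    {"op": "u", "before": {"id": 1, "name": "alice"}, "after": {"id": 1, "name": "alicia"}},
    {"op": "d", "before": {"id": 2, "name": "bob"}, "after": None},
    None,
]
for ev in events:
    apply_change_event(replica, ev)
print(replica)  # {1: {'id': 1, 'name': 'alicia'}}
```

On Spark with Delta Lake, the same per-key logic would typically be expressed as a `MERGE` keyed on the primary key, after keeping only the latest event per key in each micro-batch.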
Zeppelin/Jupyter
- YAML
![Zeppelin web UI](https://github.com/rogeriomm/labtools-k8s/raw/master/docs/zeppelin/ZeppelinWeb.png)
- YAML
![Jupyter web UI](https://github.com/rogeriomm/labtools-k8s/raw/master/docs/jupyter/JupyterWeb.png)
Spark
![Spark web UI](https://github.com/rogeriomm/labtools-k8s/raw/master/docs/spark/SparkWebUi.png)
Metabase
![Metabase web UI](https://github.com/rogeriomm/labtools-k8s/raw/master/docs/metabase/MetabaseWeb.png)
Datahub
![Datahub web UI](https://github.com/rogeriomm/labtools-k8s/raw/master/docs/datahub/DatahubWeb.png)
OpenMetadata
![OpenMetadata web UI](https://github.com/rogeriomm/labtools-k8s/raw/master/docs/openmetadata/OpenmetadataWeb.png)
Airflow
![Airflow web UI](https://github.com/rogeriomm/labtools-k8s/raw/master/docs/airflow/AirflowWeb.png)
Minio
![MinIO Operator console](https://github.com/rogeriomm/labtools-k8s/raw/master/docs/minio/MinioOperatorWeb.png)
![MinIO console](https://github.com/rogeriomm/labtools-k8s/raw/master/docs/minio/MinioWeb.png)
Argo CD
![Argo CD web UI](https://github.com/rogeriomm/labtools-k8s/raw/master/docs/argocd/ArgoCdWeb.png)
Kubernetes
![Kubernetes pods in cluster2](https://github.com/rogeriomm/labtools-k8s/raw/master/docs/k8s/K8sPodsCluster2.png)
Local web
Local URL | Description | User | Password |
---|---|---|---|
https://dashboard.worldl.xpt/ | K8S dashboard | ||
https://argocd.worldl.xpt | Argo CD | admin | Notebook |
https://zeppelin.worldl.xpt | Zeppelin | ||
https://jupyter.worldl.xpt/jupyter | Jupyter notebook: Python, Scala, Rust | ||
https://jupyter-commander.worldl.xpt/jupyter | Jupyter notebook: Python, Scala, Rust (K8S admin service account) | ||
https://minio-console.worldl.xpt | MinIO Operator tenant minio-tenant-1 | minio | awesomes3 |
https://console.minio-operator.svc.cluster2.xpt:9090 | MinIO Operator | ||
https://airflow.worldl.xpt/flower/ | Airflow Flower | admin | admin |
https://airflow.worldl.xpt/airflow | Airflow | ||
https://jupyter-glue2.worldl.xpt/ | AWS Glue version 2.0 - Jupyter | ||
https://webui-glue2.worldl.xpt/ | AWS Glue version 2.0 - WebUI | ||
https://history-glue2.worldl.xpt/ | AWS Glue version 2.0 - History | ||
https://jupyter-glue3.worldl.xpt/ | AWS Glue version 3.0 - Jupyter | ||
https://webui-glue3.worldl.xpt/ | AWS Glue version 3.0 - WebUI | ||
https://history-glue3.worldl.xpt/ | AWS Glue version 3.0 - History | ||
https://jupyter-glue4.worldl.xpt/ | AWS Glue version 4.0 - Jupyter | ||
https://webui-glue4.worldl.xpt/ | AWS Glue version 4.0 - WebUI | ||
https://history-glue4.worldl.xpt/ | AWS Glue version 4.0 - History | ||
http://datahub.worldl.xpt/ | Datahub | datahub | manualPassword |
https://openmetadata.worldl.xpt/ | OpenMetadata | admin | admin |
https://kafkaui.worldl.xpt/ | Kafka UI | ||
https://redpanda-console.worldl.xpt/ | Redpanda Console | ||
https://metabase.worldl.xpt/ | Metabase | ||
http://trino.trino.svc:8080 | Trino | ||
https://jfrog.worldl.xpt | JFrog | admin | password |
https://harbor.worldl.xpt | Harbor | admin | Notebook |
https://nexus.worldl.xpt/ | Nexus (free trial) | admin | admin123 |
https://nexus.admin.worldl.xpt/ | Nexus (free trial) | ||
https://keycloack.worldl.xpt | Keycloak | user | Notebook |
Internet web (protected by firewall)
Public URL | Description |
---|---|
https://world-zeppelin.duckdns.org | Zeppelin |
https://world-jupyter.duckdns.org/jupyter | Jupyter notebook: Python, Scala, Rust |