Cloud ML Platform
Much of the code is borrowed from this awesome repo.
This repo contains example code for a (very basic) Cloud ML platform.
- The ml_pipeline directory contains an example machine learning project for iris classification.
- The ml_infra directory contains Pulumi code that spins up the shared infrastructure of the ML platform, such as Kubernetes, MLFlow, etc.
Why?
As data science teams mature and their models reach actual production, the need for proper infrastructure becomes crucial. Leading companies with massive engineering teams, like Uber, Netflix, and Airbnb, have built multiple solutions for this infrastructure and call the combination an “ML Platform”.
We hope this repo can help you get started with building your own ML platform ❤️
Architecture
Based on the following projects:
- FastAPI - for model serving
- MLFlow - for experiment tracking
- DVC - for data versioning
- Pulumi - Infrastructure as Code
- GitHub Actions - for CI/CD
- Traefik - API gateway
- PDM - Python dependency management
- Hydra - for parameter management
- Prefect - for workflow management
When building your own ML platform, don't take these tool choices for granted! Check out the alternatives and find the tool that best solves each of your problems.
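To make the serving piece concrete, here is a minimal sketch of how FastAPI can serve a model logged to MLFlow. The model name (iris), registry stage, and feature schema are illustrative assumptions, not this repo's actual code:

```python
import mlflow.pyfunc
import pandas as pd
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Assumed model URI; in MLFlow, "models:/<name>/<stage>" resolves a registered model
model = mlflow.pyfunc.load_model("models:/iris/Production")

class IrisFeatures(BaseModel):
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float

@app.post("/predict")
def predict(features: IrisFeatures) -> dict:
    # pyfunc models accept a pandas DataFrame and return array-like predictions
    df = pd.DataFrame([features.dict()])
    return {"prediction": model.predict(df).tolist()}
```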
What's missing from this?
Well... a lot actually. Here's a partial list:
- HTTPS & Authentication
- Data quality and governance
- Large scale data processing framework
- Feature store for offline and online scenarios
- Jupyter notebook support
- Advanced workflow orchestration and scheduling
- Distributed model training support
- Model diagnostics and error analysis
- Model performance monitoring
- and probably much more!
We would love your help!
Getting Started
The following steps assume a Windows development environment.
- Before You Begin:

  Install pulumi, node, and the AWS tools:

  ```powershell
  choco install pulumi
  choco install nodejs
  msiexec.exe /i https://awscli.amazonaws.com/AWSCLIV2.msi
  choco install -y aws-iam-authenticator
  choco install -y kubernetes-helm
  ```

  Install pdm:

  ```powershell
  (Invoke-WebRequest -Uri https://raw.githubusercontent.com/pdm-project/pdm/main/install-pdm.py -UseBasicParsing).Content | python -
  ```
- Bring Up Infra:

  We use pulumi to set up the infrastructure:

  ```powershell
  $env:AWS_ACCESS_KEY_ID = "<YOUR_ACCESS_KEY_ID>"; $env:AWS_SECRET_ACCESS_KEY = "<YOUR_SECRET_ACCESS_KEY>"
  cd ml_infra
  npm install
  pulumi up
  ```

  If you encounter any errors, just fix the code and run pulumi up again. Once it succeeds, export the kubeconfig:

  ```powershell
  pulumi stack output kubeconfig > ~\.kube\config
  ```

  Now we can check the k8s cluster and pod status:

  ```powershell
  kubectl get no
  kubectl get po
  ```
- Prepare Python Environment:

  We use pdm to manage the Python environment:

  ```powershell
  cd ml_pipeline
  pdm install
  ```
- Prepare Raw Data:

  Put your raw data into the data/raw folder, and use dvc to manage it:

  ```powershell
  pdm run dvc --cd data add raw/iris.parquet
  pdm run dvc --cd data push

  # if you are in a git repo
  git add raw/iris.parquet.dvc
  ```
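  As a quick sanity check before versioning, you can peek at the parquet file with pandas (assuming pandas and a parquet engine such as pyarrow are installed):

  ```python
  import pandas as pd

  # Load the DVC-tracked raw data and inspect its shape and first rows
  df = pd.read_parquet("data/raw/iris.parquet")
  print(df.shape)
  print(df.head())
  ```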
- Run Pipeline:

  Modify the data processing, feature engineering, modeling, and orchestration code. Once you are done, run the whole pipeline:

  ```powershell
  pdm run python pipeline.py
  ```
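  If you are unsure what pipeline.py might look like, here is a minimal, self-contained sketch of a Hydra-driven entry point. The config layout (conf/config.yaml) and field names are assumptions for illustration, not this repo's actual code:

  ```python
  import hydra
  import pandas as pd
  from omegaconf import DictConfig
  from sklearn.linear_model import LogisticRegression
  from sklearn.model_selection import train_test_split

  @hydra.main(config_path="conf", config_name="config", version_base=None)
  def main(cfg: DictConfig) -> None:
      # data processing: load the DVC-tracked raw data
      df = pd.read_parquet(cfg.data.raw_path)  # e.g. data/raw/iris.parquet

      # feature engineering: split features and target (column names from config)
      X = df.drop(columns=[cfg.data.target])
      y = df[cfg.data.target]
      X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

      # modeling: fit a simple classifier and report held-out accuracy
      model = LogisticRegression(max_iter=cfg.model.max_iter)
      model.fit(X_train, y_train)
      print(f"test accuracy: {model.score(X_test, y_test):.3f}")

  if __name__ == "__main__":
      main()
  ```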
- View Experiments:

  We can check the model training results and artifacts through MLFlow. Visit http://your_domain_name/mlflow .
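  Under the hood, training code reports to MLFlow roughly like this; the tracking URI, experiment name, and logged values below are placeholders:

  ```python
  import mlflow

  # Point the client at the MLFlow server deployed by ml_infra (URI is a placeholder)
  mlflow.set_tracking_uri("http://your_domain_name/mlflow")
  mlflow.set_experiment("iris-classification")

  with mlflow.start_run():
      mlflow.log_param("max_iter", 200)            # hyperparameters
      mlflow.log_metric("accuracy", 0.95)          # evaluation metrics
      mlflow.log_artifact("confusion_matrix.png")  # files such as plots
  ```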
- Use Prefect:

  TODO
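  Until this section is written, here is a generic Prefect 2.x sketch (not this repo's code) showing how pipeline stages could become tasks in a flow:

  ```python
  from prefect import flow, task

  @task
  def prepare_data() -> list[float]:
      # placeholder for the real data-processing stage
      return [5.1, 3.5, 1.4, 0.2]

  @task
  def train(features: list[float]) -> float:
      # placeholder for the real training stage
      return sum(features) / len(features)

  @flow
  def iris_pipeline() -> None:
      features = prepare_data()
      print(f"trained model score: {train(features):.2f}")

  if __name__ == "__main__":
      iris_pipeline()
  ```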
- Deploy Model:

  We use pulumi to build the image, push it to the remote repository, and configure the serving service:

  ```powershell
  cd ml_pipeline/infra
  npm install
  pulumi up
  ```

  Now you can test your model through the public API: http://your_domain_name/models/<model_name>/predict .
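  For example, you can hit the endpoint with the requests package; the model name and JSON body below are assumptions that depend on how the serving code defines its input schema:

  ```python
  import requests

  # <model_name> is "iris" here; adjust to whatever you deployed
  url = "http://your_domain_name/models/iris/predict"
  payload = {
      "sepal_length": 5.1,
      "sepal_width": 3.5,
      "petal_length": 1.4,
      "petal_width": 0.2,
  }
  response = requests.post(url, json=payload)
  print(response.status_code, response.json())
  ```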
- CI/CD:

  We provide a GitHub Actions config file in the ml_pipeline/.github folder. Modify the workflow and push the ml_pipeline project to GitHub; the workflow will be triggered automatically.
You can also visit this blog post for more details about this project.