Train TensorFlow Models at Scale with Kubernetes and Kubeflow on Azure

Warning: This repository is deprecated! Go to Azure/kubeflow-labs instead.

Train TensorFlow Models at Scale with Kubernetes on Azure

Prerequisites

  1. A valid Microsoft Azure subscription that allows the creation of an Azure Container Service (ACS) cluster
  2. Docker client installed: Installing Docker
  3. Azure CLI 2.0 installed: Installing the Azure CLI 2.0 | Microsoft Docs
  4. Git CLI installed: Installing Git CLI
  5. kubectl installed: Installing Kubectl
  6. Helm installed: Installing Helm CLI (Note: On Windows, you can extract the tar file using a tool like 7-Zip.)
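Before starting, it can help to confirm that the command-line tools above are on your PATH. The script below is a minimal sketch, not part of this repository. It assumes the default binary names `docker`, `az`, `git`, `kubectl`, and `helm`. It checks only whether each command exists, not which version is installed, and it does not check your Azure subscription.

```shell
#!/usr/bin/env sh
# Sketch (not part of this repository): report which prerequisite CLIs are missing.
# Assumes the default binary names; adjust the list if your installation differs.

# Print the name of each argument that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
    fi
  done
}

missing=$(missing_tools docker az git kubectl helm)
if [ -n "$missing" ]; then
  echo "Missing prerequisites:"
  echo "$missing"
else
  echo "All prerequisite CLIs found on PATH."
fi
```

To check versions afterwards, run each tool's own version command, for example `az --version`, `kubectl version --client`, or `helm version`.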

Clone this repository somewhere so you can easily access the different source files:

git clone https://github.com/wbuchwalter/tensorflow-k8s-azure

Content Summary

#  Module                          Description
0  Introduction                    Introduction to this workshop: motivations and goals.
1  Docker                          Docker and containers 101.
2  Kubernetes                      Overview of important Kubernetes concepts.
3  Helm                            Introduction to Helm.
4  GPUs                            How to use GPUs with Kubernetes.
5  TFJob                           How to use tensorflow/k8s and TFJob to deploy a simple TensorFlow training job.
6  Distributed TensorFlow          Going distributed with TFJob.
7  Hyperparameter Sweep with Helm  Using Helm to deploy a large number of training jobs testing different hypotheses, then monitoring and comparing them.
8  Going Further                   Links and resources to go further: autoscaling, distributed storage.
9  Jupyter Notebooks               Easily deploy a Jupyter Notebook instance on Kubernetes.