Anaconda-CUDA-Accelerated-TensorFlowGPU-Development-Environment
A reproducible containerized environment with CUDA X, Anaconda, TensorFlow-GPU, Keras-GPU, Dask, and PyCUDA.
Anaconda with TensorFlow-GPU and NVIDIA CUDA X
A containerized, reproducible development environment with Anaconda, NVIDIA CUDA 10.1, TensorFlow-GPU, Keras-GPU, Dask, CuPy (a GPU-accelerated drop-in replacement for NumPy), and PyCUDA.
Reproducible ML
It is up to you, the developer, to version-lock the container and ensure the same GPU architecture is used; you may want to look into TFX and TF Serving for this.
Anaconda + Tensorflow: CUDA enabled GPU Machine Learning Development Environment
Features
- Anaconda: a distribution of Python for scientific computing
- TensorFlow-GPU v1.14: GPU-enabled machine learning framework
- TensorBoard: understand, debug, and optimize your models; served on localhost:6006 (Official Docs)
- Keras-GPU: Keras, the Python deep learning library, running on GPUs
- CuPy (latest): GPU-accelerated drop-in replacement for NumPy
- Numba: works great with Jupyter notebooks for interactive computing and with distributed execution frameworks like Dask and Spark; lets you compile Python functions to run on the GPU
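Because CuPy mirrors the NumPy API, existing NumPy code mostly runs unchanged on the GPU. A minimal sketch, using NumPy itself here (inside the container, swapping the import for CuPy moves the same computation onto the GPU):

```python
import numpy as np  # inside the container, `import cupy as np` runs this on the GPU

# Same API either way: allocate arrays, matrix-multiply, reduce.
a = np.arange(6, dtype=np.float32).reshape(2, 3)
b = np.ones((3, 2), dtype=np.float32)
c = a @ b                # matrix multiply; with CuPy this executes on the GPU
total = float(c.sum())   # with CuPy, converting to float copies the scalar back to the host
print(c.shape, total)    # (2, 2) 30.0
```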
Distributed Feature Engineering
- Dask Distributed: distributed ingestion of data (see Scaling Python with Dask)
- Featuretools: automated feature engineering
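Dask's distributed ingestion follows a partition/compute/aggregate pattern. A toy CPU-only sketch of that pattern using only the standard library (Dask Distributed applies the same idea across processes and machines, with a real task scheduler in place of this thread pool):

```python
from concurrent.futures import ThreadPoolExecutor

# Toy "partitioned ingestion": split the data into chunks, have each worker
# compute a partial result (here, a sum) over its chunk, then aggregate.
data = list(range(100))
chunks = [data[i:i + 25] for i in range(0, len(data), 25)]

with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(sum, chunks))

total = sum(partials)
print(partials, total)  # [300, 925, 1550, 2175] 4950
```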
CUDA for GPU Enablement
- NVIDIA TensorRT inference accelerator and CUDA 10: high-performance inference on NVIDIA GPUs
- PyCUDA 2019: Python interface for direct access to NVIDIA GPUs (examples here)
- cuDNN 7.4.1.5 for deep learning in CNNs: GPU-accelerated library of primitives for deep neural networks
Good to know
- Hot reloading of the Docker container: code updates in the local /apps folder automatically propagate into the container.
- TensorBoard is on localhost:6006 and GPU-enabled Jupyter is on localhost:8888.
- Python 3.7
- Only Tesla Pascal and Turing GPU architectures are supported.
- Includes a test with synthetic data that compares GPU to CPU performance, and a TensorBoard example (see Steps 4 and 5 below).
Before you begin (This might be optional)
Link to the nvidia-docker2 install guide: Tutorial
You must install nvidia-docker2 and all of its dependencies first (the tutorial above covers the prerequisites); to do so, run:
sudo apt-get install nvidia-docker2
sudo pkill -SIGHUP dockerd
sudo systemctl daemon-reload
sudo systemctl restart docker
How to run this container:
Step 1
docker build -t <container name> .
(note the trailing . at the end of the build command)
Step 2
Run the image, mount the volumes for Jupyter and the app folder for your favorite IDE, and expose port 8888 for TensorFlow and port 6006 for TensorBoard:
docker run --rm -it --runtime=nvidia --user $(id -u):$(id -g) --group-add container_user --group-add sudo -v "${PWD}:/apps" -v $(pwd):/tf/notebooks -p 8888:8888 -p 0.0.0.0:6006:6006 <container name>
Step 3: Check that the GPU drivers and CUDA are running
- Exec into the container and check whether your GPU is registering in the container and CUDA is working.
- Get the container id:
docker ps
- Exec into the container:
docker exec -u root -t -i <container id> /bin/bash
- Check if the NVIDIA GPU drivers have container access:
nvidia-smi
- Check if CUDA is working:
nvcc -V
Step 4: How to launch TensorBoard
(It helps to use multiple tabs on the command line, as you have to leave at least one tab open for TensorBoard on :6006.)
- Demonstrates the functionality of the TensorBoard dashboard.
- Exec into the container, if you haven't already, as shown above. Get the <container id>:
docker ps
docker exec -u root -t -i <container id> /bin/bash
- Then run on the command line:
tensorboard --logdir=/tmp/tensorflow/mnist/logs
- Type cd / to get to the root directory, then cd into the folder that hot-reloads code from your local folder/favorite IDE at /apps/apps/gpu_benchmarks and run:
python tensorboard.py
- Go to the browser and navigate to localhost:6006
- You should see the following automatically populate at localhost:6006:
Step 5: Run tests to prove container-based GPU performance
- Demonstrates GPU vs CPU performance.
- Exec into the container if you haven't already, cd to /tf/notebooks/apps/gpu_benchmarks, and run:
- CPU perf:
python benchmark.py cpu 10000
- CPU perf should return something like this:
Shape: (10000, 10000) Device: /cpu:0 Time taken: 0:00:03.934996
- GPU perf:
python benchmark.py gpu 10000
- GPU perf should return something like this:
Shape: (10000, 10000) Device: /gpu:0 Time taken: 0:00:01.032577
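benchmark.py itself is not reproduced here, but the sample output suggests it times a large matrix multiply on the chosen device. A rough CPU-only sketch of that pattern with NumPy; the function name and output format are assumptions modeled on the output above (the real script presumably uses TensorFlow's device placement to run on /gpu:0):

```python
import datetime
import numpy as np

def benchmark_matmul(n: int) -> str:
    """Time an (n, n) matrix multiply and report it in benchmark.py's output style."""
    a = np.random.rand(n, n).astype(np.float32)
    start = datetime.datetime.now()
    (a @ a).sum()  # the reduction forces the multiply to complete before timing stops
    elapsed = datetime.datetime.now() - start
    return f"Shape: ({n}, {n}) Device: /cpu:0 Time taken: {elapsed}"

print(benchmark_matmul(1000))
```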
Misc: Troubleshooting Docker conflicts, container errors, volume mappings, etc.
- Exec into a container:
Get the container name or ID:
docker ps
Exec into the container:
docker exec -u root -t -i <container name or id> /bin/bash
- Remove all containers:
docker rm $(docker ps -a -q)
- Remove all images:
docker rmi $(docker images -a -q)
- Remove a volume (necessary when re-allocating file paths for mounted volumes):
docker volume ls
docker volume rm volume_name volume_name
Known conflicts with nvidia-docker and Ubuntu
AppArmor on Ubuntu has known conflicts with Docker, so remove the unknown Docker profiles from AppArmor on your local machine (this does not hurt security on your computer):
sudo aa-remove-unknown