Anaconda-CUDA-Accelerated-TensorFlowGPU-Development-Environment
A reproducible containerized environment with CUDA X, Anaconda, TensorFlow-GPU, Keras-GPU, Dask, and PyCUDA.
Anaconda with TensorFlow-GPU and NVIDIA CUDA X
A containerized, reproducible development environment with Anaconda, NVIDIA CUDA 10.1, TensorFlow-GPU, Keras-GPU, Dask, CuPy (a GPU-accelerated drop-in replacement for NumPy), and PyCUDA.
Reproducible ML
It is up to you, the developer, to version-lock the container and ensure the same GPU architecture is used; you may want to look into TFX and TF Serving for this.
Anaconda + Tensorflow: CUDA enabled GPU Machine Learning Development Environment
Features
- Anaconda: a distribution of Python for scientific computing
- TensorFlow-GPU v1.14: GPU-enabled machine learning framework
- TensorBoard: understand, debug, and optimize your models; served on localhost:6006 (Official Docs)
- Keras-GPU: Keras, the Python deep learning library, running on GPUs
- CuPy (latest): GPU-accelerated drop-in replacement for NumPy
- Numba: works great with Jupyter notebooks for interactive computing and with distributed execution frameworks like Dask and Spark; lets you compile Python functions to run on the GPU
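Because CuPy mirrors the NumPy API, existing NumPy code mostly runs unchanged on the GPU. A minimal sketch, using NumPy itself here (inside the container, swapping the import for CuPy moves the same computation onto the GPU):

```python
import numpy as np  # inside the container, `import cupy as np` runs this on the GPU

# Same API either way: allocate arrays, matrix-multiply, reduce.
a = np.arange(6, dtype=np.float32).reshape(2, 3)
b = np.ones((3, 2), dtype=np.float32)
c = a @ b                # matrix multiply; with CuPy this executes on the GPU
total = float(c.sum())   # with CuPy, converting to float copies the scalar back to the host
print(c.shape, total)    # (2, 2) 30.0
```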
Distributed Feature Engineering
- Dask Distributed: distributed ingestion of data (see Scaling Python with Dask)
- Featuretools: automated feature engineering
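Dask's distributed ingestion follows a partition/compute/aggregate pattern. A toy CPU-only sketch of that pattern using only the standard library (Dask Distributed applies the same idea across processes and machines, with a real task scheduler in place of this thread pool):

```python
from concurrent.futures import ThreadPoolExecutor

# Toy "partitioned ingestion": split the data into chunks, have each worker
# compute a partial result (here, a sum) over its chunk, then aggregate.
data = list(range(100))
chunks = [data[i:i + 25] for i in range(0, len(data), 25)]

with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(sum, chunks))

total = sum(partials)
print(partials, total)  # [300, 925, 1550, 2175] 4950
```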
CUDA for GPU Enablement
- NVIDIA TensorRT inference accelerator and CUDA 10: high-performance inference on NVIDIA GPUs
- PyCUDA 2019: Python interface for direct access to NVIDIA GPUs (examples here)
- cuDNN 7.4.1.5 for deep learning in CNNs: GPU-accelerated library of primitives for deep neural networks
Good to know
- Hot reloading of the Docker container: code updates in the local /apps folder automatically propagate into the container.
- TensorBoard is on localhost:6006 and GPU-enabled Jupyter is on localhost:8888.
- Python 3.7
- Only Tesla Pascal and Turing GPU architectures are supported.
- Includes a test with synthetic data that compares GPU to CPU performance, and a TensorBoard example (see Steps 4 and 5 below).
Before you begin (This might be optional)
Link to the nvidia-docker2 install guide: Tutorial
You must install nvidia-docker2 and all of its dependencies first (the tutorial above covers the prerequisites); to do so, run:
sudo apt-get install nvidia-docker2
sudo pkill -SIGHUP dockerd
sudo systemctl daemon-reload
sudo systemctl restart docker
How to run this container:
Step 1
docker build -t <container name> .
(note the trailing . at the end of the build command)
Step 2
Run the image, mount the volumes for Jupyter and the app folder for your favorite IDE, and expose port 8888 for TensorFlow and port 6006 for TensorBoard:
docker run --rm -it --runtime=nvidia --user $(id -u):$(id -g) --group-add container_user --group-add sudo -v "${PWD}:/apps" -v $(pwd):/tf/notebooks -p 8888:8888 -p 0.0.0.0:6006:6006 <container name>
Step 3: Check that the GPU drivers and CUDA are running
- Exec into the container and check whether your GPU is registering in the container and CUDA is working.
- Get the container id:
docker ps
- Exec into the container:
docker exec -u root -t -i <container id> /bin/bash
- Check if the NVIDIA GPU drivers have container access:
nvidia-smi
- Check if CUDA is working:
nvcc -V
Step 4: How to launch TensorBoard
(It helps to use multiple tabs on the command line, as you have to leave at least one tab open for TensorBoard on :6006.)
- Demonstrates the functionality of the TensorBoard dashboard.
- Exec into the container, if you haven't already, as shown above. Get the <container id>:
docker ps
docker exec -u root -t -i <container id> /bin/bash
- Then run on the command line:
tensorboard --logdir=/tmp/tensorflow/mnist/logs
- Type cd / to get to the root directory, then cd into the folder that hot-reloads code from your local folder/favorite IDE at /apps/apps/gpu_benchmarks and run:
python tensorboard.py
- Go to the browser and navigate to localhost:6006
- You should see the following automatically populate at localhost:6006:
Step 5: Run tests to prove container-based GPU performance
- Demonstrates GPU vs CPU performance.
- Exec into the container if you haven't already, cd to /tf/notebooks/apps/gpu_benchmarks, and run:
- CPU perf:
python benchmark.py cpu 10000
- CPU perf should return something like this:
Shape: (10000, 10000) Device: /cpu:0 Time taken: 0:00:03.934996
- GPU perf:
python benchmark.py gpu 10000
- GPU perf should return something like this:
Shape: (10000, 10000) Device: /gpu:0 Time taken: 0:00:01.032577
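benchmark.py itself is not reproduced here, but the sample output suggests it times a large matrix multiply on the chosen device. A rough CPU-only sketch of that pattern with NumPy; the function name and output format are assumptions modeled on the output above (the real script presumably uses TensorFlow's device placement to run on /gpu:0):

```python
import datetime
import numpy as np

def benchmark_matmul(n: int) -> str:
    """Time an (n, n) matrix multiply and report it in benchmark.py's output style."""
    a = np.random.rand(n, n).astype(np.float32)
    start = datetime.datetime.now()
    (a @ a).sum()  # the reduction forces the multiply to complete before timing stops
    elapsed = datetime.datetime.now() - start
    return f"Shape: ({n}, {n}) Device: /cpu:0 Time taken: {elapsed}"

print(benchmark_matmul(1000))
```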
Misc: Troubleshooting Docker conflicts, container errors, volume mappings, etc.
- Exec into a container:
Get the container name or ID:
docker ps
Exec into the container:
docker exec -u root -t -i <container name or id> /bin/bash
- Remove all containers:
docker rm $(docker ps -a -q)
- Remove all images:
docker rmi $(docker images -a -q)
- Remove a volume (necessary when re-allocating file paths for mounted volumes):
docker volume ls
docker volume rm volume_name volume_name
Known conflicts with nvidia-docker and Ubuntu
AppArmor on Ubuntu has known conflicts with Docker, so remove the unknown Docker profiles from AppArmor on your local machine (this does not hurt security on your computer):
sudo aa-remove-unknown