ck-tensorflow
Create Docker image with stable CK for CK+TF+MLPerf
Let's create a Docker image with stable CK repositories for TensorFlow and our reference MLPerf workflows. It shouldn't be very difficult, and it would allow the community to use: a) the latest CK workflows for the reference MLPerf implementation (which may sometimes fail in the latest environment); b) a stable implementation which should always work, but may not use the latest frameworks and environments.
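For the "stable" variant, a minimal sketch of the idea could pin the CK repositories to known-good revisions at build time (the repository name, build argument and default CK repository path below are assumptions for illustration, not the actual Dockerfile):
# Minimal sketch, assuming CK repositories live in the default $HOME/CK directory.
# CK_MLPERF_COMMIT is a hypothetical build argument for pinning a known-good revision.
ARG CK_MLPERF_COMMIT=master
RUN ck pull repo:ck-mlperf \
 && git -C ${HOME}/CK/ck-mlperf checkout ${CK_MLPERF_COMMIT}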
We now have quite a few Docker images for TFLite, ArmNN-TFLite and TF-C++:
anton@velociti$ docker image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
ctuning/object-detection-armnn-tflite.debian-9 latest 8d8e41f08acc 6 minutes ago 4.55GB
ctuning/object-detection-tflite.debian-9 latest 57045c893cd0 30 minutes ago 4.38GB
ctuning/image-classification-tflite.centos-7.stable latest 1f0a0b5ebeba 34 hours ago 2.11GB
ctuning/image-classification-armnn-tflite.debian-9 latest 95eee6c73439 34 hours ago 3.71GB
ctuning/image-classification-tflite.ubuntu-16.04 latest db4169ca1064 39 hours ago 2.05GB
ctuning/image-classification-tflite.ubuntu-18.04.dashboard latest 34b0031dceac 42 hours ago 2.44GB
ctuning/image-classification-tflite.ubuntu-18.04 latest 15b6ee3982c1 42 hours ago 2.18GB
ctuning/image-classification-tflite.centos-7 latest b45658e31334 2 days ago 2.24GB
ctuning/image-classification-tflite.debian-9 latest 7e54a86bb258 2 days ago 2.18GB
ctuning/image-classification-tf-cpp.debian-9 latest edd6913a6549 2 days ago 5.34GB
ubuntu 16.04 2a697363a870 2 weeks ago 119MB
debian 9 8d31923452f8 3 weeks ago 101MB
ubuntu 18.04 d131e0fa2585 5 weeks ago 102MB
centos 7.6.1810 f1cb7c7d58b7 2 months ago 202MB
centos 7 9f38484d220f 2 months ago 202MB
centos centos7 9f38484d220f 2 months ago 202MB
One of the issues with Docker images (however incredibly helpful they are in expanding our testing capability) is their sheer size, especially when their contents are not reused. Of course, on the same machine layers are likely to be reused, so having:
$ docker image ls "ctuning/object-detection*"
REPOSITORY TAG IMAGE ID CREATED SIZE
ctuning/object-detection-armnn-tflite.debian-9 latest 8d8e41f08acc 40 minutes ago 4.55GB
ctuning/object-detection-tflite.debian-9 latest 57045c893cd0 About an hour ago 4.38GB
doesn't mean 9 GB of space, as:
$ docker system df -v | grep object-detection
ctuning/object-detection-armnn-tflite.debian-9 latest 8d8e41f08acc 43 minutes ago 4.553GB 2.392GB 2.161GB 0
ctuning/object-detection-tflite.debian-9 latest 57045c893cd0 About an hour ago 4.378GB 2.392GB 1.986GB 0
Still, it would be nice to consider creating "ensembles" of CK Docker images. Perhaps we could place CK components (libraries, datasets) on virtual drives and seamlessly map them from inside the Docker images?
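For example, a hedged sketch of the "virtual drive" idea (the host path is hypothetical): keep the bulky CK components on the host and bind-mount them into any of the CK containers, so they are downloaded and stored only once:
# /data/ck-tools is a hypothetical host directory holding shared CK components;
# /home/dvdt/CK_TOOLS is where the existing images install CK packages.
docker run -it --rm \
  -v /data/ck-tools:/home/dvdt/CK_TOOLS \
  ctuning/object-detection-tflite.debian-9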
Yes. That's the problem with Docker. Should we upload all of them to the Docker Hub?
For example, for ctuning/object-detection-armnn-tflite.debian-9 and ctuning/object-detection-tflite.debian-9, this "Venn diagram intersection" would contain the TFLite library and the non-quantized MobileNet model.
On the other hand, armnn-tflite uses only the first 50 preprocessed images, while tflite uses all 5000 preprocessed images. Still, we could share the union and let armnn-tflite tap into only the first 50 preprocessed images.
By rearranging the order of statements in the armnn-tflite image (sketched after the measurements below), I was able to increase the shared size between the images slightly (by ~50 MB):
- before:
anton@velociti:~$ docker system df -v | grep object-detection
ctuning/object-detection-armnn-tflite.debian-9 latest e2b60e919eea 3 minutes ago 4.553GB 2.392GB 2.161GB 0
ctuning/object-detection-tflite.debian-9 latest a05c17c8eaa9 18 minutes ago 4.378GB 2.392GB 1.986GB 0
- after:
anton@velociti:~$ docker system df -v | grep object-detection
ctuning/object-detection-armnn-tflite.debian-9 latest ffcc3fcf1615 About an hour ago 4.553GB 2.441GB 2.111GB 0
ctuning/object-detection-tflite.debian-9 latest a05c17c8eaa9 2 hours ago 4.378GB 2.392GB 1.986GB 0
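To illustrate the kind of reordering (the package tags below are indicative only, not the exact ones from the Dockerfiles): steps common to both images go first, so Docker can reuse the resulting layers, and image-specific steps go last:
# Shared prefix: identical in both Dockerfiles, so the resulting layers are shared.
RUN ck install package --tags=lib,tflite
RUN ck install package --tags=model,tflite,object-detection,ssd-mobilenet,non-quantized
# Image-specific suffix: only in the ArmNN-TFLite Dockerfile.
RUN ck install package --tags=lib,armnn,tflite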
Here's the size of everything installed under ${CK_TOOLS}:
dvdt@3e7902a9a7b5:~/CK_TOOLS$ du -hs *
52M dataset-coco-2017-val
13M dataset-object-detection-preprocessed-coco.2017-first.50
168M lib-armnn-19.05-gcc-6.3.0-rel.19.05-tflite-linux-64
820M lib-boost-1.64.0-gcc-6.3.0-for-armnn-static-linux-64
35M lib-flatbuffers-master-gcc-6.3.0-linux-64
75M lib-protobuf-host-3.0.0-compiler.gcc-6.3.0-linux-64
87M lib-protobuf-host-3.5.1-compiler.gcc-6.3.0-linux-64
11M lib-python-cython-compiler.python-3.5.3-linux-64
108M lib-python-matplotlib-compiler.python-3.5.3-linux-64
70M lib-python-numpy-compiler.python-3.5.3-linux-64
7.0M lib-python-pillow-compiler.python-3.5.3-linux-64
157M lib-python-scipy-1.2.1-compiler.python-3.5.3-linux-64
120K lib-rtl-xopenme-0.3-gcc-6.3.0-linux-64
593M lib-tflite-src-static-1.13.1-gcc-6.3.0-linux-64
79M model-tflite-mlperf-ssd-mobilenet-downloaded
569M tensorflow-source-linux-64
1010M tensorflowmodel-api-master
18M tool-coco-master-gcc-6.3.0-compiler.python-3.5.3-linux-64
After another rearrangement (essentially by sharing lib-tflite-src-static-1.13.1-gcc-6.3.0-linux-64):
docker system df -v | grep object-detection
ctuning/object-detection-tflite.debian-9 latest c2f19b032e87 2 minutes ago 4.378GB 2.88GB 1.498GB 0
ctuning/object-detection-armnn-tflite.debian-9 latest b6ff58b85ed4 10 minutes ago 4.553GB 2.88GB 1.673GB 0
I was able to cut out the Boost sources as follows:
RUN ck install package --tags=lib,boost,for-armnn-static,v1.64 \
 && ck virtual env --tags=lib,boost,for-armnn-static,v1.64 --shell_cmd='rm -rf $CK_ENV_LIB_BOOST/../boost_1_64_0'
The only niggle is an ioctl message in red, which seems to happen just when the virtual shell invokes the shell command:
Recording CK configuration to /home/dvdt/CK_TOOLS/lib-boost-1.64.0-gcc-6.3.0-for-armnn-static-linux-64/ck-install.json ...
Installation path: /home/dvdt/CK_TOOLS/lib-boost-1.64.0-gcc-6.3.0-for-armnn-static-linux-64
Installation time: 61.86750650405884 sec.
bash: cannot set terminal process group (1): Inappropriate ioctl for device
bash: no job control in this shell
Removing intermediate container 1a9ea20f367d
(Similarly for the COCO originals and training annotations.)
docker run -it --rm ctuning/object-detection-armnn-tflite.debian-9 "du -hs /home/dvdt/CK_TOOLS/*"
52M /home/dvdt/CK_TOOLS/dataset-coco-2017-val
13M /home/dvdt/CK_TOOLS/dataset-object-detection-preprocessed-coco.2017-first.50
168M /home/dvdt/CK_TOOLS/lib-armnn-19.05-gcc-6.3.0-rel.19.05-tflite-linux-64
154M /home/dvdt/CK_TOOLS/lib-boost-1.64.0-gcc-6.3.0-for-armnn-static-linux-64
35M /home/dvdt/CK_TOOLS/lib-flatbuffers-master-gcc-6.3.0-linux-64
75M /home/dvdt/CK_TOOLS/lib-protobuf-host-3.0.0-compiler.gcc-6.3.0-linux-64
87M /home/dvdt/CK_TOOLS/lib-protobuf-host-3.5.1-compiler.gcc-6.3.0-linux-64
11M /home/dvdt/CK_TOOLS/lib-python-cython-compiler.python-3.5.3-linux-64
108M /home/dvdt/CK_TOOLS/lib-python-matplotlib-compiler.python-3.5.3-linux-64
70M /home/dvdt/CK_TOOLS/lib-python-numpy-compiler.python-3.5.3-linux-64
7.0M /home/dvdt/CK_TOOLS/lib-python-pillow-compiler.python-3.5.3-linux-64
157M /home/dvdt/CK_TOOLS/lib-python-scipy-1.2.1-compiler.python-3.5.3-linux-64
120K /home/dvdt/CK_TOOLS/lib-rtl-xopenme-0.3-gcc-6.3.0-linux-64
593M /home/dvdt/CK_TOOLS/lib-tflite-src-static-1.13.1-gcc-6.3.0-linux-64
79M /home/dvdt/CK_TOOLS/model-tflite-mlperf-ssd-mobilenet-downloaded
569M /home/dvdt/CK_TOOLS/tensorflow-source-linux-64
1010M /home/dvdt/CK_TOOLS/tensorflowmodel-api-master
18M /home/dvdt/CK_TOOLS/tool-coco-master-gcc-6.3.0-compiler.python-3.5.3-linux-64
As @bellycat77 has suggested a couple of times, we could cut some fat from tensorflowmodel-api-master (it's almost 1 GB!).
I'll also see if we can use the installed TFLite source as the TF source required by ArmNN.
One could try to be cunning and do:
RUN echo "1.13.1" | ck detect soft:lib.tensorflow.source
instead of:
RUN ck install package --tags=tensorflow,source
Unfortunately, a detector bug produces duplicate entries for TF:
---> Running in ca9fc22a855a
Searching for TensorFlow library source (tensorflow/tensorflow.bzl) to automatically register in the CK - it may take some time, please wait ...
* Searching in /usr ...
* Searching in /opt ...
* Searching in /home/dvdt/CK_TOOLS ...
* Searching in /home/dvdt ...
Search completed in 3.9 secs. Found 2 target files (may be pruned) ...
Detecting and sorting versions (ignore some work output) ...
* /home/dvdt/CK_TOOLS/lib-tflite-src-static-1.13.1-gcc-6.3.0-linux-64/src/tensorflow/tensorflow.bzl
WARNING: do not know how to detect version of a given software
* /home/dvdt/CK_TOOLS/lib-tflite-src-static-1.13.1-gcc-6.3.0-linux-64/src/tensorflow/tensorflow.bzl
WARNING: do not know how to detect version of a given software
Registering software installations found on your machine in the CK:
(HINT: enter -1 to force CK package installation)
0) /home/dvdt/CK_TOOLS/lib-tflite-src-static-1.13.1-gcc-6.3.0-linux-64/src/tensorflow/tensorflow.bzl
1) /home/dvdt/CK_TOOLS/lib-tflite-src-static-1.13.1-gcc-6.3.0-linux-64/src/tensorflow/tensorflow.bzl
But this works:
RUN echo "1.13.1" | ck detect soft:lib.tensorflow.source --full_path=/home/dvdt/CK_TOOLS/lib-tflite-src-static-1.13.1-gcc-6.3.0-linux-64/src/tensorflow/tensorflow.bzl
$ docker system df -v | grep object-detection
ctuning/object-detection-armnn-tflite.debian-9 latest 1fb3d18515ec 17 minutes ago 3.454GB 2.962GB 492.5MB 1
ctuning/object-detection-tflite.debian-9 latest c2f19b032e87 About an hour ago 4.378GB 2.962GB 1.416GB 0
I guess armnn-tflite has 0.5 GB of C/C++ dependencies (ArmNN, Boost, FlatBuffers, etc.), while tflite contains the full preprocessed COCO validation dataset.
@gfursin Yes, I guess you can upload them one by one! Ping me when you are done, so I can update the corresponding webpages.
Ok. I downloaded them all (I believe): https://hub.docker.com/u/ctuning !
You mean, uploaded? :)
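For reference, pushing one of them is just the standard Docker CLI (assuming docker login to the ctuning organization has already been done):
docker push ctuning/image-classification-tflite.debian-9:latest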
TF-C++ is huge:
$ docker run -it --rm ctuning/image-classification-tf-cpp.debian-9 "du -hs /home/dvdt/CK_TOOLS/*"
62M /home/dvdt/CK_TOOLS/dataset-imagenet-ilsvrc2012-aux
64M /home/dvdt/CK_TOOLS/dataset-imagenet-ilsvrc2012-val-min
73M /home/dvdt/CK_TOOLS/dataset-imagenet-preprocessed
70M /home/dvdt/CK_TOOLS/lib-python-numpy-compiler.python-3.5.3-linux-64
7.0M /home/dvdt/CK_TOOLS/lib-python-pillow-compiler.python-3.5.3-linux-64
157M /home/dvdt/CK_TOOLS/lib-python-scipy-1.2.1-compiler.python-3.5.3-linux-64
120K /home/dvdt/CK_TOOLS/lib-rtl-xopenme-0.3-gcc-6.3.0-linux-64
2.5G /home/dvdt/CK_TOOLS/lib-tensorflow-src-static-1.13.1-gcc-6.3.0-linux-64
101M /home/dvdt/CK_TOOLS/model-tf-mlperf-mobilenet-downloaded-from-zenodo
43M /home/dvdt/CK_TOOLS/model-tf-mlperf-mobilenet-quantized-downloaded-from-google
98M /home/dvdt/CK_TOOLS/model-tf-mlperf-resnet-downloaded-from-zenodo
Oops, sorry. Yes, I pushed all of these images to Docker Hub under ctuning ...
I've provided descriptions for the following image classification images:
The following object-detection images need to be pushed:
$ docker system df -v | grep object-detection
ctuning/object-detection-armnn-tflite.debian-9 latest 8587785e3298 8 minutes ago 2.331GB 1.839GB 492.6MB 0
ctuning/object-detection-tflite.debian-9 latest bdf725180c00 15 minutes ago 3.255GB 1.839GB 1.416GB 0
under, respectively:
The massive reduction in the shared image size (from 2.962 GB to 1.839 GB) comes from noticing that the dependency on the TensorFlow Object Detection API was not in fact needed for these C++-based variants, and removing it from the SSD-MobileNet model and programs.
I've also created a cool "dashboard" Docker image which benchmarks the host machine during docker build and then displays the results interactively, similar to http://cknowledge.org/dashboard/mlperf.mobilenets.
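A hedged sketch of how one might run it (the port mapping is an assumption based on CK's usual web-server port 3344; the image description has the exact command):
docker run -p 3344:3344 ctuning/image-classification-tflite.ubuntu-18.04.dashboard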
In the screenshot above, you can see four points corresponding to benchmarking four models on a Xeon laptop:
- MobileNet non-quantized
- MobileNet quantized
- ResNet with ArgMax
- ResNet without ArgMax
As expected, quantized MobileNet is slightly less accurate than non-quantized MobileNet, and ResNet with and without ArgMax are nearly identical (within experimental error margins). Unexpectedly, quantized MobileNet is 3x slower than non-quantized MobileNet. (But we know that TFLite is not optimized for x86.) Incidentally, ResNet is also 3x slower than non-quantized MobileNet, while being only 1% more accurate (on the first 500 images of ImageNet 2012).