envd
feat(image): Research nydus/stargz
Description
stargz/nydus can accelerate the image load process on Kubernetes. Let's investigate how to integrate and the benefits to AI/ML use case.
Message from the maintainers:
Love this enhancement proposal? Give it a 👍. We prioritise the proposals with the most 👍.
In production environments, I rarely see images below 20 GB; some users even put their data in the image. I think image acceleration is a highlight and a practical solution to user problems.
Yep, I think so.
```
filename: usr/local/lib/python3.8/dist-packages/wrapt-1.14.1.dist-info/top_level.txt, offset: 1046700032, size: 6
filename: usr/local/lib/python3.8/dist-packages/zipp/, offset: 1046701056, size: 0
filename: usr/local/lib/python3.8/dist-packages/zipp/__init__.py, offset: 1046701568, size: 8659
filename: usr/local/lib/python3.8/dist-packages/zipp/__pycache__/, offset: 1046710784, size: 0
filename: usr/local/lib/python3.8/dist-packages/zipp/__pycache__/__init__.cpython-38.pyc, offset: 1046711296, size: 10762
filename: usr/local/lib/python3.8/dist-packages/zipp/__pycache__/py310compat.cpython-38.pyc, offset: 1046723072, size: 406
filename: usr/local/lib/python3.8/dist-packages/zipp/py310compat.py, offset: 1046724096, size: 309
filename: usr/local/lib/python3.8/dist-packages/zipp-3.10.0.dist-info/, offset: 1046725120, size: 0
filename: usr/local/lib/python3.8/dist-packages/zipp-3.10.0.dist-info/INSTALLER, offset: 1046725632, size: 4
filename: usr/local/lib/python3.8/dist-packages/zipp-3.10.0.dist-info/LICENSE, offset: 1046726656, size: 1050
filename: usr/local/lib/python3.8/dist-packages/zipp-3.10.0.dist-info/METADATA, offset: 1046728704, size: 3672
filename: usr/local/lib/python3.8/dist-packages/zipp-3.10.0.dist-info/RECORD, offset: 1046733312, size: 707
filename: usr/local/lib/python3.8/dist-packages/zipp-3.10.0.dist-info/WHEEL, offset: 1046734848, size: 92
filename: usr/local/lib/python3.8/dist-packages/zipp-3.10.0.dist-info/top_level.txt, offset: 1046735872, size: 5
```
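The listing above is essentially a lazy-pull table of contents: each entry maps a file path to a byte offset and size inside the image blob. A minimal sketch of how such an index lets a client locate one file without downloading the whole layer (the `parse_toc`/`byte_range` helpers are made up for illustration, not part of any snapshotter):

```python
# Sketch: parse "filename: F, offset: O, size: S" lines into a lookup table,
# then resolve a single file to the byte range a lazy puller would fetch.

def parse_toc(lines):
    """Parse TOC lines of the form 'filename: F, offset: O, size: S'."""
    toc = {}
    for line in lines:
        fields = dict(part.split(": ") for part in line.split(", "))
        toc[fields["filename"]] = (int(fields["offset"]), int(fields["size"]))
    return toc

def byte_range(toc, path):
    """Return the (start, end) byte range to request for one file."""
    offset, size = toc[path]
    return offset, offset + size

listing = [
    "filename: usr/local/lib/python3.8/dist-packages/zipp/__init__.py, offset: 1046701568, size: 8659",
    "filename: usr/local/lib/python3.8/dist-packages/zipp/py310compat.py, offset: 1046724096, size: 309",
]
toc = parse_toc(listing)
print(byte_range(toc, "usr/local/lib/python3.8/dist-packages/zipp/__init__.py"))
# (1046701568, 1046710227)
```

With such a table, opening `zipp/__init__.py` costs one ~8.6 kB range request instead of pulling the multi-gigabyte layer.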
The index above was generated by https://github.com/awslabs/soci-snapshotter.
SOCI addresses these issues by loading from the original, unmodified OCI image. Instead of converting the image, it builds a separate index artifact (the "SOCI index"), which lives in the remote registry, right next to the image itself. At container launch time, SOCI Snapshotter queries the registry for the presence of the SOCI index using the mechanism developed by the OCI Reference Types working group.
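The launch-time decision described above can be sketched as a lookup with a fallback (the registry structure and names here are invented for illustration; real clients discover the index via the OCI referrers mechanism):

```python
# Sketch of the SOCI launch-time decision: if the registry holds a SOCI index
# next to the image digest, lazy-load; otherwise fall back to a full pull.

def resolve_start_strategy(referrers, image_digest):
    """referrers: map of image digest -> list of associated artifact types."""
    artifacts = referrers.get(image_digest, [])
    if "soci-index" in artifacts:
        return "lazy-load"   # mount now, fetch file ranges on demand
    return "full-pull"       # no index next to the image: pull everything

registry = {"sha256:abc": ["soci-index", "signature"], "sha256:def": []}
print(resolve_start_strategy(registry, "sha256:abc"))  # lazy-load
print(resolve_start_strategy(registry, "sha256:def"))  # full-pull
```

The key point is that the original image is never modified: the index is a sibling artifact, so images without one keep working exactly as before.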
About what the snapshot is:
- https://github.com/containerd/containerd/blob/main/docs/content-flow.md

buildkit can build images in the nydus/estargz formats:
- https://github.com/moby/buildkit/blob/master/docs/stargz-estargz.md
- https://github.com/moby/buildkit/blob/master/docs/nydus.md
Difference of stargz/nydus:
- https://github.com/dragonflyoss/image-service/issues/50
Design report:
- estargz: https://github.com/containerd/stargz-snapshotter/blob/main/docs/estargz.md
- nydus: https://github.com/dragonflyoss/image-service/blob/master/docs/nydus-design.md
Pros & Cons:
From my perspective, nydus might be faster, with lower CPU load, but it requires introducing a standalone executable, while estargz is more compatible with buildkit.
There also seem to be some differences in their image formats; more research is needed.
@cutecutecat Are you interested in this? You can pick it up. And I'd appreciate it.
@gaocegege Yes, I would like to pick it up. Is there anything else that needs to be investigated?
Since we know buildkit can build both formats, I could build a large image with buildctl
and compare the time cost and image size?
- #1086
- #51
Restriction
Nydus
Nydus conflicts with --export-cache and --import-cache. Is this acceptable in envd? @gaocegege
I think it might not be.
Since exported Nydus image will always have one more metadata layer than images in other compression types, Nydus image cannot be exported/imported as cache.
ref: https://github.com/moby/buildkit/blob/master/docs/nydus.md and https://github.com/moby/buildkit/pull/2581
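The restriction can be illustrated with a toy cache-key check (purely illustrative, not buildkit's actual cache logic): the extra Nydus metadata layer changes the layer chain, so a key derived from the Nydus image can never match the key of the same build in another compression type.

```python
import hashlib

def chain_key(layer_digests):
    """Toy cache key: hash over the ordered layer digest chain."""
    return hashlib.sha256("|".join(layer_digests).encode()).hexdigest()

oci_layers = ["sha256:l1", "sha256:l2"]
nydus_layers = oci_layers + ["sha256:nydus-metadata"]  # one extra layer

# Same build content, different chain -> cache exported from the Nydus
# image never matches the key computed for the non-Nydus build.
print(chain_key(oci_layers) == chain_key(nydus_layers))  # False
```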
Estargz
Rootless execution is currently unsupported.
This seems more acceptable. The python, R, and julia caches don't work with rootless now, but we should be careful if we ever support rootless caches in the future.
If we pick Estargz, we should keep the image format as a configurable item instead of substituting the original image format.
Prefetch
Estargz and Nydus support prefetch. This can be used to mitigate runtime performance drawbacks caused by the on-demand fetching of each file.
Maybe we could use https://github.com/docker-slim/docker-slim to scan some typical ML training cases in order to find which files are hotspots and need to be prefetched.
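As a sketch of that idea (the trace format and `rank_hotspots` helper are made up for illustration): given a file-access trace recorded from a typical training run, rank files by access count and take the top entries as the prefetch list.

```python
from collections import Counter

def rank_hotspots(access_trace, top_n):
    """Pick the most frequently accessed files from a runtime trace."""
    counts = Counter(access_trace)
    return [path for path, _ in counts.most_common(top_n)]

# Hypothetical access trace from one training run
trace = [
    "lib/libtorch.so", "lib/libtorch.so", "lib/libcudart.so",
    "bin/python3", "lib/libtorch.so", "bin/python3",
]
print(rank_hotspots(trace, 2))  # ['lib/libtorch.so', 'bin/python3']
```

The resulting list could then feed whatever prefetch mechanism the chosen format provides (e.g. estargz records prioritized files at conversion time).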
I think we can verify the benefits of these tools. For example, we can run a shell in tensorflow and see the startup time.
golang:1.18-alpine is used to build and run a simple hello.go to test the Go build cost with stargz. mskwyditd/pytorch-cuda-python3.10 is used to run a simple train.py to test the Python build cost with stargz. As docker.io is too slow for a 16G image, I deployed a localhost registry:
```shell
# limit registry pull speed to 200mbps
sudo docker run -d \
  --name docker-tc \
  --network host \
  --cap-add NET_ADMIN \
  --restart always \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /var/docker-tc:/var/docker-tc \
  lukaszlach/docker-tc

sudo docker network create test-net
sudo docker run --net test-net --label "com.docker-tc.limit=200mbps" -d -p 5000:5000 --restart=always --name registry registry:2
```
The hello.go is simple:

```go
package main

import "fmt"

func main() {
	fmt.Println("Hello, world!")
}
```
The train.py uses a CNN to classify MNIST, sourced from file.
nerdctl is used to pull, convert, and build the image:

```shell
sudo nerdctl image pull mskwyditd/pytorch-cuda-python3.10:latest
sudo nerdctl image convert --estargz --oci starkind/stargz-examples:pycache starkind/stargz-examples:pycache-stgz
sudo time -o first.txt buildctl build --frontend dockerfile.v0 \
  --no-cache \
  --local context=. \
  --local dockerfile=. \
```
| Image | Size | Source | stargz | File | Pull / s | Run first time / s |
|---|---|---|---|---|---|---|
| golang:1.18-alpine | 113.35 M | docker.io | | hello.go | 96.4 | 1.37 |
| golang:1.18-alpine | 117.65 M | docker.io | ✅ | hello.go | / | 37.8 |
| mskwyditd/pytorch-cuda-python3.10 | 16.2 G | localhost | | train.py | 91.0 | 406.9 |
| mskwyditd/pytorch-cuda-python3.10 | 16.3 G | localhost | ✅ | train.py | / | 420.1 |
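One way to read the table: with lazy pulling the pull cost folds into the first run, so the fair comparison is pull + first run for the plain image against first run alone for the stargz image. A quick check of the arithmetic using the numbers above:

```python
def time_to_first_run(pull_s, run_s):
    """Total wall time until the first run completes."""
    return pull_s + run_s

# Plain image: pull, then run. Stargz image: pull is on-demand, folded into run.
golang_plain = time_to_first_run(96.4, 1.37)    # 97.77 s
golang_stargz = time_to_first_run(0.0, 37.8)    # 37.8 s
pytorch_plain = time_to_first_run(91.0, 406.9)  # 497.9 s
pytorch_stargz = time_to_first_run(0.0, 420.1)  # 420.1 s

print(f"golang speedup: {golang_plain / golang_stargz:.2f}x")
print(f"pytorch speedup: {pytorch_plain / pytorch_stargz:.2f}x")
```

So stargz wins on end-to-end time in both cases, but the margin shrinks for the long-running training job, since the on-demand fetching overhead shows up inside the run time.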
traditional-pytorch-example

```
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 376B 0.0s
=> [internal] load metadata for localhost:5000/pycache:latest 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 5.68kB 0.0s
=> [1/3] FROM localhost:5000/pycache:latest@sha256:b5d0f6ea5ace68790c08cf17201eaa5998ecf53087b9ac57b392841 91.0s
=> => resolve localhost:5000/pycache:latest@sha256:b5d0f6ea5ace68790c08cf17201eaa5998ecf53087b9ac57b3928415 0.0s
=> => sha256:d031a9181ade169343b9a94cbc6cd4e6647e98f64134d33e94ee8f8f7c85ed5c 86.90kB / 86.90kB 0.0s
=> => sha256:29f6e52f2e6080c637928592798904ecedb31e4079c07748edb7376ebbd2e398 63.10kB / 63.10kB 0.0s
=> => sha256:64129b569154cf5afcea88d65c1657a84d9961b7aaf086bd2fe2f2e3ed2fcad8 6.43kB / 6.43kB 0.0s
=> => sha256:1362a29ff46515e1f117f2bebd093ce13af97bb1a6f27171abc4990dbee4a435 186B / 186B 0.0s
=> => sha256:82bb026e1cd969dcc9dface186bc188a104a5f5a03c6cad8ff422f6f3aa98995 7.26GB / 7.26GB 41.1s
=> => sha256:813ff0237f8341ab86af37666aa400c9640cb266317881233c7112927b791f8c 1.60GB / 1.60GB 10.3s
=> => sha256:19e4169ce7d724dbcc1a6f5bf9e5dc21a05a6983173f3522c106bfb4994d07a5 1.18GB / 1.18GB 6.8s
=> => sha256:ccd8058ddd7517692e482566c35645f0bfdd75354260d9ea207de5c699564bee 56.23MB / 56.23MB 0.3s
=> => sha256:58710bbb48677cfcf4bed3cdd3cbb56f040f85e1b4fc8df8a2715d7760b45c67 4.60MB / 4.60MB 0.0s
=> => sha256:cf92e523b49ea3d1fae59f5f082437a5f96c244fda6697995920142ff31d59cf 30.43MB / 30.43MB 0.2s
=> => extracting sha256:cf92e523b49ea3d1fae59f5f082437a5f96c244fda6697995920142ff31d59cf 0.6s
=> => extracting sha256:58710bbb48677cfcf4bed3cdd3cbb56f040f85e1b4fc8df8a2715d7760b45c67 0.1s
=> => extracting sha256:ccd8058ddd7517692e482566c35645f0bfdd75354260d9ea207de5c699564bee 0.8s
=> => extracting sha256:1362a29ff46515e1f117f2bebd093ce13af97bb1a6f27171abc4990dbee4a435 0.0s
=> => extracting sha256:64129b569154cf5afcea88d65c1657a84d9961b7aaf086bd2fe2f2e3ed2fcad8 0.0s
=> => extracting sha256:19e4169ce7d724dbcc1a6f5bf9e5dc21a05a6983173f3522c106bfb4994d07a5 11.0s
=> => extracting sha256:29f6e52f2e6080c637928592798904ecedb31e4079c07748edb7376ebbd2e398 0.0s
=> => extracting sha256:813ff0237f8341ab86af37666aa400c9640cb266317881233c7112927b791f8c 20.0s
=> => extracting sha256:d031a9181ade169343b9a94cbc6cd4e6647e98f64134d33e94ee8f8f7c85ed5c 0.3s
=> => extracting sha256:82bb026e1cd969dcc9dface186bc188a104a5f5a03c6cad8ff422f6f3aa98995 49.4s
=> [2/3] COPY ./train.py /train.py 15.9s
=> [3/3] RUN python3 train.py 406.9s
```
stargz-pytorch-example

```
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 293B 0.0s
=> [internal] load metadata for localhost:5000/pycache-stgz:latest 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 5.68kB 0.0s
=> [1/3] FROM localhost:5000/pycache-stgz:latest@sha256:878d36dadf5fe645453793433006827170334aa454470e2efa 0.0s
=> => resolve localhost:5000/pycache-stgz:latest@sha256:878d36dadf5fe645453793433006827170334aa454470e2efa 0.0s
=> [2/3] COPY ./train.py /train.py 0.1s
=> [3/3] RUN python3 train.py 420.1s
```
It's awesome!