envd
feat(image): Research nydus/stargz
Description
stargz/nydus can accelerate the image load process on Kubernetes. Let's investigate how to integrate and the benefits to AI/ML use case.
Message from the maintainers:
Love this enhancement proposal? Give it a 👍. We prioritise the proposals with the most 👍.
In production environments, I rarely see images below 20 GB; some users even put their data in the image. I think image acceleration is a highlight and a practical solution to user problems.
Yep, I think so.
```
filename: usr/local/lib/python3.8/dist-packages/wrapt-1.14.1.dist-info/top_level.txt, offset: 1046700032, size: 6
filename: usr/local/lib/python3.8/dist-packages/zipp/, offset: 1046701056, size: 0
filename: usr/local/lib/python3.8/dist-packages/zipp/__init__.py, offset: 1046701568, size: 8659
filename: usr/local/lib/python3.8/dist-packages/zipp/__pycache__/, offset: 1046710784, size: 0
filename: usr/local/lib/python3.8/dist-packages/zipp/__pycache__/__init__.cpython-38.pyc, offset: 1046711296, size: 10762
filename: usr/local/lib/python3.8/dist-packages/zipp/__pycache__/py310compat.cpython-38.pyc, offset: 1046723072, size: 406
filename: usr/local/lib/python3.8/dist-packages/zipp/py310compat.py, offset: 1046724096, size: 309
filename: usr/local/lib/python3.8/dist-packages/zipp-3.10.0.dist-info/, offset: 1046725120, size: 0
filename: usr/local/lib/python3.8/dist-packages/zipp-3.10.0.dist-info/INSTALLER, offset: 1046725632, size: 4
filename: usr/local/lib/python3.8/dist-packages/zipp-3.10.0.dist-info/LICENSE, offset: 1046726656, size: 1050
filename: usr/local/lib/python3.8/dist-packages/zipp-3.10.0.dist-info/METADATA, offset: 1046728704, size: 3672
filename: usr/local/lib/python3.8/dist-packages/zipp-3.10.0.dist-info/RECORD, offset: 1046733312, size: 707
filename: usr/local/lib/python3.8/dist-packages/zipp-3.10.0.dist-info/WHEEL, offset: 1046734848, size: 92
filename: usr/local/lib/python3.8/dist-packages/zipp-3.10.0.dist-info/top_level.txt, offset: 1046735872, size: 5
```
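The listing above is essentially a lazy-pull table of contents: each entry maps a file path to a byte offset and size inside the image blob. A minimal sketch of how such an index lets a client locate one file without downloading the whole layer (the `parse_toc`/`byte_range` helpers are made up for illustration, not part of any snapshotter):

```python
# Sketch: parse "filename: F, offset: O, size: S" lines into a lookup table,
# then resolve a single file to the byte range a lazy puller would fetch.

def parse_toc(lines):
    """Parse TOC lines of the form 'filename: F, offset: O, size: S'."""
    toc = {}
    for line in lines:
        fields = dict(part.split(": ") for part in line.split(", "))
        toc[fields["filename"]] = (int(fields["offset"]), int(fields["size"]))
    return toc

def byte_range(toc, path):
    """Return the (start, end) byte range to request for one file."""
    offset, size = toc[path]
    return offset, offset + size

listing = [
    "filename: usr/local/lib/python3.8/dist-packages/zipp/__init__.py, offset: 1046701568, size: 8659",
    "filename: usr/local/lib/python3.8/dist-packages/zipp/py310compat.py, offset: 1046724096, size: 309",
]
toc = parse_toc(listing)
print(byte_range(toc, "usr/local/lib/python3.8/dist-packages/zipp/__init__.py"))
# (1046701568, 1046710227)
```

With such a table, opening `zipp/__init__.py` costs one ~8.6 kB range request instead of pulling the multi-gigabyte layer.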
The index above was generated by https://github.com/awslabs/soci-snapshotter.
SOCI addresses these issues by loading from the original, unmodified OCI image. Instead of converting the image, it builds a separate index artifact (the "SOCI index"), which lives in the remote registry, right next to the image itself. At container launch time, SOCI Snapshotter queries the registry for the presence of the SOCI index using the mechanism developed by the OCI Reference Types working group.
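The launch-time decision described above can be sketched as a lookup with a fallback (the registry structure and names here are invented for illustration; real clients discover the index via the OCI referrers mechanism):

```python
# Sketch of the SOCI launch-time decision: if the registry holds a SOCI index
# next to the image digest, lazy-load; otherwise fall back to a full pull.

def resolve_start_strategy(referrers, image_digest):
    """referrers: map of image digest -> list of associated artifact types."""
    artifacts = referrers.get(image_digest, [])
    if "soci-index" in artifacts:
        return "lazy-load"   # mount now, fetch file ranges on demand
    return "full-pull"       # no index next to the image: pull everything

registry = {"sha256:abc": ["soci-index", "signature"], "sha256:def": []}
print(resolve_start_strategy(registry, "sha256:abc"))  # lazy-load
print(resolve_start_strategy(registry, "sha256:def"))  # full-pull
```

The key point is that the original image is never modified: the index is a sibling artifact, so images without one keep working exactly as before.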
About what the snapshot is:
- https://github.com/containerd/containerd/blob/main/docs/content-flow.md

buildkit can build images in the nydus/estargz formats:
- https://github.com/moby/buildkit/blob/master/docs/stargz-estargz.md
- https://github.com/moby/buildkit/blob/master/docs/nydus.md
Difference of stargz/nydus:
- https://github.com/dragonflyoss/image-service/issues/50
Design report:
- estargz: https://github.com/containerd/stargz-snapshotter/blob/main/docs/estargz.md
- nydus: https://github.com/dragonflyoss/image-service/blob/master/docs/nydus-design.md
Pros & Cons:
From my perspective, nydus might be faster, with lower CPU load, but it requires introducing a standalone executable, while estargz is more compatible with buildkit.
There also seem to be some differences in their image formats; more research is needed.
@cutecutecat Are you interested in this? You can pick it up. And I'd appreciate it.
@gaocegege Yes, I would like to pick it up. Is there anything else that needs to be investigated?
Since we know buildkit can build both formats, I could build a large image with buildctl
and compare the time cost and image size?
- #1086
- #51
Restriction
Nydus
Nydus conflicts with --export-cache and --import-cache. Is this acceptable in envd? @gaocegege
I think it might not be.
Since exported Nydus image will always have one more metadata layer than images in other compression types, Nydus image cannot be exported/imported as cache.
ref: https://github.com/moby/buildkit/blob/master/docs/nydus.md and https://github.com/moby/buildkit/pull/2581
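The restriction can be illustrated with a toy cache-key check (purely illustrative, not buildkit's actual cache logic): the extra Nydus metadata layer changes the layer chain, so a key derived from the Nydus image can never match the key of the same build in another compression type.

```python
import hashlib

def chain_key(layer_digests):
    """Toy cache key: hash over the ordered layer digest chain."""
    return hashlib.sha256("|".join(layer_digests).encode()).hexdigest()

oci_layers = ["sha256:l1", "sha256:l2"]
nydus_layers = oci_layers + ["sha256:nydus-metadata"]  # one extra layer

# Same build content, different chain -> cache exported from the Nydus
# image never matches the key computed for the non-Nydus build.
print(chain_key(oci_layers) == chain_key(nydus_layers))  # False
```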
Estargz
Rootless execution is currently unsupported.
This seems more acceptable. The python, R, and julia caches don't work with rootless now, but we should be careful if we ever support rootless caches in the future.
If we pick Estargz, we should keep the image format as a configurable item instead of substituting the original image format.
Prefetch
Estargz and Nydus support prefetch. This can be used to mitigate runtime performance drawbacks caused by the on-demand fetching of each file.
Maybe we could use https://github.com/docker-slim/docker-slim to scan some typical ML training cases in order to find which files are hotspots and need to be prefetched.
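As a sketch of that idea (the trace format and `rank_hotspots` helper are made up for illustration): given a file-access trace recorded from a typical training run, rank files by access count and take the top entries as the prefetch list.

```python
from collections import Counter

def rank_hotspots(access_trace, top_n):
    """Pick the most frequently accessed files from a runtime trace."""
    counts = Counter(access_trace)
    return [path for path, _ in counts.most_common(top_n)]

# Hypothetical access trace from one training run
trace = [
    "lib/libtorch.so", "lib/libtorch.so", "lib/libcudart.so",
    "bin/python3", "lib/libtorch.so", "bin/python3",
]
print(rank_hotspots(trace, 2))  # ['lib/libtorch.so', 'bin/python3']
```

The resulting list could then feed whatever prefetch mechanism the chosen format provides (e.g. estargz records prioritized files at conversion time).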
I think we can verify the benefits of these tools. For example, we can run a shell in tensorflow and see the startup time.
golang:1.18-alpine is used to build and run a simple hello.go to test the Go build cost with stargz. mskwyditd/pytorch-cuda-python3.10 is used to run a simple train.py to test the Python build cost with stargz. As docker.io is too slow for a 16G image, I deployed a localhost registry:
```shell
# limit registry pull speed to 200mbps
sudo docker run -d \
  --name docker-tc \
  --network host \
  --cap-add NET_ADMIN \
  --restart always \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /var/docker-tc:/var/docker-tc \
  lukaszlach/docker-tc

sudo docker network create test-net
sudo docker run --net test-net --label "com.docker-tc.limit=200mbps" -d -p 5000:5000 --restart=always --name registry registry:2
```
The hello.go is simple:

```go
package main

import "fmt"

func main() {
	fmt.Println("Hello, world!")
}
```
The train.py uses a CNN to classify MNIST, sourced from file.
nerdctl is used to pull, convert, and build the image:

```shell
sudo nerdctl image pull mskwyditd/pytorch-cuda-python3.10:latest
sudo nerdctl image convert --estargz --oci starkind/stargz-examples:pycache starkind/stargz-examples:pycache-stgz
sudo time -o first.txt buildctl build --frontend dockerfile.v0 \
  --no-cache \
  --local context=. \
  --local dockerfile=. \
```
| Image | Size | Source | stargz | File | Pull / s | Run first time / s |
|---|---|---|---|---|---|---|
| golang:1.18-alpine | 113.35 M | docker.io | | hello.go | 96.4 | 1.37 |
| golang:1.18-alpine | 117.65 M | docker.io | ✅ | hello.go | / | 37.8 |
| mskwyditd/pytorch-cuda-python3.10 | 16.2 G | localhost | | train.py | 91.0 | 406.9 |
| mskwyditd/pytorch-cuda-python3.10 | 16.3 G | localhost | ✅ | train.py | / | 420.1 |
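One way to read the table: with lazy pulling the pull cost folds into the first run, so the fair comparison is pull + first run for the plain image against first run alone for the stargz image. A quick check of the arithmetic using the numbers above:

```python
def time_to_first_run(pull_s, run_s):
    """Total wall time until the first run completes."""
    return pull_s + run_s

# Plain image: pull, then run. Stargz image: pull is on-demand, folded into run.
golang_plain = time_to_first_run(96.4, 1.37)    # 97.77 s
golang_stargz = time_to_first_run(0.0, 37.8)    # 37.8 s
pytorch_plain = time_to_first_run(91.0, 406.9)  # 497.9 s
pytorch_stargz = time_to_first_run(0.0, 420.1)  # 420.1 s

print(f"golang speedup: {golang_plain / golang_stargz:.2f}x")
print(f"pytorch speedup: {pytorch_plain / pytorch_stargz:.2f}x")
```

So stargz wins on end-to-end time in both cases, but the margin shrinks for the long-running training job, since the on-demand fetching overhead shows up inside the run time.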
traditional-pytorch-example

```
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 376B 0.0s
=> [internal] load metadata for localhost:5000/pycache:latest 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 5.68kB 0.0s
=> [1/3] FROM localhost:5000/pycache:latest@sha256:b5d0f6ea5ace68790c08cf17201eaa5998ecf53087b9ac57b392841 91.0s
=> => resolve localhost:5000/pycache:latest@sha256:b5d0f6ea5ace68790c08cf17201eaa5998ecf53087b9ac57b3928415 0.0s
=> => sha256:d031a9181ade169343b9a94cbc6cd4e6647e98f64134d33e94ee8f8f7c85ed5c 86.90kB / 86.90kB 0.0s
=> => sha256:29f6e52f2e6080c637928592798904ecedb31e4079c07748edb7376ebbd2e398 63.10kB / 63.10kB 0.0s
=> => sha256:64129b569154cf5afcea88d65c1657a84d9961b7aaf086bd2fe2f2e3ed2fcad8 6.43kB / 6.43kB 0.0s
=> => sha256:1362a29ff46515e1f117f2bebd093ce13af97bb1a6f27171abc4990dbee4a435 186B / 186B 0.0s
=> => sha256:82bb026e1cd969dcc9dface186bc188a104a5f5a03c6cad8ff422f6f3aa98995 7.26GB / 7.26GB 41.1s
=> => sha256:813ff0237f8341ab86af37666aa400c9640cb266317881233c7112927b791f8c 1.60GB / 1.60GB 10.3s
=> => sha256:19e4169ce7d724dbcc1a6f5bf9e5dc21a05a6983173f3522c106bfb4994d07a5 1.18GB / 1.18GB 6.8s
=> => sha256:ccd8058ddd7517692e482566c35645f0bfdd75354260d9ea207de5c699564bee 56.23MB / 56.23MB 0.3s
=> => sha256:58710bbb48677cfcf4bed3cdd3cbb56f040f85e1b4fc8df8a2715d7760b45c67 4.60MB / 4.60MB 0.0s
=> => sha256:cf92e523b49ea3d1fae59f5f082437a5f96c244fda6697995920142ff31d59cf 30.43MB / 30.43MB 0.2s
=> => extracting sha256:cf92e523b49ea3d1fae59f5f082437a5f96c244fda6697995920142ff31d59cf 0.6s
=> => extracting sha256:58710bbb48677cfcf4bed3cdd3cbb56f040f85e1b4fc8df8a2715d7760b45c67 0.1s
=> => extracting sha256:ccd8058ddd7517692e482566c35645f0bfdd75354260d9ea207de5c699564bee 0.8s
=> => extracting sha256:1362a29ff46515e1f117f2bebd093ce13af97bb1a6f27171abc4990dbee4a435 0.0s
=> => extracting sha256:64129b569154cf5afcea88d65c1657a84d9961b7aaf086bd2fe2f2e3ed2fcad8 0.0s
=> => extracting sha256:19e4169ce7d724dbcc1a6f5bf9e5dc21a05a6983173f3522c106bfb4994d07a5 11.0s
=> => extracting sha256:29f6e52f2e6080c637928592798904ecedb31e4079c07748edb7376ebbd2e398 0.0s
=> => extracting sha256:813ff0237f8341ab86af37666aa400c9640cb266317881233c7112927b791f8c 20.0s
=> => extracting sha256:d031a9181ade169343b9a94cbc6cd4e6647e98f64134d33e94ee8f8f7c85ed5c 0.3s
=> => extracting sha256:82bb026e1cd969dcc9dface186bc188a104a5f5a03c6cad8ff422f6f3aa98995 49.4s
=> [2/3] COPY ./train.py /train.py 15.9s
=> [3/3] RUN python3 train.py 406.9s
```
stargz-pytorch-example

```
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 293B 0.0s
=> [internal] load metadata for localhost:5000/pycache-stgz:latest 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 5.68kB 0.0s
=> [1/3] FROM localhost:5000/pycache-stgz:latest@sha256:878d36dadf5fe645453793433006827170334aa454470e2efa 0.0s
=> => resolve localhost:5000/pycache-stgz:latest@sha256:878d36dadf5fe645453793433006827170334aa454470e2efa 0.0s
=> [2/3] COPY ./train.py /train.py 0.1s
=> [3/3] RUN python3 train.py 420.1s
```
It's awesome!