Proposal: Image Acceleration(Apparate)
This proposal sounds conceptually very similar to https://github.com/containerd/stargz-snapshotter :thinking:
And even more similar to the nydus image acceleration service: https://github.com/dragonflyoss/image-service
We've been discussing with the Harbor team how to create a pluggable image conversion mechanism that works for different image formats (currently nydus and estargz are included). Maybe Apparate can join forces as well ;)
/cc @ktock
what's the difference between https://github.com/dragonflyoss/image-service and this one?
We've been discussing with the Harbor team how to create a pluggable image conversion mechanism that works for different image formats (currently nydus and estargz are included). Maybe Apparate can join forces as well ;)
:+1:
Recently a variety of image formats have been discussed in the community (e.g. nydus, estargz, zstd:chunked...), not only Apparate, so it would be great to have a generic (and pluggable) conversion mechanism that works for all of them.
A pluggable image conversion mechanism has also been proposed here: https://github.com/goharbor/community/pull/167 We can participate in the discussion together. :)
It seems like another image-service (https://github.com/dragonflyoss/image-service), and another stargz (https://github.com/containerd/stargz-snapshotter). I think Apparate, image-service, and some other new image formats are based on, or are extensions of, stargz, since they look quite similar. It would be better to make stargz a standard and have other implementations stay compatible with stargz while developing their own features.
@lovecontainers Standardization of lazy pulling in the current version of the OCI Image Spec (v1) is discussed in https://github.com/opencontainers/image-spec/issues/815. nydus has been proposed for the next version of the OCI Image Spec (a.k.a. OCIv2), cf. https://www.cncf.io/blog/2020/10/20/introducing-nydus-dragonfly-container-image-service/
@ktock Yeah, I hope for the next OCI spec. But nydus looks quite similar to stargz; its doc even describes nydus as an improvement on stargz. In fact, almost all newer remote image formats look the same. So I think it may be better to bring up a stargz v2 rather than so many stargz-like formats. At this moment, wide discussion is necessary, but repeated ones are meaningless.
@lovecontainers Yes, repeated ones are meaningless. And there's a novel solution open-sourced recently: https://github.com/alibaba/overlaybd https://github.com/alibaba/accelerated-container-image https://www.usenix.org/conference/atc20/presentation/li-huiba https://www.usenix.org/conference/atc21/presentation/wang-ao
@lihuiba Thank you, this is the first time I have heard of overlaybd, as I am a beginner with containers. It looks like a traditional VM image and is natively friendly to remote access. The most interesting point for me is that your implementation does not depend on FUSE.
There's a fundamental difference between stargz and nydus :) Nydus can be thought of as a file system over object storage with a split fs metadata/data design, so different images can share data blob objects.
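To make the split metadata/data idea concrete, here is a minimal sketch, not nydus's actual format. It assumes a content-addressed store keyed by SHA-256 and per-image metadata that only maps paths to digests; `BlobStore`, `build_image`, and `read_file` are hypothetical names chosen for illustration.

```python
import hashlib


class BlobStore:
    """Hypothetical shared data store: one entry per unique chunk of content."""

    def __init__(self):
        self.chunks = {}

    def put(self, data: bytes) -> str:
        # Identical content hashes to the same digest, so it is stored once.
        digest = hashlib.sha256(data).hexdigest()
        self.chunks.setdefault(digest, data)
        return digest

    def get(self, digest: str) -> bytes:
        return self.chunks[digest]


def build_image(store: BlobStore, files: dict) -> dict:
    """Return image metadata only: path -> digest of the file's data."""
    return {path: store.put(data) for path, data in files.items()}


def read_file(store: BlobStore, metadata: dict, path: str) -> bytes:
    """Resolve a path through the metadata, then fetch data from the store."""
    return store.get(metadata[path])
```

Two images built against the same store that both contain an identical `/bin/sh` end up referencing a single stored chunk, which is the sharing effect described above.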
@malc0lm Pls subscribe and discuss here, we need to answer questions from the community.
That is really a great improvement, but it is hard to call it a fundamental difference, and the same goes for Apparate. These similar proposals may compete commercially, since they come from different companies, but that competition does not help the community reach agreement on the next OCI spec.
@lovecontainers @tianon Overlaybd is a combination of a container image and a VM image. It is a layered image in the form of a block device. It doesn't depend on FUSE / virtio-fs. I believe this design gathers the best of both worlds (container and VM), and it is applicable to both worlds.
interesting,good for you, you are so funny
Obviously a filesystem has a higher abstraction level than a block device, which means more business value can be added on top of it, and I don’t understand what makes you think overlaybd is the best of both worlds just because it does not depend on FUSE and virtio-fs. You depend on TCM, which is another ko (kernel module), so what is the advantage? You are welcome to identify the pros and cons of the different approaches; instead you keep saying you are the best and others are meaningless, which makes me feel disgusted.
@xujihui1985 Hi, jihui. "It doesn't depend on FUSE / virtio-fs" is just a statement of fact, and a confirmation to lovecontainers. The reasons why I believe overlaybd is the best are complicated, and I suggest you read the papers mentioned above. There are paragraphs discussing this topic. Thanks!
@xujihui1985 A higher abstraction level doesn't necessarily mean a better solution. For example, Python is a higher-level language than Java or C/C++, but Python is not necessarily better in every aspect. The best(-fit) abstractions vary across different scenarios. The block device abstraction doesn't preclude a file system abstraction on top of it. Actually, we have built an internal solution that includes an enhanced file system, called rofs, atop overlaybd. This solution allows everything one could imagine doing with a file system abstraction, while retaining the advantages of a block device, e.g. simplicity and efficiency.
@lihuiba I don't get this metaphor, what's the matter with Python? 😂 And I'm pleased to know you are working on a filesystem solution. You are welcome to join forces. :)
@xujihui1985 I did some research based on stargz and really felt the bottleneck of FUSE, in both performance and stability. Does FUSE have any alternatives? Or does nydus have some improvements on that (I found no related statement in the nydus docs)?
@lovecontainers My team is also trying to improve FUSE's performance, and we have an upcoming paper on this topic: https://www.usenix.org/conference/atc21/presentation/hsu
But there's one more thing to solve: failure recovery. If the FUSE server process crashes, or gets killed, the file system instance may not recover.
These problems (performance, fault tolerance, etc.) do not exist in overlaybd.
At the early stage of developing fs-based image acceleration technologies, FUSE is a good choice. When the technology becomes mature, an in-kernel read-only fs may be a better solution. And nydus aims to become an in-kernel fs :)
@lovecontainers FUSE itself is not the bottleneck; the problem is how FUSE is used. The advantage of stargz is its compatibility with tar.gz, which is really good. The problems, IMO, are:
- each layer of a stargz image is mounted as a separate FUSE mountpoint, and these layers are then combined with overlayfs.
- the TOC must be fully loaded into RSS to index inodes, even inodes that may never be read, which causes a high memory footprint.
What nydus does to improve on this is to do the "overlay" at build stage and build the final view of the root fs in metadata, so that there is one FUSE mountpoint per image and the underlying blob files are shared. Instead of loading the entire TOC index into RSS memory, nydus builds an inode table in the header of the metadata, so only a small portion of memory is needed during startup. You can refer to the detailed design doc here: https://github.com/dragonflyoss/image-service/blob/master/docs/nydus-design.md
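As a rough illustration of doing the "overlay" at build stage, the sketch below flattens an ordered list of layers into one final path view. It is not the nydus builder: the layer representation (a dict from path to a blob reference) and the name `merge_layers` are assumptions made for this example. The only real convention it relies on is the OCI `.wh.` whiteout prefix for deleted entries; opaque whiteouts are not handled.

```python
WHITEOUT_PREFIX = ".wh."  # OCI image layers mark deleted entries with this prefix


def merge_layers(layers):
    """Flatten layers (lowest first) into a single path -> blob-ref view.

    Each layer is a dict mapping an absolute path to a blob reference,
    here a hypothetical (blob_id, offset, size) tuple.
    """
    view = {}
    for layer in layers:
        # Pass 1: whiteouts in this layer hide entries from lower layers.
        for path in layer:
            parent, sep, name = path.rpartition("/")
            if name.startswith(WHITEOUT_PREFIX):
                target = parent + sep + name[len(WHITEOUT_PREFIX):]
                hidden = [p for p in view
                          if p == target or p.startswith(target + "/")]
                for p in hidden:
                    del view[p]
        # Pass 2: regular entries in this layer add to or override the view.
        for path, ref in layer.items():
            name = path.rpartition("/")[2]
            if not name.startswith(WHITEOUT_PREFIX):
                view[path] = ref
    return view
```

With the merged view computed once at build time, a runtime only needs to serve a single tree per image, and each entry still points into whichever layer blob holds its data.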
Yeah, I have already tried something similar to your solutions. Thank you.
@kofj Where is the git repository of Apparate? I am curious about Apparate's solution for recovering the FUSE process :)
An important goal of this proposal is to create a vendor-neutral sub-project in the goharbor community.
@lovecontainers Sorry, there is currently no Apparate repository on GitHub. Recovering the FUSE process is a core ability of Apparate. First, the userspace FUSE daemon and the kernel FUSE module communicate through the /dev/fuse fd, so the process that holds the fd must be separate from the FUSE process. We also need FUSE request tracing in case I/O hangs during recovery. Finally, for a read/write FUSE filesystem, we also need to record the opened fds.
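One common way to keep the fd holder separate from the serving process is to pass the descriptor over a Unix domain socket with `SCM_RIGHTS`. The sketch below shows only that mechanism and is not Apparate's implementation: a pipe stands in for the /dev/fuse fd (opening and mounting the real device needs privileges), both "processes" run in one process for brevity, and `send_fd`, `recv_fd`, and `demo` are hypothetical names.

```python
import array
import os
import socket


def send_fd(sock, fd):
    """Holder side: pass a duplicate of fd to the peer via SCM_RIGHTS."""
    ancillary = [(socket.SOL_SOCKET, socket.SCM_RIGHTS, array.array("i", [fd]))]
    sock.sendmsg([b"F"], ancillary)


def recv_fd(sock):
    """Server side: receive one file descriptor from the holder."""
    fds = array.array("i")
    _msg, ancdata, _flags, _addr = sock.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            fds.frombytes(data[:len(data) - (len(data) % fds.itemsize)])
            return fds[0]
    raise RuntimeError("no file descriptor received")


def demo():
    holder_sock, server_sock = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_end, write_end = os.pipe()  # stand-in for the /dev/fuse descriptor
    try:
        send_fd(holder_sock, write_end)
        served_fd = recv_fd(server_sock)
        os.write(served_fd, b"request")
        os.close(served_fd)              # simulate the serving process going away
        os.write(write_end, b"-again")   # the holder's descriptor is still valid
        return os.read(read_end, 64)
    finally:
        for fd in (read_end, write_end):
            os.close(fd)
        holder_sock.close()
        server_sock.close()
```

Because the holder never closes its own copy, a restarted server could ask for the descriptor again instead of remounting; request tracing and open-fd bookkeeping, as mentioned above, would still be needed on top of this.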
Looking forward to seeing your implementation on GitHub.
repo is here: https://github.com/goharbor/acceleration-service
slack channel https://cloud-native.slack.com/archives/C01U31AK2LX