overlaybd icon indicating copy to clipboard operation
overlaybd copied to clipboard

Side loading compressed layer files into the cacheDir

Open simha-db opened this issue 1 year ago • 13 comments

What happened in your environment?

is side loading compressed layer files into the cacheDir supported?

I am looking into fetching the layer files out of band and saving them in the cacheDir as sha256:<layer Digest>. Does overlaybd pickup the new files added externally to the directory after it has started? Or does it hydrate once and maintains the inventory internally until a restart?

What did you expect to happen?

I still see a few downloads happening - the logging wasn't verbose and didn't easily tell me which digest it was referring to - i will turn on debug logging and try again - but wanted to check if the above scenario is supposed to work.

How can we reproduce it?

copy the compressed layer files to the cacheDir after overlaybd has started - then start the container - expectation is overlaybd doesn't issue any blob download requests.

What is the version of your Overlaybd?

NA

What is your OS environment?

Ubuntu 18

Are you willing to submit PRs to fix it?

  • [ ] Yes, I am willing to fix it.

simha-db avatar Jun 08 '23 03:06 simha-db

Sorry i mistagged the request. It is more of a question.

simha-db avatar Jun 08 '23 04:06 simha-db

@simha-db thanks for asking!

  1. Is side loading compressed layer files into the cacheDir supported? Yes, but it has limitations. First, you should make sure the loaded files will not override existing files. Those files may have been opened by running containers. Overriding won't let overlaybd reopen these files. (downloading from source will still happen for this layer) Second, cacheDir size cacheConfig.cacheSizeGB should large enough to avoid loaded files triggering cache gc, which is unexpected behavior. So, it is highly recommanded that you stop overlaybd first, then load files to cache, and restart overlaybd.
  2. Why downloads happening ? I think it's because background download is enable in overlaybd config (download.enable). Caching and background download are seprate operations, and background download files are not stored in cache dir. If you are trying to load files into cache dir, background download can be disabled.

yuchen0cc avatar Jun 09 '23 09:06 yuchen0cc

Ah ok. When is background download triggered?If there is no prefetch involved - do any of the ondemand loads go through background download?

Some what off topic: If i use the ctr rpull --download-blobs - does that go through the cache?

simha-db avatar Jun 09 '23 13:06 simha-db

@simha-db background download starts 600s after container started by default, which can be set by download.delay. Background download is performed in a seprate thread, and on-demand load won't go through it, neither will prefetch.

ctr rpull --download-blobs download blobs to snapshotter dir, will not go through cache.

yuchen0cc avatar Jun 09 '23 14:06 yuchen0cc

Thanks - And the background downloader fetches the full image to the snapshot directory?

simha-db avatar Jun 09 '23 14:06 simha-db

@simha-db Yes, exactly. But when blobs have been downloaded by ctr rpull --download-blobs, container will start from local files, and background download will not be triggered.

yuchen0cc avatar Jun 09 '23 14:06 yuchen0cc

Do all registry requests get logged in overlaybd-audit.log?

One thing i notice is - if i prepull the entire image into registry_cache - pull image request to image service - which should be doing only rpull (without the download) takes much longer. When an image file is opened - if the layer files are in the registry_cache does the snapshotter do additional work to pull them to the snapshot directory or something of that sort?

simha-db avatar Jun 13 '23 15:06 simha-db

@simha-db all overlaybd registry requests (after rpull) will be logged in overlaybd-audit.log (background download audit is added in v0.6.9 #222). If layers have been in registry_cache, container open will not request data from registry. Snapshotter won't pull layers to snapshott directory after container started, but background download will. Please make sure background download feature is disabled before your test. Prepull should not affect time of rpull (without download), could you please show me some data ?

yuchen0cc avatar Jun 15 '23 01:06 yuchen0cc

Turns out the time is the same - i incorrectly attributed it to an increase.

simha-db avatar Jul 13 '23 16:07 simha-db

I think i have some data on the slowness - maybe it is exposed when we upgraded our overlaybd - but here's what i observe.

  1. My side loading the layer files finishes in 4 seconds.
  2. Then we call CRI pull image and i see in the containerd logs the following
Jul 25 07:01:25 ip-10-1-166-126 containerd[936]: time="2023-07-25T07:01:25.841633618Z" level=info msg="PullImage \”<removed>@sha256:a65d68704b22c9752c023cec0a2ef0868d31623574dabaf9949fcf525645b32e\""
Jul 25 07:01:25 ip-10-1-166-126 containerd[936]: time="2023-07-25T07:01:25.841757128Z" level=info msg="experimental: PullImage \”<removed>@sha256:a65d68704b22c9752c023cec0a2ef0868d31623574dabaf9949fcf525645b32e\" for runtime runc-overlaybd, using snapshotter overlaybd"
**Jul 25 07:02:10** ip-10-1-166-126 containerd[936]: time="2023-07-25T07:02:10.703589621Z" level=info msg="ImageCreate event &ImageCreate{Name:<removed>@sha256:a65d68704b22c9752c023cec0a2ef0868d31623574dabaf9949fcf525645b32e,Labels:map[string]string{io.cri-containerd.image: managed,},XXX_unrecognized:[],}"

the CRI pull image took close to 45 seconds.

  1. I look in overlaybd-auditlog - indeed it is empty - no pulls were made from network
2023/07/25 07:01:47|AUDIT|th=00007F08D4F5F300|file:read[pathname=][offset=0][size=512][latency=34292]
2023/07/25 07:02:26|AUDIT|th=00007F0743D35E80|file:read[pathname=][offset=108004587][size=2922][latency=29157]
2023/07/25 07:02:26|AUDIT|th=00007F07D35F7E80|file:read[pathname=][offset=1412525637][size=30698][latency=43467]
2023/07/25 07:02:26|AUDIT|th=00007F07D0DD7F40|file:read[pathname=][offset=1412556335][size=63151][latency=43467]
2023/07/25 07:02:27|AUDIT|th=00007F07D0DD7F40|file:read[pathname=][offset=1396719871][size=859][latency=12106]
2023/07/25 07:02:27|AUDIT|th=00007F0743D35E80|file:read[pathname=][offset=107999319][size=2456][latency=20176]
2023/07/25 07:02:27|AUDIT|th=00007F06E6BC7300|file:read[pathname=][offset=512][size=512][latency=19326]
2023/07/25 07:02:36|AUDIT|th=00007F06E4BAF2C0|file:read[pathname=][offset=213191565][size=45214][latency=12454]

  1. When i look in the overlaybd-snapshotter service logs i see the following - the last entry in the snapshotter - roughly matches the time when the image reference was returned to containerd. Seems like the snapshotter is doing something synchronously if it finds the cache files. When we don't side load - we don't see this delay. What is the snapshotter doing here? is it validating the files synchronously?
Jul 25 07:01:33 ip-10-1-166-126 overlaybd-snapshotter[1543]: time="2023-07-25T07:01:33Z" level=info msg="{\n  \"repoBlobUrl\": \”<removed>/blobs\",\n  \"lowers\": [\n    {\n      \"digest\": \"sha256:0d5554fe6fff2aa24f4065881f061f4c1b03749341e119a93174cc0858f0ed57\",\n      \"size\": 47891968,\n      \"dir\": \"/var/lib/overlaybd/snapshots/29/fs\"\n    }\n  ],\n  \"upper\": {},\n  \"resultFile\": \"/var/lib/overlaybd/snapshots/29/block/init-debug.log\"\n}"


**Jul 25 07:02:10** ip-10-1-166-126 overlaybd-snapshotter[1543]: time="2023-07-25T07:02:10Z" level=info msg="{\n  \"repoBlobUrl\": \”<removed>/blobs\",\n  \"lowers\": [\n    {\n      \"digest\": \"sha256:0d5554fe6fff2aa24f4065881f061f4c1b03749341e119a93174cc0858f0ed57\",\n      \"size\": 47891968,\n      \"dir\": \"/var/lib/overlaybd/snapshots/29/fs\"\n    },\n    {\n      \"digest\": \"sha256:82d331f6b157c3ee944d1064c55fa41f20024c34b2d7138598cec6942bc6bead\",\n      \"size\": 6144,\n      \"dir\": \"/var/lib/overlaybd/snapshots/32/fs\"\n    },\n    {\n      \"digest\": \"sha256:187a69a0ef15042fa944bc05217cf116ffc1de923bf1bbaa21241bf471324a55\",\n      \"size\": 7680,\n      \"dir\": \"/var/lib/overlaybd/snapshots/35/fs\"\n    },\n    {\n      \"digest\": \"sha256:2d0b9a2eff4cb17c414ca3cee0eb48d9e000ffd9661994997b08baa3a837d742\",\n      \"size\": 1020676608,\n      \"dir\":

simha-db avatar Jul 25 '23 07:07 simha-db

@simha-db overlaybd-snapshotter will not try to find and validate files. There is a global lock in overlaybd-snapshotter, which makes prepare/mount layer running in serial. Usually prepare is quite fast for each layer, does it happens when there are many rpull requests and prepares are queueing, or disk io load is rather high ?

yuchen0cc avatar Jul 25 '23 09:07 yuchen0cc

Something doesn't add up - when all this activity is happening in overlaybd snapshotter - like this - containerd is still waiting for the image reference - there is no lazy load happening - what is all this activity in snapshotter? It seems to touch all the layer files one by one -does it setup the snapshot directory for all the registry cache layer files before the image reference is created?

Jul 25 07:01:33 ip-10-1-166-126 overlaybd-snapshotter[1543]: time="2023-07-25T07:01:33Z" level=info msg="{\n \"repoBlobUrl\": \”<removed>/blobs\",\n \"lowers\": [\n {\n \"digest\": \"sha256:0d5554fe6fff2aa24f4065881f061f4c1b03749341e119a93174cc0858f0ed57\",\n \"size\": 47891968,\n \"dir\": \"/var/lib/overlaybd/snapshots/29/fs\"\n }\n ],\n \"upper\": {},\n \"resultFile\": \"/var/lib/overlaybd/snapshots/29/block/init-debug.log\"\n}"

simha-db avatar Jul 25 '23 14:07 simha-db

@simha-db overlaybd-snapshotter indeed prepare snapshot directory for each image layer. snapshot directory contains some meta info for overlaybd-tcmu. when rpull, containerd calls overlaybd-snapshotter to prepare snapshot for each image layer (ro layers). when run, containerd calls overlaybd-snapshotter to prepare rootfs mountpoint (rw layer), which will call overlaybd-tcmu to setup block device based on those layers.

yuchen0cc avatar Jul 26 '23 07:07 yuchen0cc