
AGENT_IMAGE_GC_AGE not enforced.

Open hrfuller opened this issue 2 years ago • 8 comments

Based on some conversations on Discord, I have set up the following environment variables on the synpse-agent service on a host.

Environment=AGENT_IMAGE_GC_AGE="48h"
Environment=AGENT_IMAGE_GC_FORCE="true"

But I see images that are much older than 48 hours on the host. This is a pain point because, as we deploy new images, we have to manually prune the Docker system on our hosts. Is there something obvious I'm missing about how to set up the image garbage collection?

The agent version on the host is 0.21.18. The Docker version info is:

Client:
 Version:           20.10.12
 API version:       1.41
 Go version:        go1.16.2
 Git commit:        20.10.12-0ubuntu2~20.04.1
 Built:             Wed Apr  6 02:16:12 2022
 OS/Arch:           linux/arm64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.12
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.2
  Git commit:       20.10.12-0ubuntu2~20.04.1
  Built:            Thu Feb 10 15:03:35 2022
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.5.9-0ubuntu1~20.04.4
  GitCommit:
 nvidia:
  Version:          1.1.0-0ubuntu1~20.04.1
  GitCommit:        629a689
 docker-init:
  Version:          0.19.0
  GitCommit:

cc @oezdemir @emersonknapp

hrfuller avatar Oct 13 '23 20:10 hrfuller

Hey @hrfuller. I was under the impression we fixed this. Can you please run the commands below on the affected node and share the output:

uname -a
uptime
docker inspect <image_no_purged>

Are your devices being restarted in that 48h window, or are they powered on all the time?

mjudeikis avatar Oct 14 '23 07:10 mjudeikis

I was under the impression we fixed this.

The fix does seem to work on test hosts that remain on during the 48h window. But most of our hosts are edge devices that are powered on and off frequently. It seems like that would explain the lack of enforcement.

Are your devices being restarted in that 48h window?

Yes, they are. Is there any way you can use the image age information from Docker to do the GC?

hrfuller avatar Oct 17 '23 17:10 hrfuller

So this is a bit of a different use case. I have an idea for how we can try to mitigate this. For now, Docker does not record a timestamp for when an image is created on the host. The only metadata is the "image inception date", which is not the time the image was created on the host.

We could inject metadata into labels and try pruning based on that.

Bear with me for a few days until I can try this and ship something for testing.

mjudeikis avatar Oct 17 '23 17:10 mjudeikis
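The idea of tracking when an image arrived on the host, independent of device restarts, can be sketched as follows. This is a minimal Python illustration and not the synpse agent's code: instead of image labels it keeps a small state file of first-seen times on disk, and the function names, the state-file layout, and the thresholds are all hypothetical. It also assumes the host clock is reasonably correct after boot, which may not hold on edge devices without a real-time clock.

```python
import json
import time
from pathlib import Path


def load_first_seen(state_file: Path) -> dict:
    """Read the persisted {image_id: first_seen_epoch_seconds} map, if any."""
    if state_file.exists():
        return json.loads(state_file.read_text())
    return {}


def select_expired(present_ids, state_file: Path, max_age_s: float, now=None):
    """Return image IDs first seen on this host more than max_age_s seconds ago.

    Side effect: rewrites the state file so that newly observed images are
    stamped with the current time and images no longer present are forgotten.
    """
    now = time.time() if now is None else now
    seen = load_first_seen(state_file)
    # Keep known timestamps; stamp newly observed images with the current time.
    updated = {img: seen.get(img, now) for img in present_ids}
    state_file.write_text(json.dumps(updated))
    return sorted(img for img, ts in updated.items() if now - ts > max_age_s)
```

Because the first-seen times live on disk and are compared against wall-clock time, a power cycle between two GC passes does not reset an image's age in this sketch. Deciding whether an expired image is still in use by a container is left out.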

Thanks! It seems like the Docker daemon knows how old the images are, based on something, when you run docker images, but I suspect that is the inception date you're talking about. Any solution would be very welcome.

hrfuller avatar Oct 17 '23 17:10 hrfuller
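For context, the age shown by docker images is derived from the image's Created timestamp, which docker inspect also reports. The Python sketch below shows how that field could be turned into an age in hours. The helper names and the sample timestamps are illustrative only, and the parser assumes a UTC timestamp ending in Z. Since Created reflects when the image was built rather than when it was pulled, this measures build age, not time spent on the host.

```python
import re
from datetime import datetime, timezone


def parse_created(created: str) -> datetime:
    """Parse an RFC 3339 UTC timestamp such as '2023-10-13T20:10:00.123456789Z'."""
    m = re.fullmatch(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z", created)
    if m is None:
        raise ValueError(f"unsupported timestamp: {created!r}")
    base = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S")
    # Python datetimes hold microseconds, so drop any digits past the sixth.
    micros = int((m.group(2) or "0")[:6].ljust(6, "0"))
    return base.replace(microsecond=micros, tzinfo=timezone.utc)


def image_age_hours(created: str, now: datetime) -> float:
    """Hours elapsed between the image's Created timestamp and 'now'."""
    return (now - parse_created(created)).total_seconds() / 3600.0
```

An image built months ago but pulled five minutes ago would already exceed a 48h threshold under this measure, which is why build age alone is a poor fit for the GC discussed here.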

Yes, you should see dates that are not realistic. Just a question: does setting something like

Environment=AGENT_IMAGE_GC_AGE="2h"
Environment=AGENT_IMAGE_GC_FORCE="true"

Where it would purge unused images every 2 hours, does that not work?

mjudeikis avatar Oct 18 '23 14:10 mjudeikis

Where it would purge unused images every 2 hours, does that not work?

I believe it does work but I will try it out.

hrfuller avatar Oct 18 '23 22:10 hrfuller

Let me know. I suspect the code change might be very tricky, so solving it with something like this would be easier.

mjudeikis avatar Oct 21 '23 13:10 mjudeikis

Following up, @mjudeikis. I tried using this:

ExecStart=/usr/local/bin/synpse-agent run
Environment=AGENT_IMAGE_GC_AGE="30m"
Environment=AGENT_IMAGE_GC_FORCE="true"

It doesn't seem to work. The machine definitely stays on longer than 30 minutes at a time. Any ideas what to try next?

hrfuller avatar Dec 11 '23 21:12 hrfuller