nomad Get rid of docker pause containers with a custom runtime. Closes #15086

Feb 20 '24 20:02 apollo13

Hi @tgross, I see you added a needs-rebase label. I'd happily rebase and work with you on that if there is any consensus on how to move this forward. Personally I think it would be a massive stability win for nomad.

Jul 18 '24 20:07 apollo13

> Hi @tgross, I see you added a needs-rebase label. I'd happily rebase and work with you on that if there is any consensus on how to move this forward. Personally I think it would be a massive stability win for nomad.

Ah yeah... I marked this (and all other open PRs at the time) as needs-rebase just because of a CI change around backports and our new LTS workflow that wouldn't work on any open PR that wasn't rebased on main after those changes landed. Sorry, I should have posted a note too. :grinning:

As far as this proposal goes, I really like the idea of dropping the pause containers but I'm still fuzzy on whether this particular implementation is viable (especially with having to deal with the group-level networks). There's not really enough for me to go on here and unfortunately I haven't had time to dig in further to make sure we understand all the implications of the design.

Jul 19 '24 13:07 tgross

@tgross Any chance that the team looks at this and we end up with some sort of plan? Even if this is not supported for nvidia runtimes, it would be a massive win for everyone else.

Oct 29 '24 09:10 apollo13

As I mentioned above, I'm still a little fuzzy as to the viability of this plan. The PR doesn't have a working implementation in place, so it's hard to reason about without going thru it from scratch. Unfortunately we haven't had time to do so.

Oct 29 '24 12:10 tgross

Mkay, combined with https://github.com/hashicorp/nomad/issues/15086#issuecomment-1954863893 it is a working implementation (or at least it was the last time I checked). I'd rather not put work into it if there is no chance of getting this in, so I would also like to keep the Python script for now and not rewrite it in Go :)

Oct 29 '24 12:10 apollo13

@tgross this PR now contains a fully working implementation. Start nomad with this config:

plugin "docker" {
  config {
    new_networking = true
  }
}

and configure docker like this (/etc/docker/daemon.json):

{
	"runtimes": {"nomad": {"path": "/home/florian/sources/nomad/bin/nomad", "runtimeArgs": ["runc"]}}
}

Adjust the path to the runtime executable as needed. Let me know what you think!

Nov 04 '24 22:11 apollo13

Thanks @apollo13! I'll take a detailed look thru in the next couple days.

Nov 05 '24 19:11 tgross

@tgross Thanks, no rush though. I probably cannot finish it anyways this year. That said I think it would be really valuable to have and maybe it can act as a starting point. Don't hesitate to ask if anything is unclear though!

Nov 05 '24 19:11 apollo13

> If you can't get back to it because of outside forces (totally understandable), just let us know and we'll look into carrying the PR forward (with credit to you, of course!).

Will see what I can do, but it will most likely have to be in my free time, since at work we are probably moving to k8s after the recent license changes and the general move to hide more and more behind the enterprise license (and other stuff, like good CNI plugins with network policy basically not existing for nomad). Would love you folks to talk me out of it though :þ

What timeframe are you thinking about for getting that in? If you want it this year, it might be easier if you continue it. If you are not in a rush, then I might be able to get something done, no promises.

Nov 08 '24 17:11 apollo13

> What timeframe are you thinking about for getting that in? If you want it this year, it might be easier if you continue it. If you are not in a rush, then I might be able to get something done, no promises.

No real rush. As you might imagine things slow down a bit going into the end of the year anyways. Our next major release 1.10 LTS isn't until the spring, but this seems like the sort of thing we could land in a minor release no problem.

Nov 08 '24 20:11 tgross

Found some time today to clean up the PR, made the cmd hidden and renamed it to runcshim. Now you also need to pass the "next runc" binary via runtimeArgs in the docker daemon.json:

{
	"runtimes": {"nomad": {"path": "/home/florian/sources/nomad/bin/nomad", "runtimeArgs": ["runcshim", "runc"]}}
}
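
For readers unfamiliar with how Docker hands `runtimeArgs` to a custom runtime, here is a minimal Go sketch of the argument handling such a shim would need. It is not the code from this PR: the function name `splitShimArgs` is hypothetical, and it assumes Docker places the configured `runtimeArgs` ahead of the usual OCI runtime arguments, so the binary sees something like `runcshim runc create --bundle ...`. Whatever the PR does to the container before delegating is deliberately left out.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
)

// splitShimArgs separates the shim's own arguments from the ones meant for
// the next runtime. args is expected to be os.Args[1:], for example
// ["runcshim", "runc", "create", "--bundle", "/b", "id"].
// Hypothetical helper, not taken from the PR.
func splitShimArgs(args []string) (next string, forwarded []string, err error) {
	if len(args) < 2 || args[0] != "runcshim" {
		return "", nil, errors.New("usage: runcshim <next-runtime> [runtime args...]")
	}
	return args[1], args[2:], nil
}

func main() {
	next, forwarded, err := splitShimArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}

	// Assumption: any shim-specific work would happen here, before
	// control is handed to the next runtime. This sketch only forwards.
	cmd := exec.Command(next, forwarded...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		// Propagate the next runtime's exit code when it ran but failed.
		var exitErr *exec.ExitError
		if errors.As(err, &exitErr) {
			os.Exit(exitErr.ExitCode())
		}
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Passing the next runtime explicitly, as in the `daemon.json` above, keeps the shim independent of where `runc` (or another OCI runtime) is installed.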

Nov 09 '24 14:11 apollo13

So I have a deal for you @tgross. Would you mind taking this over and writing the tests? I'll adjust everything else you want (or you can do it on your own if you are faster), but getting this tested will take me a fair amount of time, even just to get a basic test setup running (I'd happily improve the tests though -- assuming time permits -- if you can show me an example of how to write them -- i.e. how the docker daemon.json should get configured etc).

Nov 09 '24 14:11 apollo13

> So I have a deal for you @tgross. Would you mind taking this over and writing the tests?

Sure thing, no problem. I'll look to getting it landed sometime in this major cycle.

Nov 11 '24 15:11 tgross

I haven't forgotten this and the issue is still mocking my attempts at inbox zero. Obviously we missed 1.10.0 for this with all the recent excitement in our org. But I'm going to chat with @arodd sometime next week about seeing if we can find a place for this in the roadmap.

Apr 10 '25 21:04 tgross

> I haven't forgotten this and the issue is still mocking my attempts at inbox zero. Obviously we missed 1.10.0 for this with all the recent excitement in our org. But I'm going to chat with @arodd sometime next week about seeing if we can find a place for this in the roadmap.

Did any clarity ensue on this since?

May 28 '25 01:05 3nprob