Role aware container names

Open tbuehlmann opened this issue 1 year ago • 1 comment

Container names don't include their role (web, job, …), so there are naming collisions when the same service tries to run with different roles on the same host. See https://github.com/mrsked/mrsk/issues/44.

This doesn't work right now:

# config/deploy.yml
service: app

servers:
  web:
    - 1.1.1.1
  job:
    hosts:
      - 1.1.1.1
    cmd: bin/jobs

… as the container name would be app-<version> for both web and job.

This PR solves this by adding the role to container names. The containers for the above config would be named app-web-<version> and app-job-<version>.
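The naming scheme can be sketched as follows. This is a minimal illustration, not the code from this PR: `container_name` is a hypothetical helper and `abc123` is a made-up version string.

```ruby
# Hypothetical helper illustrating role-aware names; not MRSK's actual implementation.
def container_name(service:, role:, version:)
  # Joining service, role and version keeps names unique per role on one host.
  [service, role, version].join("-")
end

puts container_name(service: "app", role: "web", version: "abc123") # app-web-abc123
puts container_name(service: "app", role: "job", version: "abc123") # app-job-abc123
```

With the role in the name, both containers from the config above can coexist on 1.1.1.1.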

It also changes the behaviour of --hosts and --roles to match how I understand they should work:

  • --hosts filters the hosts for the given command and sets MRSK's @specific_hosts
  • --roles filters the roles for the given command and sets MRSK's @specific_roles
  • MRSK.hosts takes the specific hosts (or all hosts) and filters by the specific roles
  • MRSK.roles takes the specific roles (or all roles) and filters by the specific hosts
  • Overriding hosts to values other than those mentioned in the config doesn't work anymore. I'm not sure this is correct, but the previous behaviour didn't make sense to me.
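The filtering rules listed above can be sketched roughly like this. `Role` and `Commander` are hypothetical stand-ins written for illustration; the real MRSK classes and method signatures may differ.

```ruby
# Illustrative sketch only: Role and Commander are hypothetical stand-ins,
# not MRSK's actual classes.
Role = Struct.new(:name, :hosts)

class Commander
  def initialize(roles)
    @all_roles = roles
  end

  # --hosts: hosts that are not in the config are dropped.
  def specific_hosts=(hosts)
    @specific_hosts = hosts & all_hosts
  end

  # --roles: select the configured roles with the given names.
  def specific_roles=(names)
    @specific_roles = @all_roles.select { |role| names.include?(role.name) }
  end

  # Specific hosts (or all hosts), filtered by the specific roles.
  def hosts
    candidates = @specific_hosts || all_hosts
    return candidates unless @specific_roles
    candidates & @specific_roles.flat_map(&:hosts)
  end

  # Specific roles (or all roles), filtered by the specific hosts.
  def roles
    candidates = @specific_roles || @all_roles
    return candidates unless @specific_hosts
    candidates.select { |role| (role.hosts & @specific_hosts).any? }
  end

  private

  def all_hosts
    @all_roles.flat_map(&:hosts).uniq
  end
end

commander = Commander.new([
  Role.new("web", ["1.1.1.1", "1.1.1.2"]),
  Role.new("job", ["1.1.1.1"])
])
commander.specific_roles = ["job"]
p commander.hosts # ["1.1.1.1"]
```

Intersecting the requested hosts with the configured ones is what makes overriding hosts to unknown values a no-op in this sketch.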

Note: There's no backwards compatibility. It's probably best to stop and remove all running app containers, then update MRSK, then redeploy.

TODOs:

  • [x] Rolify Mrsk::Commands::Accessory (nothing to do)
  • [x] Rolify Mrsk::Commands::App
  • [x] Rolify Mrsk::Commands::Auditor
  • [x] Rolify Mrsk::Commands::Healthcheck (nothing to do)
  • [x] Rolify Mrsk::Commands::Prune (nothing to do)
  • [x] Rolify Mrsk::Commands::Registry (nothing to do)
  • [x] Rolify Mrsk::Commands::Traefik (nothing to do)
  • [x] Non-web containers shouldn't expose 3000 so traefik doesn't get confused

Non-web containers shouldn't expose 3000 so traefik doesn't get confused

When testing on a remote server, both the web (puma) and job (sidekiq) containers were picked up by traefik, although only the web container responds to HTTP requests. As a result, no requests went through. I initially thought exposing port 3000 was the issue, but the actual cause is the traefik labels. Applying them only to the web role resolves the issue.
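A rough sketch of that fix, assuming labels are built per role: `labels_for` is a hypothetical helper, and the label keys and values below are only examples of Traefik's Docker label format, not necessarily the ones MRSK sets.

```ruby
# Hypothetical helper: only the web role receives Traefik routing labels,
# so Traefik never tries to route HTTP traffic to job containers.
def labels_for(service:, role:)
  labels = { "service" => service, "role" => role }

  if role == "web"
    # Example router rule; the real label set is an assumption here.
    labels["traefik.http.routers.#{service}.rule"] = "PathPrefix(`/`)"
  end

  labels
end

p labels_for(service: "app", role: "web").keys
p labels_for(service: "app", role: "job").keys
```

The job container still gets its identifying labels but carries nothing that Traefik would treat as a routing target.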

tbuehlmann avatar Mar 10 '23 07:03 tbuehlmann

Excellent work on this! Big change, so might take a couple of days before I'm able to fully review.

dhh avatar Mar 13 '23 18:03 dhh