kamal deploy solid_queue without needing to deploy a web app beside it (for kamal-proxy healthcheck reasons)

Open 34code opened this issue 1 year ago • 1 comments

I would love to dockerize solid_queue and deploy it on my server.

I already have a custom dockerfile (Dockerfile.worker) with the following:

# syntax = docker/dockerfile:1

# This Dockerfile is designed for production, not development. Use with Kamal or build'n'run by hand:
# docker build -t my-app .
# docker run -d -p 80:80 -p 443:443 --name my-app -e RAILS_MASTER_KEY=<value from config/master.key> my-app

# Make sure RUBY_VERSION matches the Ruby version in .ruby-version
ARG RUBY_VERSION=3.3.4
FROM docker.io/library/ruby:$RUBY_VERSION-slim AS base

# Rails app lives here
WORKDIR /rails

# Install base packages
RUN apt-get update -qq && \
    apt-get install --no-install-recommends -y curl libjemalloc2 libvips postgresql-client && \
    rm -rf /var/lib/apt/lists /var/cache/apt/archives

# Set production environment
ENV RAILS_ENV="production" \
    BUNDLE_DEPLOYMENT="1" \
    BUNDLE_PATH="/usr/local/bundle" \
    BUNDLE_WITHOUT="development"

# Throw-away build stage to reduce size of final image
FROM base AS build

# Install packages needed to build gems
RUN apt-get update -qq && \
    apt-get install --no-install-recommends -y build-essential git libpq-dev pkg-config && \
    rm -rf /var/lib/apt/lists /var/cache/apt/archives

# Install application gems
COPY Gemfile Gemfile.lock ./
RUN bundle install && \
    rm -rf ~/.bundle/ "${BUNDLE_PATH}"/ruby/*/cache "${BUNDLE_PATH}"/ruby/*/bundler/gems/*/.git && \
    bundle exec bootsnap precompile --gemfile

# Copy application code
COPY . .

# Precompile bootsnap code for faster boot times
RUN bundle exec bootsnap precompile app/ lib/

# Precompiling assets for production without requiring secret RAILS_MASTER_KEY
RUN SECRET_KEY_BASE_DUMMY=1 ./bin/rails assets:precompile

# Final stage for app image
FROM base

# Copy built artifacts: gems, application
COPY --from=build "${BUNDLE_PATH}" "${BUNDLE_PATH}"
COPY --from=build /rails /rails

# Run and own only the runtime files as a non-root user for security
RUN groupadd --system --gid 1000 rails && \
    useradd rails --uid 1000 --gid 1000 --create-home --shell /bin/bash && \
    chown -R rails:rails db log storage tmp
USER 1000:1000

# Entrypoint prepares the database.
ENTRYPOINT ["/rails/bin/docker-entrypoint"]

# Start the processes listed in Procfile.jobs by default, this can be overridden at runtime
EXPOSE 3000
CMD bundle exec foreman start -f Procfile.jobs

but in my Procfile.jobs I need to do the following

jobs: bin/jobs
web_healthcheck: env RUBY_DEBUG_OPEN=true bundle exec thrust bin/rails server -p 3000

Any way to remove the web_healthcheck part in my Procfile (consumes resources)?

Wondering what the best way to deploy only "bin/jobs" in my Procfile without needing a rails app beside it for healthcheck reasons? I'm wondering if I can do without the "proxy" part in kamal entirely..

34code avatar Oct 16 '24 02:10 34code

Hi @34code! Do you mean that you want to avoid running kamal-proxy? If so, https://github.com/basecamp/kamal/issues/1083#issuecomment-2429023635 should be what you need.

djmb avatar Oct 22 '24 11:10 djmb

perfection!

https://tenor.com/view/perfection-michael-fassbender-steve-jobs-movie-gif-16929303

34code avatar Oct 23 '24 06:10 34code

somehow setting "proxy: false" still triggers healthcheck...

34code avatar Nov 01 '24 17:11 34code

nvm.. I think i need to provide a reasonable "readiness_delay: N" and not have the rails app run inside the Procfile.. will test and close this issue asap.

34code avatar Nov 01 '24 17:11 34code

Setting a long "readiness_delay" also doesn't fix the problem.. The only working solution is to have the rails app side by side in the Procfile for now...

34code avatar Nov 01 '24 17:11 34code

@34code - readiness_delay is how long we wait after the container reaches the running state. It's a safety check to guard against containers that quickly exit after they are started.

But if your container is taking longer than 30 seconds to reach the running state then the readiness_delay won't kick in.

Instead could you try setting deploy_timeout to a higher value? That will increase how long we wait for it to boot.
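
A rough sketch of where these two settings might sit in a Kamal deploy file, going by the description above. The values are illustrative assumptions, not recommendations; the thread itself only shows the timeout falling back to 30s when deploy_timeout is unset.

```yaml
# config/deploy.yml (fragment; values are illustrative assumptions)

# How long to wait after the container reaches the running state,
# to guard against containers that exit shortly after they start.
readiness_delay: 10

# How long to wait for the container to boot before the deploy fails.
deploy_timeout: 120
```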

djmb avatar Nov 04 '24 11:11 djmb

thanks! will try and report back!

34code avatar Nov 06 '24 08:11 34code

doesn't seem to be healthy.. i.e. somehow healthcheck is still running? even with proxy: false set and a long deploy_timeout: 120 set..

Here are the last few lines of the "target failed to become healthy" logs in case it helps:

ERROR Failed to boot web on 192.168.5.38
  INFO First web container is unhealthy on 192.168.5.38, not booting any other roles
  INFO [395476f8] Running docker container ls --all --filter name=^botflip-worker-web-f5a6fa4c715c9beeccfd33e65b9fdc4b6bf51706$ --quiet | xargs docker logs --timestamps 2>&1 on 192.168.5.38
  INFO [395476f8] Finished in 0.097 seconds with exit status 0 (successful).
 ERROR 2024-11-06T23:05:58.835865744Z 23:05:58 jobs.1 | started with pid 10
2024-11-06T23:06:00.244645822Z 23:06:00 jobs.1 | /usr/local/bundle/ruby/3.3.0/gems/gtin-0.1.2/lib/gtin.rb:8: warning: already initialized constant GTIN::VERSION
2024-11-06T23:06:00.244689505Z 23:06:00 jobs.1 | /usr/local/bundle/ruby/3.3.0/gems/gtin-0.1.2/lib/gtin/version.rb:2: warning: previous definition of VERSION was here
2024-11-06T23:06:10.087653996Z 23:06:10 jobs.1 | SolidQueue-1.0.0 Fail claimed jobs (765.3ms)  job_ids: [], process_ids: []
2024-11-06T23:06:10.088657366Z 23:06:10 jobs.1 | SolidQueue-1.0.0 Started Supervisor (6188.6ms)  pid: 10, hostname: "192.168.5.38-b3a74b7b4fd6", process_id: 91, name: "supervisor-feadb996a02df2dad530"
2024-11-06T23:06:13.606476546Z 23:06:13 jobs.1 | SolidQueue-1.0.0 Started Worker (3479.9ms)  pid: 48, hostname: "192.168.5.38-b3a74b7b4fd6", process_id: 93, name: "worker-30d0d56fd605403893aa", polling_interval: 0.1, queues: "*", thread_pool_size: 2
2024-11-06T23:06:13.606558788Z 23:06:13 jobs.1 | SolidQueue-1.0.0 Started Worker (3497.7ms)  pid: 36, hostname: "192.168.5.38-b3a74b7b4fd6", process_id: 94, name: "worker-62cfa59b665dd653f5a1", polling_interval: 0.1, queues: "*", thread_pool_size: 2
2024-11-06T23:06:13.606657085Z 23:06:13 jobs.1 | SolidQueue-1.0.0 Started Dispatcher (3504.5ms)  pid: 32, hostname: "192.168.5.38-b3a74b7b4fd6", process_id: 92, name: "dispatcher-fd238c157d5840a81cff", polling_interval: 0.1, batch_size: 5000, concurrency_maintenance_interval: 600
2024-11-06T23:06:13.607040809Z 23:06:13 jobs.1 | SolidQueue-1.0.0 Started Worker (3489.6ms)  pid: 42, hostname: "192.168.5.38-b3a74b7b4fd6", process_id: 96, name: "worker-52ee0a4d36d8bb75614a", polling_interval: 0.1, queues: "*", thread_pool_size: 2
2024-11-06T23:06:13.607116874Z 23:06:13 jobs.1 | SolidQueue-1.0.0 Started Worker (3472.3ms)  pid: 54, hostname: "192.168.5.38-b3a74b7b4fd6", process_id: 95, name: "worker-75bb1f2941a093c0eb3d", polling_interval: 0.1, queues: "*", thread_pool_size: 2
2024-11-06T23:06:13.607778097Z 23:06:13 jobs.1 | SolidQueue-1.0.0 Started Worker (3456.5ms)  pid: 66, hostname: "192.168.5.38-b3a74b7b4fd6", process_id: 97, name: "worker-e2505e10fd8b85386f4d", polling_interval: 0.1, queues: "*", thread_pool_size: 2
2024-11-06T23:06:13.610317623Z 23:06:13 jobs.1 | SolidQueue-1.0.0 Started Worker (3442.9ms)  pid: 78, hostname: "192.168.5.38-b3a74b7b4fd6", process_id: 98, name: "worker-419d1ad8d68bdcea43ce", polling_interval: 0.1, queues: "*", thread_pool_size: 2
2024-11-06T23:06:13.620247608Z 23:06:13 jobs.1 | SolidQueue-1.0.0 Started Worker (3476.6ms)  pid: 60, hostname: "192.168.5.38-b3a74b7b4fd6", process_id: 99, name: "worker-04923b2b42c7a3f3805e", polling_interval: 0.1, queues: "*", thread_pool_size: 2
2024-11-06T23:06:13.627670308Z 23:06:13 jobs.1 | SolidQueue-1.0.0 Started Worker (3468.3ms)  pid: 72, hostname: "192.168.5.38-b3a74b7b4fd6", process_id: 100, name: "worker-d7669a3b5be1ae95abaf", polling_interval: 0.1, queues: "*", thread_pool_size: 2
  INFO [8e9eecca] Running docker container ls --all --filter name=^botflip-worker-web-f5a6fa4c715c9beeccfd33e65b9fdc4b6bf51706$ --quiet | xargs docker inspect --format '{{json .State.Health}}' on 192.168.5.38
  INFO [8e9eecca] Finished in 0.096 seconds with exit status 0 (successful).
 ERROR null
  INFO [82f1bd46] Running docker container ls --all --filter name=^botflip-worker-web-f5a6fa4c715c9beeccfd33e65b9fdc4b6bf51706$ --quiet | xargs docker stop on 192.168.5.38
  INFO [82f1bd46] Finished in 10.446 seconds with exit status 0 (successful).
Releasing the deploy lock...
  Finished all in 243.0 seconds
  ERROR (SSHKit::Command::Failed): Exception while executing on host 192.168.5.38: docker exit status: 1
docker stdout: Nothing written
docker stderr: Error: target failed to become healthy

34code avatar Nov 06 '24 23:11 34code

ah.. just noticed the "first web container is unhealthy".. perhaps this is related?

34code avatar Nov 06 '24 23:11 34code

I have a regular alpha version of my site deployed to the same host. The two deploy.yml files share the following lines:

# Deploy to these servers.
servers:
  web:
    hosts:
      - 192.168.5.38

34code avatar Nov 06 '24 23:11 34code

@34code - did you get to the bottom of this in the end?

djmb avatar Jan 10 '25 15:01 djmb

@djmb not yet... I would like to be able to deploy my worker process (bin/jobs) without running my app beside it as a second Procfile entry just for the healthcheck. I'll try again on a fresh machine once it's provisioned to see if it's a machine-specific issue, as this machine has a lot of rails projects already deployed on it with kamal (not sure if that could be causing problems).

34code avatar Jan 10 '25 23:01 34code

@djmb Still unable to make it work.. Here is the error I'm getting (even though bin/jobs starts successfully via my Procfile.jobs):

  ERROR (SSHKit::Command::Failed): Exception while executing on host 192.168.5.38: docker exit status: 1
docker stdout: Nothing written
docker stderr: Error: target failed to become healthy within configured timeout (2m0s)

And here is my config/deploy-worker.yml

# Name of your application. Used to uniquely configure containers.
service: botflip-worker

# Name of the container image.
image: 34code/botflip_worker

# Deploy to these servers.
servers:
  web:
    hosts:
      - 192.168.5.38
# Credentials for your image host.
registry:
  # Specify the registry server, if you're not using Docker Hub
  # server: registry.digitalocean.com / ghcr.io / ...
  server: container.registry.com/repo
  username: sambitb
  password: password
# host_path:container_path
volumes:
  - "/root/log:/rails/log"
# Configure builder setup.
builder:
  arch: amd64
  remote: ssh://[email protected]
  args:
    RUBY_VERSION: 3.4.1
  dockerfile: Dockerfile.worker
proxy: false
deploy_timeout: 120
logging:
  options:
    max-size: 365m

I think it's related to "deploy_timeout" since nothing else is set to 2 mins...

34code avatar Jan 20 '25 22:01 34code

seems like the "target failed to become healthy" is a potential issue even with proxy: false set

34code avatar Jan 20 '25 22:01 34code

I tried removing the "deploy_timeout" line in deploy-worker.yml and it reverted to the 30s timeout, but the error message still appears:

Releasing the deploy lock...
  Finished all in 181.5 seconds
  ERROR (SSHKit::Command::Failed): Exception while executing on host 192.168.5.38: docker exit status: 1
docker stdout: Nothing written
docker stderr: Error: target failed to become healthy within configured timeout (30s)

And I think the container was shut down by kamal.. I see "sending SIGKILL to all processes" in the logs for the "bin/jobs" command inside the container, which is no longer running

34code avatar Jan 20 '25 22:01 34code

Curious if 37s devs just run bin/jobs on metal without dockerizing it first... Dockerizing it and deploying with kamal would be nice imo. It technically works with a rails app in the Procfile beside bin/jobs, but keeping that dangling web app around isn't very CPU- or memory-efficient when all you need is bin/jobs...

34code avatar Mar 19 '25 07:03 34code

Ah, I see the issue here. Setting proxy: false at the root level doesn't do anything; you should set it under the web key instead. I've raised https://github.com/basecamp/kamal/pull/1509 to make a boolean at the root invalid, to avoid this in future.
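
A minimal sketch of that change, applied to the deploy-worker.yml posted earlier in this thread. Only the relevant keys are shown; everything else is assumed to stay as it was.

```yaml
# config/deploy-worker.yml (fragment)
service: botflip-worker
image: 34code/botflip_worker

servers:
  web:
    hosts:
      - 192.168.5.38
    # Disable kamal-proxy for this role.
    # A root-level `proxy: false` has no effect, per the comment above.
    proxy: false
```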

djmb avatar Apr 18 '25 13:04 djmb

thanks for the suggestion! adding proxy: false under the web key instead fixes my issue!!

34code avatar Apr 22 '25 00:04 34code

I configured two roles, web and worker, both using the same image. The web role deploys normally, but the worker role shows the error "container not ready after 30 seconds (unhealthy)" regardless of the configuration.

servers:
  web:
    hosts:
      - 192.168.1.100
    proxy: true
  worker:
    hosts:
      - 192.168.1.100
    proxy: false
    options:
      health-cmd: "sh -c 'date'"
    cmd: "bun run bun-app/bin/run-scheduled-jobs.ts"

shiny avatar Apr 24 '25 11:04 shiny

I just use a separate Dockerfile for my worker and a separate deploy-worker.yml with the web key and a separate Procfile. Hope this helps..

34code avatar Apr 25 '25 20:04 34code