[bitnami/discourse] Discourse forum - Sidekiq mem leak

Open crazyfree opened this issue 1 year ago • 7 comments

Name and Version

bitnami/discourse:3.2.4-debian-12-r0

What architecture are you using?

amd64

What steps will reproduce the bug?

  1. Install Discourse from the Helm chart https://artifacthub.io/packages/helm/bitnami/discourse
  2. Set up a daily backup to S3 with a big database; mine contains 9 GB of data (60k users and a ton of posts)
  3. Keep the system running for a few days without any access
  4. Check the Sidekiq RSS at the path /sidekiq/busy
  5. You'll see that memory increases constantly even with no access to the frontend at all. In my case, it crashed after 5 days when I set values.yaml like below:

Are you using any custom parameters or values?

resources:
  limits:
    memory: 4Gi
  requests:
    memory: 2Gi
discourse:
  skipInstall: true
  resources:
    limits:
      memory: 4Gi
    requests:
      memory: 2Gi

What is the expected behavior?

Either the Sidekiq memory stays constant, or at least there is a way to run a health check and restart it automatically when the limit is reached.
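
To make the second option concrete, here is a minimal sketch of a memory-based check. It is an assumption about what such a check could look like, not something the chart or image provides: the function name `rss_under_limit` and any threshold passed to it are hypothetical, and it reads `VmRSS` from `/proc`, so it only works on Linux.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Bitnami chart or image).
# Succeeds while the resident set size of a process stays below a limit.
# Arguments: PID, limit in KiB.
rss_under_limit() {
  pid=$1
  limit_kib=$2
  # /proc/<pid>/status reports VmRSS in kB; the value is the second field
  rss_kib=$(awk '/^VmRSS:/ {print $2}' "/proc/$pid/status" 2>/dev/null)
  # A missing process (or unreadable status file) counts as a failed check
  [ -n "$rss_kib" ] || return 1
  [ "$rss_kib" -lt "$limit_kib" ]
}
```

Inside a livenessProbe command this could be called as `rss_under_limit "$(pgrep -o -f '^sidekiq')" 3500000 || exit 1`, where 3500000 KiB is only an example value chosen to sit below the 4Gi limit. A failing probe would then let Kubernetes restart the container before the memory limit is hit.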

What do you see instead?

Sidekiq memory increases day by day until it has consumed all the RAM. After all the RAM is consumed, the Sidekiq process doesn't get terminated; it hangs forever, so I don't know how to health-check it and restart it automatically.

Additional information

Screenshot 2024-07-19 at 13 17 55

As you can see in the screenshot, the memory of the instance started 1 day ago keeps increasing until it reaches the limit.

crazyfree avatar Jul 19 '24 06:07 crazyfree

Hi,

Thank you so much for reporting. It seems to me that the issue is not in the Bitnami packaging of Discourse (Sidekiq in this case) but in the application itself. Did you check with the upstream Sidekiq support?

javsalgar avatar Jul 19 '24 09:07 javsalgar

@javsalgar the setup for the Sidekiq process in the Bitnami Discourse repo is totally different from the one in the official Discourse repo, so it's quite hard for me to reach out to upstream support.

crazyfree avatar Jul 19 '24 11:07 crazyfree

Plus, while trying to reduce the Sidekiq worker threads, I found that the UNICORN_WORKERS, UNICORN_SIDEKIQS and UNICORN_SIDEKIQ_MAX_RSS env vars have no effect in the bitnami/discourse image.

Can you show me an alternative way to reduce the Sidekiq worker threads, @javsalgar?

crazyfree avatar Jul 22 '24 08:07 crazyfree

Hi @crazyfree,

Those parameters are not supported in the Bitnami solution. However, we are open to contributions. Would you like to improve the solution? You can follow the contributing guidelines in the Bitnami containers repo, and the team will be more than happy to review the changes. You will need to add support for those env vars in the env file and in the method that configures them in the conf file.

https://github.com/bitnami/containers/blob/main/bitnami/discourse/3/debian-12/rootfs/opt/bitnami/scripts/discourse-env.sh
https://github.com/bitnami/containers/blob/main/bitnami/discourse/3/debian-12/rootfs/opt/bitnami/scripts/libdiscourse.sh#L245

As a workaround, you can also use the DISCOURSE_EXTRA_CONF_CONTENT env var that takes care of appending all the configuration lines you need to the conf file.

https://github.com/bitnami/containers/blob/main/bitnami/discourse/3/debian-12/rootfs/opt/bitnami/scripts/libdiscourse.sh#L276
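
To picture what "appending configuration lines to the conf file" means, here is a small sketch. It only illustrates the behaviour described above and is not the code from libdiscourse.sh; the function name `append_extra_conf` is made up for this example.

```shell
#!/bin/sh
# Illustration only: NOT the actual code from libdiscourse.sh.
# Appends the contents of DISCOURSE_EXTRA_CONF_CONTENT to a conf file, if set.
append_extra_conf() {
  conf_file=$1
  # Nothing to do when the variable is empty or unset
  [ -n "${DISCOURSE_EXTRA_CONF_CONTENT:-}" ] || return 0
  # Each line of the variable becomes a line at the end of the conf file
  printf '%s\n' "$DISCOURSE_EXTRA_CONF_CONTENT" >> "$conf_file"
}
```

With the variable set to one `key = value` pair per line, every line ends up at the bottom of the conf file. Which keys actually influence the Sidekiq worker count is not covered here and would need to be checked against the Discourse configuration.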

Would that work for you?

jotamartos avatar Jul 30 '24 11:07 jotamartos

@jotamartos unfortunately, I'm on fire 🔥 🔥 🔥 so I can't contribute to your repos now.

But until we can patch that, here is a workaround. The idea is to find a zombie process and let the control plane do the restart.

livenessProbe:
  exec:
    command:
    - /bin/sh
    - -c
    - |
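      # PID of the oldest process whose command line starts with "sidekiq"
      SIDEKIQ_PID=$(pgrep -o -f '^sidekiq')
      # Count zombie (state Z) processes whose parent is exactly that PID
      zombie_children=$(ps -eo pid,ppid,state | awk -v ppid="$SIDEKIQ_PID" '$2 == ppid && $3 ~ /^Z/' | wc -l)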
      if [ "$zombie_children" -gt 0 ]; then
        echo "Sidekiq has a zombie child"
        exit 1
      fi
      exit 0

crazyfree avatar Aug 01 '24 04:08 crazyfree

FYI, I realized that the command above won't help: sometimes Sidekiq just stops for no apparent reason, with no zombie process spawned. So I was thinking of adding dumb-init to the Sidekiq container, but it seems I have to change sidekiq.command to

command:
   - /shared/dumb-init
   - --
   - /bin/bash

instead of the default ["/opt/bitnami/scripts/discourse/entrypoint.sh"]. Of course, that did not work, since /opt/bitnami/scripts/discourse/entrypoint.sh would have to be changed as well.

Things become more and more complicated 🥵

crazyfree avatar Aug 12 '24 07:08 crazyfree

The entrypoint is a script that performs some actions before running the "run.sh" script. You can mount your custom run.sh script in the pod/container and change the sidekiq.args parameter to execute your customized script.

## @section Sidekiq container parameters
sidekiq:
  ## @param sidekiq.command Custom command to override image cmd (evaluated as a template)
  ##
  command: ['/opt/bitnami/scripts/discourse/entrypoint.sh']
  ## @param sidekiq.args Custom args for the custom command (evaluated as a template)
  ##
  args: ['/opt/bitnami/scripts/discourse-sidekiq/run.sh']
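
As a rough sketch of what such a custom script could contain, the function below starts the stock run.sh under dumb-init. Several things here are assumptions: the `/shared/dumb-init` path is taken from the earlier comment, the `INIT_BIN` and `STOCK_RUN` variables and the function name are invented for the example, and it presumes that entrypoint.sh ends by exec-ing its arguments, which this thread does not confirm.

```shell
#!/bin/sh
# Hypothetical custom run script, mounted into the container and passed
# via sidekiq.args in place of the stock run.sh.
# INIT_BIN and STOCK_RUN are illustrative overrides, not chart parameters.
run_under_init() {
  init_bin=${INIT_BIN:-/shared/dumb-init}
  stock_run=${STOCK_RUN:-/opt/bitnami/scripts/discourse-sidekiq/run.sh}
  # Replace the current shell with dumb-init, which then launches run.sh
  # as its child, forwards signals to it and reaps exited children
  exec "$init_bin" -- "$stock_run" "$@"
}
```

The last line of the mounted script would be `run_under_init "$@"`, and sidekiq.args would point at the mounted path rather than the stock run.sh. If entrypoint.sh does not exec its arguments, dumb-init would not end up as PID 1 and this would mainly help with signal forwarding.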

Would that work for you?

jotamartos avatar Aug 16 '24 16:08 jotamartos

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

github-actions[bot] avatar Sep 01 '24 01:09 github-actions[bot]

Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Issue. Do not hesitate to reopen it later if necessary.

github-actions[bot] avatar Sep 07 '24 01:09 github-actions[bot]