
Why does monocular-api restart so often?

Open cabrinoob opened this issue 6 years ago • 3 comments

Hi, I'm using Monocular and everything works fine, but I noticed that the API pods restart very often:

[screenshot: pod list showing restart counts]

As you can see in the screenshot above, one of the pods restarted 61 times in 20 hours.

cabrinoob · May 16 '18 06:05

This happens pretty commonly with large chart repositories (e.g. the stable repository): Monocular takes a while to index them, the livenessProbe fails in the meantime, and Kubernetes kills the pod. You can try increasing the chart's livenessProbe delay to prevent this from happening: https://github.com/kubernetes-helm/monocular/blob/master/deployment/monocular/values.yaml#L25
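For example, a values override along these lines (a minimal sketch; the key path and the delay value are assumptions based on the linked values.yaml, so check them against your chart version):

api:
  livenessProbe:
    initialDelaySeconds: 300   # give Monocular time to index large repositories before the first probe

Apply it with helm install -f custom-values.yaml, or the equivalent --set flag.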

prydonius · May 16 '18 14:05

Why not respond to the livenessProbe in time while scraping the charts from the network at the same time?

fkpwolf · May 31 '18 08:05

To make the container respond to the livenessProbe in time, I changed the foreground refresh into a goroutine in main.go:

// Run the repository refresh in the background instead of blocking startup
go chartsImplementation.Refresh()

So when the pod starts for the first time, the REST service that answers the livenessProbe comes up in time and the pod is not killed by k8s.
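Put differently, the startup sequence becomes: launch the refresh in a goroutine, then bring the HTTP server (and with it the liveness endpoint) up right away. A simplified sketch of that pattern, using a stand-in type and a hypothetical /healthz handler on :8080 rather than Monocular's real wiring:

package main

import (
	"log"
	"net/http"
)

// chartsIndexer stands in for Monocular's chartsImplementation.
type chartsIndexer struct{}

// Refresh downloads and indexes the configured chart repositories
// (long-running for big repositories like stable).
func (c *chartsIndexer) Refresh() {
	// ... fetch index.yaml files, download charts, build the index ...
}

func main() {
	charts := &chartsIndexer{}

	// Run the repository refresh in the background instead of blocking startup on it.
	go charts.Refresh()

	// The HTTP server, and with it the liveness endpoint, comes up immediately,
	// so Kubernetes does not restart the pod while the index is still being built.
	http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}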

I also found a lot of OOM kills in the kernel log:

[147913.492743] Memory cgroup stats for /kubepods.slice/kubepods-podbee65b57_7e71_11e8_9ced_d24398b14524.slice/docker-5d3a4c6e75105b3ef0bdfd215ebec12e9a2d32341f369e75125954cb3eeed903.scope: cache:0KB rss:233428KB rss_huge:2048KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:233428KB inactive_file:0KB active_file:0KB unevictable:0KB
[147913.544835] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
[147913.546739] [24389]     0 24389      253        1       4        0          -998 pause
[147913.548605] [24518]     0 24518    66805    60921     134        0          -998 monocular
[147913.550548] Memory cgroup out of memory: Kill process 26074 (monocular) score 54 or sacrifice child
[147913.552593] Killed process 24518 (monocular) total-vm:267220kB, anon-rss:232620kB, file-rss:11064kB, shmem-rss:0kB
[147919.170462] SELinux: mount invalid.  Same superblock, different security settings for (dev mqueue, type mqueue)

To fix this, increase the resource requests and limits in the Pod spec:

            "resources": {
              "limits": {
                "cpu": "100m",
                "memory": "928Mi"
              },
              "requests": {
                "cpu": "100m",
                "memory": "428Mi"
              }
            },
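If you deploy through the Helm chart, the same bump can be expressed as a values override instead of patching the Deployment (a sketch; the api.resources key path is an assumption):

api:
  resources:
    requests:
      cpu: 100m
      memory: 428Mi
    limits:
      cpu: 100m
      memory: 928Mi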

I still don't know why it uses so much memory. Maybe we can decrease the parallel download semaphore? @prydonius Currently it is 15.
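For context, the usual Go pattern for that kind of download semaphore is a buffered channel whose capacity bounds how many chart archives are in flight (and in memory) at once. A generic sketch of the pattern, not Monocular's actual downloader:

package main

import (
	"fmt"
	"sync"
)

// downloadChart stands in for fetching and parsing one chart archive.
func downloadChart(name string) {
	fmt.Println("downloading", name)
}

func main() {
	charts := []string{"chart-a", "chart-b", "chart-c", "chart-d"}

	// Buffered channel used as a counting semaphore: its capacity is the
	// maximum number of concurrent downloads. Monocular reportedly uses 15;
	// a smaller value lowers peak memory at the cost of slower indexing.
	const maxParallel = 5
	sem := make(chan struct{}, maxParallel)

	var wg sync.WaitGroup
	for _, c := range charts {
		wg.Add(1)
		sem <- struct{}{} // acquire a slot before starting the download
		go func(name string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot when the download finishes
			downloadChart(name)
		}(c)
	}
	wg.Wait()
}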

fkpwolf · Jul 04 '18 09:07