Why does monocular-api restart so often?
Hi, I'm using Monocular and everything works fine, but I realized that the API pods are restarting very often:
As you can see in the screenshot above, one of the pods restarted 61 times in 20 hours.
This happens pretty commonly for large chart repositories (e.g. the stable repository): it takes a while for Monocular to index them, and Kubernetes will kill the pod in the meantime. You can try increasing the charts' livenessProbe delay to prevent this from happening: https://github.com/kubernetes-helm/monocular/blob/master/deployment/monocular/values.yaml#L25
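For illustration, that override would go through the chart's values (or an equivalent --set flag). This is only a hedged sketch: the api.livenessProbe key path is an assumption based on the linked values.yaml, so check the exact names your chart version uses. The probe fields themselves (initialDelaySeconds, periodSeconds, timeoutSeconds) are standard Kubernetes settings.

api:
  livenessProbe:
    initialDelaySeconds: 300   # wait longer before the first liveness check
    periodSeconds: 10
    timeoutSeconds: 5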
Why not respond to the livenessProbe in time while scraping charts from the network at the same time?
To make the container respond to the livenessProbe in time, I changed the foreground refresh into a goroutine in main.go:
// Run the repository refresh in a background goroutine so the API server starts immediately
go chartsImplementation.Refresh()
So when the pod starts for the first time, the REST service backing the livenessProbe comes up in time and the pod is not killed by k8s.
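For context, here is a minimal standalone sketch (not Monocular's actual main.go, whose surrounding code is more involved) of the pattern this change relies on: the HTTP server comes up immediately so the liveness endpoint answers, while the slow repository refresh runs in a background goroutine. The /healthz and /readyz paths, the port, and the simulated 30-second refresh are illustrative assumptions.

package main

import (
	"log"
	"net/http"
	"sync/atomic"
	"time"
)

func main() {
	var ready atomic.Bool

	// Stand-in for the long-running refresh; in Monocular this would be
	// something like chartsImplementation.Refresh().
	go func() {
		time.Sleep(30 * time.Second) // simulate indexing a large repository
		ready.Store(true)
		log.Println("repository refresh finished")
	}()

	// Liveness: the process counts as alive as soon as the server is up,
	// regardless of whether the index has been built yet.
	http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})

	// Readiness: optionally keep the pod out of rotation until the refresh
	// has completed.
	http.HandleFunc("/readyz", func(w http.ResponseWriter, r *http.Request) {
		if ready.Load() {
			w.WriteHeader(http.StatusOK)
			return
		}
		w.WriteHeader(http.StatusServiceUnavailable)
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}

Pairing this with a readiness probe, as in the /readyz handler above, would keep the pod out of the Service until the index is built, while the liveness probe only checks that the process is alive.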
I also found lots of OOM kills in the kernel log:
[147913.492743] Memory cgroup stats for /kubepods.slice/kubepods-podbee65b57_7e71_11e8_9ced_d24398b14524.slice/docker-5d3a4c6e75105b3ef0bdfd215ebec12e9a2d32341f369e75125954cb3eeed903.scope: cache:0KB rss:233428KB rss_huge:2048KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:233428KB inactive_file:0KB active_file:0KB unevictable:0KB
[147913.544835] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[147913.546739] [24389] 0 24389 253 1 4 0 -998 pause
[147913.548605] [24518] 0 24518 66805 60921 134 0 -998 monocular
[147913.550548] Memory cgroup out of memory: Kill process 26074 (monocular) score 54 or sacrifice child
[147913.552593] Killed process 24518 (monocular) total-vm:267220kB, anon-rss:232620kB, file-rss:11064kB, shmem-rss:0kB
[147919.170462] SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue)
To fix this, increase the resources in the Pod spec:
"resources": {
"limits": {
"cpu": "100m",
"memory": "928Mi"
},
"requests": {
"cpu": "100m",
"memory": "428Mi"
}
},
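If you deploy Monocular through its Helm chart, the same numbers would normally be set through values.yaml rather than by patching the Deployment directly. A hedged sketch, assuming the resources block lives under the api section (verify against the chart's values.yaml):

api:
  resources:
    limits:
      cpu: 100m
      memory: 928Mi
    requests:
      cpu: 100m
      memory: 428Mi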
I still don't know why it exhausts so much memory. Maybe we can decrease the parallel download semaphore? @prydonius Currently it is 15.
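For reference, the semaphore in question caps how many chart downloads run concurrently, so lowering it trades indexing speed for a smaller peak memory footprint. Below is a minimal sketch of that pattern using a buffered channel; it is illustrative only, not Monocular's actual code, and the chart names and sleep are stand-ins.

package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	const maxParallel = 5 // e.g. lowered from the current 15
	sem := make(chan struct{}, maxParallel)

	charts := []string{"chart-a", "chart-b", "chart-c", "chart-d"}

	var wg sync.WaitGroup
	for _, name := range charts {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot; blocks once maxParallel are in flight
			defer func() { <-sem }() // release the slot when done
			time.Sleep(100 * time.Millisecond) // stand-in for downloading/parsing a chart
			fmt.Println("processed", name)
		}(name)
	}
	wg.Wait()
}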