housekeeping stuck in ContainerCreating mode
When housekeeping and persistence are enabled, pods with volumes should be scheduled on the same node, at least on GCP, because GCP can mount volumes from the default storage classes to only one node at a time. Looking into the CronJob template, I see that it tries to mount those volumes, but then again, according to the documentation at https://demo.netbox.dev/static/docs/administration/housekeeping/ housekeeping needs only database access.
Another possible solution would be to add podAffinity so that the CronJob pod is scheduled on the same node as the netbox application.
We could probably remove some of the volume mounts from the housekeeping job, though I can see how it might be needed in future.
I wouldn't want to second-guess the affinities. You might be running with ReadWriteMany volumes (e.g. NFS, EFS, or similar), which are fine to mount across multiple nodes. If you're using a ReadWriteOnce volume then it's up to you to supply the correct affinity settings to allow that to work.
I'd happily accept PRs to make the volume mounting in the housekeeping job optional, and also to improve the documentation about this problem and add example affinity settings.
Sure, I will start working on this :)
Actually, it is the same situation with the worker pod too.
While working on this, I also found that if ReadWriteOnce is used, then housekeeping should be disabled, because the CronJob does not have a podAffinity definition.
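In the meantime, with ReadWriteOnce storage the housekeeping job can simply be switched off in values.yaml (a sketch, assuming the chart exposes a housekeeping.enabled toggle):
# values.yaml - workaround sketch: disable the housekeeping CronJob so
# no second pod tries to attach the ReadWriteOnce media volume
housekeeping:
  enabled: false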
There's a housekeeping.affinity setting that does what you want, I think?
CronJob does not support affinity as per spec: https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/cron-job-v1/
CronJob doesn't need to support it, it's not its job. All a CronJob does is create Jobs from the jobTemplate based on the schedule. Similarly, Jobs don't do much more than create Pods based on their template field. That template, though, is a full Pod spec, and that's available in the spec.jobTemplate.spec.template.spec field. I haven't tested it, but I don't see any reason the affinity field we've already got in our housekeeping CronJob template wouldn't work.
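To make the nesting concrete, the rendered manifest is shaped roughly like this (a minimal sketch of the CronJob API structure, not the chart's actual output; the schedule shown is a placeholder):
apiVersion: batch/v1
kind: CronJob
metadata:
  name: netbox-housekeeping
spec:
  schedule: "0 0 * * *"              # placeholder schedule
  jobTemplate:
    spec:
      template:                      # a full Pod template...
        spec:                        # ...so this is a full Pod spec
          affinity: {}               # ordinary Pod affinity rules are valid here
          restartPolicy: OnFailure
          containers:
            - name: netbox-housekeeping
              image: netboxcommunity/netbox:v3.0.11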
OK, my bad, I saw an error about that affinity when I tried it on the CronJob, but that was because of wrong spacing in the definition.
@Atoms is right about the spacing in the affinity definition. It should be nindent 12 instead of nindent 8. There are 3 lines that need to be fixed. Should I open a PR? https://github.com/bootc/netbox-chart/blob/master/templates/cronjob.yaml#L161 https://github.com/bootc/netbox-chart/blob/master/templates/cronjob.yaml#L165 https://github.com/bootc/netbox-chart/blob/master/templates/cronjob.yaml#L169
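For context, the change is the sort of indentation tweak sketched below (assuming the template pipes the values through toYaml; the exact lines in cronjob.yaml are linked above rather than quoted here):
# templates/cronjob.yaml (sketch) - the housekeeping affinity values are
# rendered inside spec.jobTemplate.spec.template.spec, so they need to be
# indented with nindent 12 rather than nindent 8
          affinity:
            {{- toYaml .Values.housekeeping.affinity | nindent 12 }}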
Oh d'oh. Yes please, a PR would be really handy, thanks.
I would like to revive this as it still exists... this is what I have:
admin@azure-box:~/Netbox/internal$ kubectl get pod -n netbox -o wide
NAME                                 READY   STATUS              RESTARTS   AGE     IP             NODE                                NOMINATED NODE   READINESS GATES
netbox-64d9d994c5-g44rg              1/1     Running             0          2d22h   10.244.1.101   aks-nodepool1-27323697-vmss000000   <none>           <none>
netbox-housekeeping-27450720-ktlgb   0/1     ContainerCreating   0          2d10h   <none>         aks-nodepool1-27323697-vmss00000c   <none>           <none>
netbox-postgresql-0                  1/1     Running             0          2d22h   10.244.9.50    aks-nodepool1-27323697-vmss00000a   <none>           <none>
netbox-redis-master-0                1/1     Running             0          2d22h   10.244.0.29    aks-nodepool1-27323697-vmss00000e   <none>           <none>
netbox-redis-replicas-0              1/1     Running             0          2d22h   10.244.3.24    aks-nodepool1-27323697-vmss00000c   <none>           <none>
netbox-redis-replicas-1              1/1     Running             0          2d22h   10.244.2.79    aks-nodepool1-27323697-vmss000001   <none>           <none>
netbox-redis-replicas-2              1/1     Running             0          2d22h   10.244.1.102   aks-nodepool1-27323697-vmss000000   <none>           <none>
netbox-worker-5b74cfd4-rnxqk         1/1     Running             2          2d22h   10.244.1.100   aks-nodepool1-27323697-vmss000000   <none>           <none>
and this is the describe output:
admin@azure-box:~/Netbox/internal$ kubectl describe pod netbox-housekeeping-27450720-ktlgb -n netbox
Name: netbox-housekeeping-27450720-ktlgb
Namespace: netbox
Priority: 0
Node: aks-nodepool1-27323697-vmss00000c/172.19.128.7
Start Time: Sat, 12 Mar 2022 00:00:00 +0000
Labels: app.kubernetes.io/component=housekeeping
app.kubernetes.io/instance=netbox
app.kubernetes.io/name=netbox
controller-uid=7f27bfeb-a813-4065-aa83-4921e34e0b23
job-name=netbox-housekeeping-27450720
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: Job/netbox-housekeeping-27450720
Containers:
netbox-housekeeping:
Container ID:
Image: netboxcommunity/netbox:v3.0.11
Image ID:
Port: <none>
Host Port: <none>
Command:
/opt/netbox/venv/bin/python
/opt/netbox/netbox/manage.py
housekeeping
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/etc/netbox/config/configuration.py from config (ro,path="configuration.py")
/opt/netbox/netbox/media from media (rw)
/run/config/netbox from config (ro)
/run/secrets/netbox from secrets (ro)
/tmp from netbox-tmp (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-chp5t (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: netbox
Optional: false
secrets:
Type: Projected (a volume that contains injected data from multiple sources)
SecretName: netbox
SecretOptionalName: <nil>
SecretName: netbox-postgresql
SecretOptionalName: <nil>
SecretName: netbox-redis
SecretOptionalName: <nil>
SecretName: netbox-redis
SecretOptionalName: <nil>
netbox-tmp:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
SizeLimit: <unset>
media:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: netbox-media
ReadOnly: false
kube-api-access-chp5t:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedMount 58m (x248 over 2d10h) kubelet Unable to attach or mount volumes: unmounted volumes=[media], unattached volumes=[secrets netbox-tmp media kube-api-access-chp5t config]: timed out waiting for the condition
Warning FailedMount 22m (x483 over 2d10h) kubelet Unable to attach or mount volumes: unmounted volumes=[media], unattached volumes=[config secrets netbox-tmp media kube-api-access-chp5t]: timed out waiting for the condition
Warning FailedMount 17m (x260 over 2d10h) kubelet Unable to attach or mount volumes: unmounted volumes=[media], unattached volumes=[netbox-tmp media kube-api-access-chp5t config secrets]: timed out waiting for the condition
Warning FailedMount 12m (x268 over 2d10h) kubelet Unable to attach or mount volumes: unmounted volumes=[media], unattached volumes=[kube-api-access-chp5t config secrets netbox-tmp media]: timed out waiting for the condition
Warning FailedMount 3m46s (x277 over 2d10h) kubelet Unable to attach or mount volumes: unmounted volumes=[media], unattached volumes=[media kube-api-access-chp5t config secrets netbox-tmp]: timed out waiting for the condition
here are all the PVCs
admin@azure-box:~/Netbox/internal$ kubectl get pvc -n netbox
NAME                                 STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-netbox-postgresql-0             Bound    pvc-0cbe8766-8a6d-498d-a0d3-fb695c1cf212   8Gi        RWO            default        2d22h
netbox-media                         Bound    pvc-63a58576-8cc3-415a-9152-96a886ecb611   30Gi       RWO            default        2d22h
redis-data-netbox-redis-master-0     Bound    pvc-ab91656d-0d76-4c26-b24b-d40978830d74   8Gi        RWO            default        2d22h
redis-data-netbox-redis-replicas-0   Bound    pvc-dcc5762e-a67b-4803-8dfe-df268865e8cd   8Gi        RWO            default        2d22h
redis-data-netbox-redis-replicas-1   Bound    pvc-c6a23abf-1367-4f2f-bb96-d0150a1c53a7   8Gi        RWO            default        2d22h
redis-data-netbox-redis-replicas-2   Bound    pvc-2f925079-a52b-49ff-86a5-80a41218ef92   8Gi        RWO            default        2d22h
Additional information: if I add podAffinity under "netbox-housekeeping", it cannot be used.
Here is the values.yml extract:
affinity:
  podAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 50
        podAffinityTerm:
          labelSelector:
            matchExpressions:
              - key: app.kubernetes.io/component
                operator: In
                values:
                  - netbox
          topologyKey: "kubernetes.io/hostname"
but that cannot be applied:
Error: UPGRADE FAILED: error validating "": error validating data: ValidationError(CronJob.spec.jobTemplate.spec.template): unknown field "podAffinity" in io.k8s.api.core.v1.PodTemplateSpec
I had issues with the netbox and netbox-worker pods, but adding these affinity rules to values.yaml ensured they both started on the same node:
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - topologyKey: kubernetes.io/hostname
        labelSelector:
          matchExpressions:
            - key: app.kubernetes.io/component
              operator: In
              values:
                - worker
                - netbox
worker:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - topologyKey: kubernetes.io/hostname
          labelSelector:
            matchExpressions:
              - key: app.kubernetes.io/component
                operator: In
                values:
                  - worker
                  - netbox
Did you need netbox and the worker to be on the same node? Here we were talking about netbox and netbox-housekeeping.
did you need netbox and the worker to be on the same node?
Yes, as they share a common volume. I expect this will be the case with the netbox-housekeeping pod too.
Looking at the code, the component label is housekeeping: https://github.com/bootc/netbox-chart/blob/master/templates/cronjob.yaml#L16 so try this in your values.yaml:
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - topologyKey: kubernetes.io/hostname
        labelSelector:
          matchExpressions:
            - key: app.kubernetes.io/component
              operator: In
              values:
                - housekeeping
                - netbox
housekeeping:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - topologyKey: kubernetes.io/hostname
          labelSelector:
            matchExpressions:
              - key: app.kubernetes.io/component
                operator: In
                values:
                  - housekeeping
                  - netbox
PR #89 is a good start at a documentation update to "fix" this issue.