dashboard Highlight problematic/orphan pods

This is a feature request, a bit on the advanced side. I know it's not going to make it into 1.1. Feel free to defer it or close it entirely.

Yesterday I had to clean up a cluster, as there were pods still around, without any of the replica sets that used to control them. I think it might have been a result of https://github.com/kubernetes/kubernetes/issues/23252 or something similar.

It was painful to do manually, because replicaset suffixes like foo-4245793053 are not predictable, ordered or human-friendly in general. For each deployment, I had to figure out the active replicaset foo-X, then delete pods not matching it as a prefix: foo-Y-12345, foo-Y-er42n, foo-A-y9v37z, etc.
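For illustration, the manual check described above could be sketched roughly like this. It is only a name-based heuristic in Python, assuming the pod names and the active replicaset name have already been collected (for example from `kubectl get pods` output); `stale_pods` and the sample names are hypothetical.

```python
def stale_pods(pod_names, active_replicaset):
    """Return pod names that do not belong to the active replicaset.

    Pods created by a replicaset are named "<replicaset-name>-<suffix>",
    so any pod without that prefix is a candidate for cleanup.
    """
    prefix = active_replicaset + "-"
    return [name for name in pod_names if not name.startswith(prefix)]


# Hypothetical names, for illustration only.
pods = ["foo-111-ab1cd", "foo-222-12345", "foo-222-er42n", "foo-333-y9v37"]
for name in stale_pods(pods, "foo-111"):
    print(name)
```

The result still has to be reviewed by hand before deleting anything, since the prefix match says nothing about why a pod is there.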

It would be nice to highlight such 'orphan' pods that don't belong to a RS, RC or DS. Sometimes they exist for deliberate reasons, so perhaps a scary colour like red is not appropriate. It just has to look a little different from regular pods. There was another class of pods worth highlighting that I had thought of, but it escapes me now.

Jun 10 '16 20:06 therc

This is a feature request, a bit on the advanced side. I know it's not going to make it into 1.1. Feel free to defer it or close it entirely.

Please report such feature requests whenever you can; they have huge value to us.

It would be nice to highlight such 'orphan' pods that don't belong to a RS, RC or DS. Sometimes they exist for deliberate reasons, so perhaps a scary colour like red is not appropriate. It just has to look a little different from regular pods. There was another class of pods worth highlighting that I had thought of, but it escapes me now.

At some point we were planning to show only 'orphan' pods by default in pod list views. This is because pods generally belong to controllers (RS, RC, etc.), so showing them in a separate list is messy. This may extend further, to Replica Sets, i.e., list only orphan RSes by default. With that, we would show every 'application' only once on workload pages (right now it can show up three times when somebody uses Deployments).

Anyway, this is a good feature request and we should clarify it in the following weeks.

cc @kubernetes/deployment @janetkuo

Jun 13 '16 06:06 bryk

Actually, I just found that nowadays you just can't inspect an orphan pod:

2016-12-28T01:59:37.101523039Z [2016-12-28T01:59:37Z] Incoming HTTP/1.1 GET /api/v1/pod/kube-system/kube-dns-v20-yg7e5 request from X
2016-12-28T01:59:37.101558733Z Getting details of kube-dns-v20-yg7e5 pod in kube-system namespace
2016-12-28T01:59:37.101951825Z Getting pod metrics
2016-12-28T01:59:37.328409635Z replicationcontrollers "kube-dns-v20" not found
2016-12-28T01:59:37.328640870Z [2016-12-28T01:59:37Z] Outcoming response to X with 500 status code

I had deleted the RC so that it wouldn't recreate new pods as I change labels on existing ones before removing them (being super paranoid about disrupting DNS).

Dec 28 '16 02:12 therc

Orphaned pods don't have owner references.
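As a rough sketch of that check (not dashboard code): assuming pod objects shaped like the JSON from `kubectl get pods -o json`, where owners are listed under `metadata.ownerReferences`, a pod with a missing or empty list can be treated as orphaned. The helper names and sample data below are hypothetical, and the check is only meaningful for controllers that actually set owner references on their pods.

```python
def is_orphaned(pod):
    """Return True if the pod object lists no owner references."""
    owners = pod.get("metadata", {}).get("ownerReferences") or []
    return len(owners) == 0


def orphaned_pod_names(pod_list):
    """Names of orphaned pods in a PodList-shaped dict (with an "items" list)."""
    return [
        pod["metadata"]["name"]
        for pod in pod_list.get("items", [])
        if is_orphaned(pod)
    ]


# Hypothetical data, trimmed to the fields used here.
sample = {
    "items": [
        {"metadata": {"name": "foo-111-ab1cd",
                      "ownerReferences": [{"kind": "ReplicaSet", "name": "foo-111"}]}},
        {"metadata": {"name": "kube-dns-v20-yg7e5"}},
    ]
}
for name in orphaned_pod_names(sample):
    print(name)
```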

Apr 03 '17 11:04 0xmichalis

@kargakis Last time I checked, owner references weren't implemented for all resources. Can you confirm that's done now?

Apr 03 '17 12:04 maciaszczykm

It is for all workload controllers in 1.6. Still missing from Jobs/CronJobs IIRC

Apr 03 '17 12:04 0xmichalis

Quoting from https://github.com/kubernetes/dashboard/issues/1827#issuecomment-311937536:

Since highlighting orphaned pods should not be hard to add, we can think about how to show it in the dashboard. We have to remember that it's possible to create a pod without any controller, so this should not be done to scare users but rather to inform them that some pods do not have any controller managing them.

As a simple solution we could probably just add a new column to the pod list and display Orphaned true/false. It should also not be hard to add support for sorting by this column. This way the user could easily sort and check which pods are orphaned.
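A minimal sketch of the column-plus-sorting idea, written in Python only to show the data shape rather than how the dashboard would implement it. It assumes pod objects that carry `metadata.name` and `metadata.ownerReferences`; `pod_rows` and `sort_by_orphaned` are hypothetical names.

```python
def pod_rows(pods):
    """Build one list row per pod, with a boolean "orphaned" column."""
    rows = []
    for pod in pods:
        meta = pod.get("metadata", {})
        rows.append({
            "name": meta.get("name", ""),
            # Orphaned means the owner reference list is missing or empty.
            "orphaned": not meta.get("ownerReferences"),
        })
    return rows


def sort_by_orphaned(rows, orphaned_first=True):
    """Sort rows on the "orphaned" column, breaking ties by pod name."""
    return sorted(rows, key=lambda r: (r["orphaned"] != orphaned_first, r["name"]))


# Hypothetical pods, trimmed to the fields used here.
pods = [
    {"metadata": {"name": "b-pod", "ownerReferences": [{"kind": "ReplicaSet", "name": "b"}]}},
    {"metadata": {"name": "c-pod"}},
    {"metadata": {"name": "a-pod", "ownerReferences": []}},
]
for row in sort_by_orphaned(pod_rows(pods)):
    print(row["name"], row["orphaned"])
```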

@therc would this solution work for you?

Aug 17 '17 11:08 maciaszczykm

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta. /lifecycle stale

Jan 03 '18 05:01 fejta-bot

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten /remove-lifecycle stale

Feb 08 '18 04:02 fejta-bot
