Filter/Sort for cluster with persisting issues

Open dkistner opened this issue 3 years ago • 15 comments

What would you like to be added: As an operator I want to have an option to filter and/or sort the All Projects view for Shoot clusters with persisting issues.

  • An option to filter the list for issues that persist longer than 5m, 10m, or 1h (or even a configurable timeframe) would be preferable.
  • An option to sort the list from the cluster with the oldest issue to the cluster with the newest issue would be helpful (considering the Shoot status and the Shoot conditions).
  • Furthermore, it would be great if the update of the list could be disabled for a certain period of time.
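
To make the first two points concrete, here is a rough sketch of the filter and sort in Python. It is only an illustration, not dashboard code: `ShootRow`, its `issue_since` field, and `persisting_issues` are hypothetical names, and it assumes each row already knows when its current issue started.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class ShootRow:
    # Hypothetical row model; the real dashboard data looks different.
    name: str
    issue_since: Optional[datetime]  # None means the cluster has no issue right now


def persisting_issues(rows: List[ShootRow], now: datetime, min_age: timedelta) -> List[ShootRow]:
    """Keep rows whose issue is at least `min_age` old, oldest issue first."""
    stale = [
        r for r in rows
        if r.issue_since is not None and now - r.issue_since >= min_age
    ]
    return sorted(stale, key=lambda r: r.issue_since)


now = datetime(2021, 10, 25, 13, 10, tzinfo=timezone.utc)
rows = [
    ShootRow("shoot-a", now - timedelta(minutes=3)),
    ShootRow("shoot-b", now - timedelta(hours=2)),
    ShootRow("shoot-c", None),
    ShootRow("shoot-d", now - timedelta(minutes=12)),
]

# Only issues that have persisted for 10 minutes or longer, oldest first.
names = [r.name for r in persisting_issues(rows, now, timedelta(minutes=10))]
print(names)
```

With a 10m threshold, `shoot-a` (3 minutes old) and `shoot-c` (no issue) drop out, which gives the system a chance to self-heal before a cluster shows up in the list.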

Why is this needed: Gardener comes with self-healing capabilities, so many issues resolve on their own after some time, e.g. with the next retried Shoot operation. As an operator I might want to give the system a chance to self-heal before digging into the issue.

/cc @ScheererJ, @mliepold, @dguendisch

dkistner avatar Oct 25 '21 13:10 dkistner

You can actually already sort on the Readiness column, which partially gives you what you want. However, the current Readiness column sorting is a bit cumbersome: when I want to have the oldest issues first, it still sorts progressing chiplets before them...

dguendisch avatar Oct 25 '21 13:10 dguendisch

Yes, you are right, but then I'm just looking at the conditions. I would prefer to have a view which combines both Status and Readiness for sorting the list.

I also added the point that I want to pause update of the entire list temporarily.

dkistner avatar Oct 25 '21 14:10 dkistner

Hi @dkistner, can you give some more background information? Can you explain the exact use case for pausing the update? We would like to know how you work with this list so that we can implement something useful.

grolu avatar Jun 01 '22 13:06 grolu

Hi @grolu,

at the time there were resource shortages at some infrastructure providers. This caused a huge number of clusters per landscape to have issues, as the seeds did not have enough nodes to handle all control planes. However, the number of clusters with issues was not static, as clusters were reconciled and sometimes new issues occurred while old ones disappeared. The result was that it was not possible to properly cycle through the paged list of clusters with issues, as the list was ever changing. It got to the point where hovering over an issue to get the mouse-over text did not work, as the clusters "jumped" in the table. It would have been nice to disable the update of the table for as long as I was going through the table.

ScheererJ avatar Jun 01 '22 13:06 ScheererJ

We thought about freezing the sorting. However clusters that are removed from the list or added would still cause the list to be unsteady. I do not think that pausing the update (working on snapshot data) is a good idea, as this would mean that you work on outdated information. Maybe we could keep placeholders (greyed out) for removed clusters and add new ones to the end of the list or something like that. However, even in this case, we would need to freeze the sorting / or the data update as the list would change in case you sort by last issue date (which is the preferred sorting for this list I guess). @holgerkoser @petersutter

grolu avatar Jun 01 '22 14:06 grolu

Would something like this fulfill your requirements?

NOTE: This is only a PoC. We may not be able to make this productive, but I want to get your feedback early.

Screen Shot 2022-06-02 at 16 55 44

There is no filtering option for persisting issues yet, but maybe having the issue since column would be sufficient...

grolu avatar Jun 02 '22 15:06 grolu

I like this freeze idea!! However I'd consider filtering pretty important too (sometimes we focus on one cloudprovider only or exclude a bunch of well known clusters that all have a common prefix...). Freeze without filter would still be nice but only half as nice :)

dguendisch avatar Jun 02 '22 15:06 dguendisch

Screenshot 2022-06-02 at 17 17 31

And using the search to filter is not sufficient?

petersutter avatar Jun 02 '22 15:06 petersutter

Ah, mixed it up, so search is available but filtering out user/ticket/configuration issues is not? But how should that work? I think every operator has these filters ticked most of the time. If you now hit "Freeze", would they be "unticked"? 🤔

dguendisch avatar Jun 02 '22 15:06 dguendisch

There is a misunderstanding. I just wanted to point out that I did not add a dedicated "persistent issues" filter:

An option to filter the list for issues that persist longer than 5m, 10m, 1h (or even configurable timeframe) would be preferable.

grolu avatar Jun 02 '22 15:06 grolu

The freeze list option seems like what I wanted :-) . My understanding is that I could filter the list with the existing checkboxes and/or the search and then hit the freeze list toggle to keep it stable while I page through the list. Is that assumption correct?

ScheererJ avatar Jun 02 '22 15:06 ScheererJ

correct

petersutter avatar Jun 02 '22 15:06 petersutter

I see, then everything is fine, looks perfect!

dguendisch avatar Jun 02 '22 15:06 dguendisch

What about new items, i.e. clusters that turn into an error state after you switched on the freeze option? Would you expect them to be added to the end of the list, or is it ok as it is now (not adding them to the list until you "unfreeze")?

grolu avatar Jun 02 '22 15:06 grolu

From my perspective, freezing the list is a temporary thing. Therefore, not adding items works for me. I want to check the status quo. After I finished checking whatever I wanted to check, I would "unfreeze" the list and then expect to see all items including new ones.

TLDR: For me, it would be ok.

ScheererJ avatar Jun 02 '22 15:06 ScheererJ
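
For reference, the freeze behaviour discussed in this thread (filter first, freeze, no new items until "unfreeze") could be modelled roughly as below. This is a sketch under assumptions, not the PoC's implementation: `FreezableList` and its methods are hypothetical, and showing a `"gone"` placeholder for clusters that leave the list while frozen is just one reading of the greyed-out placeholder idea mentioned earlier.

```python
from typing import Dict, List, Optional, Tuple


class FreezableList:
    """Hypothetical model of a cluster list whose membership and order can be pinned."""

    def __init__(self) -> None:
        self._live: Dict[str, str] = {}           # name -> status, always current
        self._pinned: Optional[List[str]] = None  # names in pinned order, or None

    def update(self, items: Dict[str, str]) -> None:
        # Live data keeps flowing in, whether or not the list is frozen.
        self._live = dict(items)

    def freeze(self) -> None:
        # Pin the current membership and order (dicts keep insertion order).
        self._pinned = list(self._live)

    def unfreeze(self) -> None:
        self._pinned = None

    def visible(self) -> List[Tuple[str, str]]:
        if self._pinned is None:
            return list(self._live.items())
        # Frozen: same rows in the same order; new clusters are not shown.
        # Assumption: clusters that left the live list get a "gone" placeholder.
        return [(name, self._live.get(name, "gone")) for name in self._pinned]


lst = FreezableList()
lst.update({"shoot-a": "error", "shoot-b": "error"})
lst.freeze()
lst.update({"shoot-b": "error", "shoot-c": "error"})  # a recovers, c is new
frozen_view = lst.visible()
lst.unfreeze()
live_view = lst.visible()
print(frozen_view)
print(live_view)
```

While frozen, the rows stay put so paging and hovering work; after unfreezing, the list shows everything including the new items, matching the expectation stated above.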