
Stage's `.status.phase` is forever in `Promoting` with no running Promotions

Open jessesuen opened this issue 1 year ago • 2 comments

Description

I noticed that in one instance, a bunch of Stages were stuck in the `Promoting` phase.

$ k get stages
NAME           SHARD   CURRENT FREIGHT                            HEALTH      PHASE           AGE
prod                                                                          NotApplicable   34h
prod-central           a1e76d1acaff48174da7b3abb938d57c7f07af85   Unhealthy   NotApplicable   34h
prod-west              a1e76d1acaff48174da7b3abb938d57c7f07af85   Unhealthy   NotApplicable   34h
prod-east              a1e76d1acaff48174da7b3abb938d57c7f07af85   Unhealthy   Steady          34h
ab-test-a              f40255e4e3959d5c713d0454f8df22b6aa072008   Healthy     Promoting       34h
ab-test-b              ade77c672f509413e774de167f2caf5319e427c3   Healthy     Promoting       34h
staging                4e0c9f8d4c0d7f8cbed96b17dbb4bee01aa60511   Healthy     Promoting       34h
dev                    778a4b2cd6bcbde5da6d9eb8cb242ce6941c2cb4   Healthy     Promoting       34h

This is despite not having any running promotions.

$ k get promotions | grep Running
$
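The same cross-check can be done per Stage rather than by eyeballing two listings. Below is a minimal sketch, not part of the original report, that takes the JSON from `kubectl get stages -o json` and `kubectl get promotions -o json` and lists Stages whose phase is `Promoting` but which have no `Running` Promotion. The function name `stuck_stages` is hypothetical, and the Promotion fields `.spec.stage` and `.status.phase` are assumptions about the v0.5.0 resource layout that should be verified against the installed CRDs.

```python
import json


def stuck_stages(stages_json: str, promotions_json: str) -> list[str]:
    """Return names of Stages in phase Promoting with no Running Promotion.

    Both arguments are the raw JSON text of a Kubernetes list object,
    e.g. the output of `kubectl get stages -o json`.
    """
    stages = json.loads(stages_json).get("items", [])
    promotions = json.loads(promotions_json).get("items", [])

    # Stages that still have a Promotion in flight.
    # ASSUMPTION: a Promotion names its Stage in .spec.stage and reports
    # its state in .status.phase.
    active = {
        p.get("spec", {}).get("stage")
        for p in promotions
        if p.get("status", {}).get("phase") == "Running"
    }

    # A Stage is considered stuck if it claims to be promoting but no
    # Running Promotion targets it.
    return sorted(
        s["metadata"]["name"]
        for s in stages
        if s.get("status", {}).get("phase") == "Promoting"
        and s["metadata"]["name"] not in active
    )
```

For the listing above, this would be expected to report `ab-test-a`, `ab-test-b`, `dev`, and `staging`, provided the assumed field names hold.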

Steps to Reproduce

Version

v0.5.0

Logs

Paste any relevant application logs here.

jessesuen avatar Apr 05 '24 05:04 jessesuen

Given the lack of logs, do you have any gut feeling about how this could be reproduced, or whether the last Promotion for the stuck Stages resulted in, e.g., an error? It almost looks as if the Stage reconciler never kicks off again.

hiddeco avatar Apr 05 '24 09:04 hiddeco

@jessesuen did you expect the shard column to be blank for all of those?

The v0.4.0 --> v0.5.0 upgrade logic should have accounted for copying the value of the shard label to the new shard field.

If you're in a sharded topology and something has gone wrong with that process, it is possible that all those stages are no longer being reconciled, which would explain why they all appear to be stuck.
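One rough way to check whether the upgrade copied the values over is to compare the label and the field on each Stage. The sketch below is illustrative only: the label key `kargo.akuity.io/shard` and the field path `.spec.shard` are assumptions that should be confirmed against the installed version, and `shard_mismatches` is a hypothetical helper name. It expects the JSON text from `kubectl get stages -o json`.

```python
import json

# ASSUMPTION: the shard label key; verify against the installed Kargo version.
SHARD_LABEL = "kargo.akuity.io/shard"


def shard_mismatches(stages_json: str) -> list[tuple[str, str, str]]:
    """Return (name, label_value, field_value) for Stages whose shard label
    and shard field disagree. Missing values are treated as empty strings.
    """
    mismatches = []
    for stage in json.loads(stages_json).get("items", []):
        meta = stage.get("metadata", {})
        label = (meta.get("labels") or {}).get(SHARD_LABEL, "")
        # ASSUMPTION: the new shard field lives at .spec.shard.
        field = stage.get("spec", {}).get("shard") or ""
        if label != field:
            mismatches.append((meta.get("name", ""), label, field))
    return mismatches
```

An empty result would suggest the label and field agree everywhere; any entry with a non-empty label and an empty field would point at a Stage the upgrade missed.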

krancour avatar Apr 05 '24 15:04 krancour

This issue is quite old and things have changed a lot since it was opened. I can only assume this behavior is not still being observed. But @jessesuen please re-open if you know this to still be an issue.

krancour avatar Aug 19 '24 16:08 krancour