fleet GitOps checkbox status does not stay after saving changes in load balanced environment

This is only occurring in a load balanced environment. Repro and fix validation will need to include load balanced environments (such as our load testing instances, or Dogfood).

Steps to reproduce:

Log in
Navigate to Settings
Navigate to Integration tab
Navigate to "Change management"
Update the GitOps checkbox
Click Save

Expected: If the checkbox was originally checked, and we just unchecked it & saved, the current state should be unchecked.

Actual: The state of the checkbox reverts back to being checked. However, after a refresh of the page, the state of the checkbox updates properly to be unchecked.

Video: https://www.loom.com/share/c9f2bb7cfcc24ccfb53f5e44345c5cf8

Apr 30 '25 21:04 qa-wolf[bot]

@sharon-fdm I was able to reproduce this. It appears to be unreleased.

May 01 '25 19:05 jmwatts

Thanks @jmwatts. We will look into it.

May 01 '25 21:05 sharon-fdm

Investigated with @jmwatts. It doesn't reproduce on local instances. Janis will elaborate on some ideas about what may be happening in load-balanced environments with multiple Fleet instances.

May 01 '25 21:05 jacobshandling

@sharon-fdm @jacobshandling and I took a look at this, and it appears to be happening on hosted instances (QA Wolf Premium instance, Dogfood) and not in local dev environments. I have a hunch it has to do with session storage (I'm not a developer, so I don't know exactly how to explain it, but I have seen situations like this before where a bug only happens in load-balanced instances that have multiple webapps). I checked the test run history, and it looks like this is the first time the test ran, so it may not be an unreleased bug (we may just not have noticed it before QA Wolf caught it in their test, because we don't generally test these features in AWS-hosted environments).

I'll be spinning up a load test environment early next week to test the migration from 4.66 to 4.68, and I can check then to see whether:

It's reproducible in another AWS-hosted instance
It's pre-existing in 4.66.0, in which case it would be a released bug

Since the setting is saved and this is a UI bug only, I don't think it needs to hold up the release, but I should have more definite answers for you early next week.

May 01 '25 21:05 jmwatts

@jacobshandling @sharon-fdm I was able to reproduce this in a new load test environment that was spun up on 4.66.0. There are NO other users logged in or making changes on this server.

I believe this is related to session management as I have only been able to reproduce it in hosted load balanced environments. I also tried it on our Render QA instance and it did not happen - that environment is hosted but NOT load balanced.

This is a released bug. I've updated the labels.

May 05 '25 17:05 jmwatts

Timebox 2 points to see what needs to be done.

May 08 '25 17:05 sharon-fdm

@rachaelshaw says she has seen this for other settings as well, which is good. If it were only this setting and not others, it would be even more elusive, but this means it's related to the general app config.

Jul 23 '25 19:07 jacobshandling

This is probably not a load-balancing thing as much as a database replication thing. If we're reading back data immediately after writing, we need to explicitly read from the primary db because the replicas may not have the data yet. We've got this going on in a few places already (search for instances of .RequirePrimary).

Jul 28 '25 13:07 sgress454

@juan-fdz-hawa I'm still seeing this bug in QA Wolf's environment and they are running the latest 4.74 build.

Sep 19 '25 20:09 jmwatts

This is actually not a trivial problem to fix completely - see this for context ... In a load balancer setup, there's no guarantee the client will land in the same server instance when refreshing the page, so getting a stale app config copy is always a possibility (due to cache mechanism in place).

Sep 22 '25 21:09 juan-fdz-hawa

@jmwatts The issue related to refreshing the page right after clicking 'Save' isn't solvable ATM. The team has decided that it is not worth fixing given the complexity of the problem.

Sep 26 '25 13:09 juan-fdz-hawa

@iansltx 🐤

Oct 28 '25 21:10 sgress454

Chatted with @jkatz01 and he's got a local setup (that I don't have) that'll allow for more quickly validating this, so assigning to him for QA.

Oct 28 '25 23:10 iansltx

This issue should not be fixed on load-balanced environments, correct?

I was able to reproduce the bug on v4.73.0 on a load-balanced environment by adding replica lag, and after upgrading to rc-minor-fleet-v4.76.0 the bug was still happening. Also, this issue happens with more fields in the settings menu, such as Organization info -> Organization name.

Edit: I might have been checking for the wrong thing, will check again.

Oct 29 '25 17:10 jkatz01

@rachaelshaw This seems like we need to put this back in drafting and take another crack at it in >= 4.78.

Oct 29 '25 17:10 iansltx

Sorry, I'm incorrect here. We had some lack of clarity around the scope of the fix so we're probably still on track here, though in high replica lag situations we'll have other issues crop up that look similar to this one.

Oct 29 '25 18:10 iansltx

Looks like this issue does not happen anymore on QAWolf's environment, and it's not happening in any version I tested with my local load-balanced, replica lag environment so it's probably good to go!

Oct 29 '25 18:10 jkatz01

Checkbox in the cloud, Fix ensures no unchecked shroud, In Fleet, we are proud.

Nov 04 '25 22:11 fleet-release

QA Wolf confirmed this is now working for them. The issue was closed when they closed out the bug report on their side; however, we have not released this fix version yet, so I reopened it until we've released.

Nov 04 '25 22:11 jmwatts

In Fleet's glass city, Checkbox state now persists, Balance is maintained.

Nov 08 '25 00:11 fleet-release
