GitOps checkbox status does not stay after saving changes in load balanced environment

Open qa-wolf[bot] opened this issue 6 months ago • 6 comments

This is only occurring in a load balanced environment. Repro and fix validation will need to include load balanced environments (such as our load testing instances, or Dogfood).

Steps to reproduce:

  1. Log in
  2. Navigate to Settings
  3. Navigate to Integration tab
  4. Navigate to "Change management"
  5. Update the GitOps checkbox
  6. Click Save

Expected: If the checkbox was originally checked, and we just unchecked it & saved, the current state should be unchecked.

Actual: The state of the checkbox reverts back to being checked. However, after a refresh of the page, the state of the checkbox updates properly to be unchecked.

Video: https://www.loom.com/share/c9f2bb7cfcc24ccfb53f5e44345c5cf8

qa-wolf[bot] avatar Apr 30 '25 21:04 qa-wolf[bot]

@sharon-fdm I was able to reproduce this, appears to be unreleased

jmwatts avatar May 01 '25 19:05 jmwatts

Thanks @jmwatts. We will look into it.

sharon-fdm avatar May 01 '25 21:05 sharon-fdm

Investigated with @jmwatts. It doesn't reproduce on local instances. Janis will elaborate on some ideas about what may be happening in load-balanced environments with multiple Fleet instances.

jacobshandling avatar May 01 '25 21:05 jacobshandling

@sharon-fdm @jacobshandling and I took a look at this and it appears to be happening for hosted instances (QA Wolf Premium instance, Dogfood) and not local dev environments. I have a hunch it has to do with session storage (I'm not a developer so I don't know exactly how to explain it, but I have seen situations like this before where the bug only happens in load-balanced instances that have multiple webapps). I checked the test run history and it looks like this is the first time the test ran, so it may not be an unreleased bug (we just may not have noticed it prior to QA Wolf catching it in their test because we don't generally test these features in AWS hosted environments).

I'll be spinning up a load test environment early next week to test the migration from 4.66 to 4.68 and I can check then to see if:

  1. It's reproducible in another AWS hosted instance
  2. It's pre-existing in 4.66.0, in which case it would be a released bug.

Since the setting is saved and this is a UI bug only, I don't think it needs to hold up the release, but I should have more definite answers for you early next week.

jmwatts avatar May 01 '25 21:05 jmwatts

@jacobshandling @sharon-fdm I was able to reproduce this in a new load test environment that was spun up on 4.66.0. There are NO other users logged in or making changes on this server.

I believe this is related to session management as I have only been able to reproduce it in hosted load balanced environments. I also tried it on our Render QA instance and it did not happen - that environment is hosted but NOT load balanced.

This is a released bug. I've updated the labels.

jmwatts avatar May 05 '25 17:05 jmwatts

Timebox 2 points to see what needs to be done.

sharon-fdm avatar May 08 '25 17:05 sharon-fdm

@rachaelshaw says she has seen this for other settings as well, which is good. If it were only this setting and not others it would be even more elusive, but this means it's related to the general app config

jacobshandling avatar Jul 23 '25 19:07 jacobshandling

This is probably not a load-balancing thing as much as a database replication thing. If we're reading back data immediately after writing, we need to explicitly read from the primary db because the replicas may not have the data yet. We've got this going on in a few places already (search for instances of .RequirePrimary).
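
To illustrate the replication point above, here is a minimal, language-agnostic sketch in Python. It is not Fleet's code: `ReplicatedStore`, `require_primary`, and the `gitops_mode_enabled` key are hypothetical names chosen for the example, and replica lag is simulated by only copying data when `replicate()` is called.

```python
class ReplicatedStore:
    """Toy primary/replica pair (hypothetical, not Fleet's datastore).

    The replica only catches up when replicate() is called, which
    stands in for replication lag.
    """

    def __init__(self, initial):
        self.primary = dict(initial)
        self.replica = dict(initial)

    def write(self, key, value):
        # Writes always go to the primary.
        self.primary[key] = value

    def read(self, key, require_primary=False):
        # Reads default to the replica unless the caller opts in to the primary.
        source = self.primary if require_primary else self.replica
        return source[key]

    def replicate(self):
        # Simulates the replica catching up with the primary.
        self.replica = dict(self.primary)


store = ReplicatedStore({"gitops_mode_enabled": True})
store.write("gitops_mode_enabled", False)

# Reading back immediately from the replica returns the old value.
stale = store.read("gitops_mode_enabled")
# Explicitly requiring the primary returns the value that was just written.
fresh = store.read("gitops_mode_enabled", require_primary=True)

# Once replication catches up, the replica agrees with the primary.
store.replicate()
after_lag = store.read("gitops_mode_enabled")
```

In this sketch, the read-after-write path is the only place that needs `require_primary=True`; ordinary reads can keep using the replica.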

sgress454 avatar Jul 28 '25 13:07 sgress454

@juan-fdz-hawa I'm still seeing this bug in QA Wolf's environment and they are running the latest 4.74 build.

jmwatts avatar Sep 19 '25 20:09 jmwatts

This is actually not a trivial problem to fix completely - see this for context ... In a load balancer setup, there's no guarantee the client will land in the same server instance when refreshing the page, so getting a stale app config copy is always a possibility (due to cache mechanism in place).
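
The caching scenario described above can be sketched as follows. This is a simplified illustration under stated assumptions, not Fleet's implementation: `Instance`, `save_config`, `expire_cache`, the round-robin balancer, and the `gitops_mode_enabled` key are all hypothetical, and each instance is assumed to keep its own in-memory copy of the app config.

```python
from itertools import cycle


class Instance:
    """One server instance with its own in-memory app config cache (hypothetical)."""

    def __init__(self, db):
        self.db = db          # shared database, modeled as a dict
        self.cached = None    # per-instance cached copy of the config

    def get_config(self):
        # Populate the cache on first use, then keep serving the cached copy.
        if self.cached is None:
            self.cached = dict(self.db)
        return self.cached

    def save_config(self, key, value):
        # Only the instance that handled the write invalidates its cache.
        self.db[key] = value
        self.cached = None

    def expire_cache(self):
        # Stands in for the cache timing out on its own.
        self.cached = None


db = {"gitops_mode_enabled": True}
a, b = Instance(db), Instance(db)
balancer = cycle([a, b])  # round-robin: a, b, a, b, ...

# Two earlier page loads warm the cache on both instances.
next(balancer).get_config()
next(balancer).get_config()

# The save lands on instance a; the refresh right after lands on instance b.
next(balancer).save_config("gitops_mode_enabled", False)
seen_on_refresh = next(balancer).get_config()["gitops_mode_enabled"]

# Instance a, which handled the write, serves the new value.
seen_on_writer = a.get_config()["gitops_mode_enabled"]

# Instance b catches up only after its cached copy expires.
b.expire_cache()
seen_after_expiry = b.get_config()["gitops_mode_enabled"]
```

The stale read here comes from instance b never learning about the write, which is why reading from the primary database alone would not remove it in this model.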

juan-fdz-hawa avatar Sep 22 '25 21:09 juan-fdz-hawa

@jmwatts Fixing the issue related to refreshing the page right after clicking 'Save' won't be solvable ATM. The team has decided that it is not worth fixing given the complexity of the problem.

juan-fdz-hawa avatar Sep 26 '25 13:09 juan-fdz-hawa

@iansltx 🐤

sgress454 avatar Oct 28 '25 21:10 sgress454

Chatted with @jkatz01 and he's got a local setup (that I don't have) that'll allow for more quickly validating this, so assigning to him for QA.

iansltx avatar Oct 28 '25 23:10 iansltx

This issue should not be fixed on load-balanced environments, correct?

I was able to reproduce the bug on v4.73.0 on a load-balanced environment by adding replica lag, and after upgrading to rc-minor-fleet-v4.76.0 the bug was still happening. Also, this issue happens with more fields in the settings menu, such as Organization info -> Organization name.

Edit: I might have been checking for the wrong thing, will check again.

jkatz01 avatar Oct 29 '25 17:10 jkatz01

@rachaelshaw This seems like we need to put this back in drafting and take another crack at it in >= 4.78.

iansltx avatar Oct 29 '25 17:10 iansltx

Sorry, I'm incorrect here. We had some lack of clarity around the scope of the fix so we're probably still on track here, though in high replica lag situations we'll have other issues crop up that look similar to this one.

iansltx avatar Oct 29 '25 18:10 iansltx

Looks like this issue does not happen anymore on QAWolf's environment, and it's not happening in any version I tested with my local load-balanced, replica lag environment so it's probably good to go!

jkatz01 avatar Oct 29 '25 18:10 jkatz01

Checkbox in the cloud, Fix ensures no unchecked shroud, In Fleet, we are proud.

fleet-release avatar Nov 04 '25 22:11 fleet-release

QA Wolf confirmed this is now working for them. The issue was closed when they closed out the bug report on their side, however we have not released this fix version yet so I reopened it until we've released.

jmwatts avatar Nov 04 '25 22:11 jmwatts

In Fleet's glass city, Checkbox state now persists, Balance is maintained.

fleet-release avatar Nov 08 '25 00:11 fleet-release