GitOps checkbox status does not stay after saving changes in load balanced environment
This is only occurring in a load balanced environment. Repro and fix validation will need to include load balanced environments (such as our load testing instances, or Dogfood).
Steps to reproduce:
- Log in
- Navigate to Settings
- Navigate to Integration tab
- Navigate to "Change management"
- Update the GitOps checkbox
- Click Save

Expected: If the checkbox was originally checked, and we just unchecked it & saved, the current state should be unchecked.

Actual: The state of the checkbox reverts back to being checked. However, after a refresh of the page, the state of the checkbox updates properly to be unchecked.

Video: https://www.loom.com/share/c9f2bb7cfcc24ccfb53f5e44345c5cf8
@sharon-fdm I was able to reproduce this, appears to be unreleased
Thanks @jmwatts. We will look into it.
Investigated with @jmwatts; this doesn't reproduce on local instances. Janis will elaborate on some ideas about what may be happening in load-balanced environments with multiple Fleet instances.
@sharon-fdm @jacobshandling and I took a look at this, and it appears to happen on hosted instances (QA Wolf Premium instance, Dogfood) but not in local dev environments. I have a hunch it has to do with session storage (I'm not a developer, so I can't explain it exactly, but I have seen situations like this before where a bug only happens on instances running multiple load-balanced web apps). I checked the test run history and it looks like this is the first time the test ran, so it may not be an unreleased bug (we may just not have noticed it before QA Wolf caught it in their test, because we don't generally test these features in AWS-hosted environments).
I'll be spinning up a load test environment early next week to test the migration from 4.66 to 4.68 and I can check then to see if:
- It's reproducible in another AWS hosted instance
- It's pre-existing in 4.66.0, in which case it would be a released bug.
Since the setting is saved and this is a UI bug only, I don't think it needs to hold up the release, but I should have more definite answers for you early next week.
@jacobshandling @sharon-fdm I was able to reproduce this in a new load test environment that was spun up on 4.66.0. There are NO other users logged in or making changes on this server.
I believe this is related to session management, as I have only been able to reproduce it in hosted, load-balanced environments. I also tried it on our Render QA instance and it did not happen - that environment is hosted but NOT load-balanced.
This is a released bug. I've updated the labels.
Timeboxing 2 points to see what needs to be done.
@rachaelshaw says she has seen this for other settings as well, which is good: if it were only this setting and not others, it would be even more elusive, but this means it's related to the general app config.
This is probably not a load-balancing thing as much as a database replication thing. If we're reading back data immediately after writing, we need to explicitly read from the primary db because the replicas may not have the data yet. We've got this going on in a few places already (search for instances of .RequirePrimary).
@juan-fdz-hawa I'm still seeing this bug in QA Wolf's environment and they are running the latest 4.74 build.
This is actually not a trivial problem to fix completely - see this for context ... In a load balancer setup, there's no guarantee the client will land on the same server instance when refreshing the page, so getting a stale app config copy is always a possibility (due to the caching mechanism in place).
@jmwatts Fixing the issue of refreshing the page right after clicking 'Save' isn't solvable ATM. The team has decided it's not worth fixing given the complexity of the problem.
@iansltx 🐤
Chatted with @jkatz01 and he's got a local setup (that I don't have) that'll allow for more quickly validating this, so assigning to him for QA.
This issue should now be fixed on load-balanced environments, correct?
I was able to reproduce the bug on v4.73.0 on a load-balanced environment by adding replica lag, and after upgrading to rc-minor-fleet-v4.76.0 the bug was still happening. Also, this issue happens with more fields in the settings menu, such as Organization info -> Organization name.
Edit: I might have been checking for the wrong thing, will check again.
@rachaelshaw This seems like we need to put this back in drafting and take another crack at it in >= 4.78.
Sorry, I'm incorrect here. We had some lack of clarity around the scope of the fix so we're probably still on track here, though in high replica lag situations we'll have other issues crop up that look similar to this one.
Looks like this issue does not happen anymore on QAWolf's environment, and it's not happening in any version I tested with my local load-balanced, replica lag environment so it's probably good to go!
Checkbox in the cloud, Fix ensures no unchecked shroud, In Fleet, we are proud.
QA Wolf confirmed this is now working for them. The issue was closed when they closed out the bug report on their side, however we have not released this fix version yet so I reopened it until we've released.
In Fleet's glass city, Checkbox state now persists, Balance is maintained.