opentelemetry-demo
Demo environment generates errors by default
Bug Report
Which version of the demo you are using? 1.10.0
Symptom
If you start the demo environment from scratch, errors are reported for the adservice.
What is the expected behavior?
Either:
- The demo environment doesn't generate errors by default - currently how the documentation reads: https://opentelemetry.io/docs/demo/#scenarios
- The demo environment does generate errors by default but these errors are documented and thus expected.
What is the actual behavior?
The adservice generates errors by default yet the documentation seems to indicate you must enable scenarios to generate errors and other problems.
Reproduce
Provide the minimum required steps to result in the issue you're observing.
We will close this issue if:
- The steps you provided are complex.
- If we can not reproduce the behavior you're reporting.
Additional Context
Log messages for adservice will show:
ad-service               | 2024-06-23 15:11:51 - oteldemo.AdService - GetAds Failed with status Status{code=UNAVAILABLE, description=null, cause=null} trace_id=d963f87608e1ab611dee31ef9ac29860 span_id=84ce83545d6852bb trace_flags=01
src/flagd/demo.flagd.json shows:
    "adServiceFailure": {
      "description": "Fail ad service",
      "state": "ENABLED",
      "variants": {
        "on": true,
        "off": false
      },
      "defaultVariant": "off",
      "targeting": {
        "fractional": [
          ["on", 10],
          ["off", 90]
        ]
      }
    },
The problem is the weight for off, which should be set to 100 by default (with on set to 0).
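As a rough illustration of that suggestion, here is a minimal Python sketch that rewrites the fractional weights of the flag quoted above. It works only on the fragment shown in this issue, not on the full demo.flagd.json, and set_fractional_weights is a hypothetical helper written for this example, not part of flagd or the demo.

```python
import copy
import json

# Flag definition as quoted in this issue (fragment only, not the full file).
FLAGS = json.loads("""
{
  "adServiceFailure": {
    "description": "Fail ad service",
    "state": "ENABLED",
    "variants": {"on": true, "off": false},
    "defaultVariant": "off",
    "targeting": {
      "fractional": [["on", 10], ["off", 90]]
    }
  }
}
""")


def set_fractional_weights(flags, flag_key, weights):
    """Return a copy of flags with flag_key's fractional weights replaced.

    Hypothetical helper for illustration; assumes the fractional rule is a
    plain list of [variant, weight] pairs, as in the fragment above.
    """
    updated = copy.deepcopy(flags)
    flag = updated[flag_key]
    unknown = set(weights) - set(flag["variants"])
    if unknown:
        raise ValueError(f"unknown variants: {sorted(unknown)}")
    flag["targeting"]["fractional"] = [
        [name, weight] for name, weight in weights.items()
    ]
    return updated


# Send every evaluation to the "off" variant so no errors occur by default.
fixed = set_fractional_weights(FLAGS, "adServiceFailure", {"on": 0, "off": 100})
print(json.dumps(fixed["adServiceFailure"]["targeting"], indent=2))
```

The helper copies the input rather than editing it in place, so the original definition stays available for comparison.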
Cart service also needs to be updated. We should do them both in the same PR, following the format noted in this comment.
Actually, I tried the solution mentioned by @beeme1mr and everything broke.
recommendation-service   | grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
recommendation-service   | 	status = StatusCode.NOT_FOUND
recommendation-service   | 	details = "FlagdError:, FLAG_DISABLED"
recommendation-service   | 	debug_error_string = "UNKNOWN:Error received from peer ipv4:172.20.0.5:8013 {grpc_message:"FlagdError:, FLAG_DISABLED", grpc_status:5, created_time:"2024-06-29T03:51:25.86731025+00:00"}"
product-catalog-service  | 2024/06/29 03:51:25 openfeature: FLAG_NOT_FOUND: not_found: FlagdError:, FLAG_DISABLED
It seems that when disabling the Feature Flag, it doesn't return false, as expected.
Hey @flands and @julianocosta89, I'll look into this tomorrow. When the flag is disabled, the SDK uses the default value defined in the code. The message @julianocosta89 posted is likely just an overly verbose log message.
@beeme1mr I think some services do not default to false 😞
@julianocosta89 I looked into this a bit and there are a few takeaways:
- For some reason, in my environment (macOS), changing demo.flagd.json does not trigger changes to the flag definitions in flagd. Restarting the flagd service reflects the changes. This may only affect macOS.
- If a flag is disabled, the Python SDK is very verbose in its logs. The recommendationServiceCacheFailure flag does appear to be correctly falling back to its False default value. The log messages come from within the openfeature SDK.
I talked to @beeme1mr and he is looking into the verbose logging in the SDK. He agrees this situation isn't ideal and maybe shouldn't be logged the same way as other "real" failures. He's also going to look into why the flag file changes aren't being picked up by flagd in the demo setup.
Thanks for taking a look at it @dyladan! Interestingly enough, when I update my feature flags it works fine, without having to restart the service. I'm running the demo on macOS M1.
@julianocosta89 I was able to track the flagd reload issue down to my specific setup. Apparently in colima (the container runtime I'm using) the WRITE event is not triggered when I write a mounted file. You can probably ignore it for now.
I also have a quick update. We currently treat disabled flags like missing flags. That means we'll fall back to whatever is defined in the code, but we're also noisy about it because it's assumed you're accidentally using a flag. Obviously, that's not the ideal experience here, and we're working on a solution. It may take a few days to fully implement, but we're actively working on it and will provide an update ASAP.
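The behavior described above can be modeled with a small Python sketch. This is a toy model under stated assumptions, not the OpenFeature SDK or flagd code: the flag store, the get_boolean_value function here, and the state of each flag are made up for illustration. It shows why a disabled flag is only harmless when the calling service passes False as its in-code default.

```python
import logging

logger = logging.getLogger("flag-model")

# Toy flag store for illustration; the states here are assumptions.
FLAGS = {
    "adServiceFailure": {
        "state": "ENABLED",
        "variants": {"on": True, "off": False},
        "defaultVariant": "off",
    },
    "recommendationServiceCacheFailure": {
        "state": "DISABLED",
        "variants": {"on": True, "off": False},
        "defaultVariant": "on",
    },
}


def get_boolean_value(flags, flag_key, code_default):
    """Toy model: a disabled flag is handled exactly like a missing one."""
    flag = flags.get(flag_key)
    if flag is None or flag["state"] == "DISABLED":
        reason = "FLAG_NOT_FOUND" if flag is None else "FLAG_DISABLED"
        # Noisy on purpose, mirroring the verbose logs seen in this thread.
        logger.warning("%s: %s, using code default %r", flag_key, reason, code_default)
        return code_default
    return flag["variants"][flag["defaultVariant"]]


# A service that passes False as its default keeps working as expected.
cache_failure = get_boolean_value(FLAGS, "recommendationServiceCacheFailure", False)

# A service that passes True as its default would turn the failure on.
risky = get_boolean_value(FLAGS, "recommendationServiceCacheFailure", True)
```

In this model the value configured in the flag file is never consulted for a disabled flag, so the result depends entirely on what each service hard-codes as its default.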
Thanks for the updates @dyladan and @beeme1mr!
Closing this since the issues mentioned about flagd have been fixed, and the default flag values are all set properly.