opentelemetry-demo

Demo environment generates errors by default

Open flands opened this issue 1 year ago • 9 comments

Bug Report

Which version of the demo are you using? 1.10.0

Symptom

If you start the demo environment from scratch, errors are reported for the adservice.

What is the expected behavior?

Either:

  1. The demo environment doesn't generate errors by default, which is how the documentation currently reads: https://opentelemetry.io/docs/demo/#scenarios
  2. The demo environment does generate errors by default, but these errors are documented and therefore expected.

What is the actual behavior?

The adservice generates errors by default, yet the documentation seems to indicate that you must enable a scenario to generate errors and other problems.

Reproduce

Provide the minimum steps required to reproduce the issue you're observing.

We will close this issue if:

  • The steps you provided are complex.
  • We cannot reproduce the behavior you're reporting.

Additional Context

Log messages for adservice show:

    ad-service | 2024-06-23 15:11:51 - oteldemo.AdService - GetAds Failed with status Status{code=UNAVAILABLE, description=null, cause=null} trace_id=d963f87608e1ab611dee31ef9ac29860 span_id=84ce83545d6852bb trace_flags=01

src/flagd/demo.flagd.json shows:

    "adServiceFailure": {
      "description": "Fail ad service",
      "state": "ENABLED",
      "variants": {
        "on": true,
        "off": false
      },
      "defaultVariant": "off",
      "targeting": {
        "fractional": [
          ["on", 10],
          ["off", 90]
        ]
      }
    },

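The problem is the "off" weight, which should be 100 by default (with nothing assigned to "on") so that the ad service only fails when the adServiceFailure scenario is enabled.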

flands, Jun 23 '24
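To make the numbers concrete, here is a small sketch that works out what share of evaluations the fractional rule quoted above sends to each variant. It assumes the weights are relative proportions (each variant's share is its weight divided by the sum of all weights); that reading of flagd's fractional semantics is an assumption, and variant_shares is a hypothetical helper, not part of flagd or the demo.

```python
from fractions import Fraction


def variant_shares(buckets):
    """Return each variant's expected share of evaluations.

    buckets is a list of [variant, weight] pairs, as in the "fractional"
    rule of demo.flagd.json. Assumes (hypothetically) that weights are
    relative, so share = weight / sum of all weights.
    """
    total = sum(weight for _, weight in buckets)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return {variant: Fraction(weight, total) for variant, weight in buckets}


# Weights shipped in demo.flagd.json, as quoted in this issue:
# about 10% of evaluations are expected to resolve to "on" (failure).
current = variant_shares([["on", 10], ["off", 90]])
print(current["on"])  # 1/10

# The fix proposed in this issue: every evaluation resolves to "off".
fixed = variant_shares([["on", 0], ["off", 100]])
print(fixed["on"])  # 0
```

Under that assumption, the shipped weights make roughly one in ten evaluations resolve to "on", which matches failures appearing on a fresh start without any scenario being enabled.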

The cart service also needs to be updated. We should fix both in the same PR, following the format noted in this comment.

puckpuck, Jun 27 '24

Actually, I tried the solution mentioned by @beeme1mr and everything broke:

    recommendation-service   | grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    recommendation-service   | 	status = StatusCode.NOT_FOUND
    recommendation-service   | 	details = "FlagdError:, FLAG_DISABLED"
    recommendation-service   | 	debug_error_string = "UNKNOWN:Error received from peer ipv4:172.20.0.5:8013 {grpc_message:"FlagdError:, FLAG_DISABLED", grpc_status:5, created_time:"2024-06-29T03:51:25.86731025+00:00"}"
    product-catalog-service  | 2024/06/29 03:51:25 openfeature: FLAG_NOT_FOUND: not_found: FlagdError:, FLAG_DISABLED

It seems that when the feature flag is disabled, evaluation doesn't return false as expected.

julianocosta89, Jun 29 '24

Hey @flands and @julianocosta89, I'll look into this tomorrow. When the flag is disabled, the SDK uses the default value defined in the code. The message @julianocosta89 posted is likely just an overly verbose log message.

beeme1mr, Jun 30 '24

@beeme1mr I think some services do not default to false 😞

julianocosta89, Jul 01 '24

@julianocosta89 I looked into this a bit and there are a few takeaways:

  1. For some reason, in my environment (macOS), changing demo.flagd.json does not update the flag definitions in flagd; restarting the flagd service picks up the changes. This may only affect macOS.
  2. If a flag is disabled, the Python SDK is very verbose in its logs. The recommendationServiceCacheFailure flag does appear to be correctly falling back to its False default value; the log messages are emitted by the OpenFeature SDK.

I talked to @beeme1mr and he is looking into the verbose logging in the SDK. He agrees this situation isn't ideal and maybe shouldn't be logged the same way as other "real" failures. He's also going to look into why the flag file changes aren't being picked up by flagd in the demo setup.

dyladan, Jul 01 '24

Thanks for taking a look at it, @dyladan! Interestingly enough, when I update my feature flags, the changes take effect without restarting the service. I'm running the demo on macOS (M1).

julianocosta89, Jul 02 '24

@julianocosta89 I was able to track the flagd reload issue down to my specific setup. Apparently, in Colima (the container runtime I'm using), the WRITE event is not triggered when I write to a mounted file. You can probably ignore it for now.

dyladan, Jul 03 '24

I also have a quick update. We currently treat disabled flags like missing flags. That means we fall back to whatever default is defined in the code, but we're also noisy about it because we assume a missing flag is being referenced by mistake. Obviously, that's not the ideal experience here, and we're working on a solution. It may take a few days to fully implement, but we're actively working on it and will provide an update ASAP.

beeme1mr, Jul 03 '24
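The behavior described above (a disabled flag is handled like a missing one, the caller's in-code default is returned, and the fallback is logged as an error) can be modeled in a few lines. This is a hypothetical, self-contained sketch, not the OpenFeature SDK or flagd code; FlagStore, resolve_boolean, and the reason strings are illustrative names.

```python
import logging
from dataclasses import dataclass, field

logger = logging.getLogger("flag-sketch")


@dataclass
class FlagStore:
    """Hypothetical in-memory stand-in for flag definitions from demo.flagd.json."""
    # flag key -> (state, value the flag would resolve to when enabled)
    flags: dict = field(default_factory=dict)


def resolve_boolean(store, key, default):
    """Return (value, reason) for a boolean flag.

    Models the behavior described in this thread: a DISABLED flag takes
    the same path as a missing flag, so the caller's default is returned
    and an error is logged.
    """
    entry = store.flags.get(key)
    if entry is None or entry[0] == "DISABLED":
        # Noisy path: a disabled flag is logged like a genuinely missing one.
        logger.error("flag %r not found or disabled; using default %r", key, default)
        return default, "ERROR"
    _state, value = entry
    return value, "STATIC"


store = FlagStore({
    "adServiceFailure": ("ENABLED", False),
    "recommendationServiceCacheFailure": ("DISABLED", True),
})

print(resolve_boolean(store, "adServiceFailure", False))                   # (False, 'STATIC')
print(resolve_boolean(store, "recommendationServiceCacheFailure", False))  # (False, 'ERROR')
print(resolve_boolean(store, "noSuchFlag", False))                         # (False, 'ERROR')
```

Because the disabled path returns whatever default the caller passes, a service whose in-code default were True would keep failing with its flag disabled, which is the concern raised earlier about services that do not default to false.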

Thanks for the updates @dyladan and @beeme1mr!

julianocosta89, Jul 03 '24

Closing this since the issues mentioned about flagd have been fixed, and the default flag values are all set properly.

puckpuck, Feb 17 '25